The Unreasonable Effectiveness of Data

Wednesday, March 25, 2009 at 05:02:00 PM



Alon Halevy, Peter Norvig, and I argue that we should stop acting as if our goal is to author extremely elegant theories, and instead embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data. See the full article here (IEEE Intelligent Systems, March/April 2009).

8 comments:

Nuclear Sugar said...

I can't believe how beautiful and exciting this PDF is. It will change the world again. Viva mutation!

*what's that strange sight ahead?*
...strong AI?

miramon said...

Does seem like a nice paper. Still have to read it in detail. IMO this posting is the kind of valuable thing I would hope to see more of in this blog.

Iúri said...

It's not unreasonable to look for elegant theories in every field, not at all. Plato's Problem hints that we humans must be much simpler than we generally accept. Sure, you can use statistics and huge corpora to solve problems 5-year-olds can solve, but don't tell me that's the best we can do. I'm all for pragmatism and quick hacks, but publishing an article preaching that? Not cool.

Frank said...

Why create a controversy when there isn't one? And why set up a straw man? And why in such a toxic tone?

See the blog replies by Stefano Mazzocchi and Frank van Harmelen (= me).

Nuclear Sugar said...

The Straw Man argument is relevant and valid. Thanks for reminding us of this idea, for it can be dangerous to tread into the future without analyzing both clear paths.
http://en.wikipedia.org/wiki/Straw_man


But might they be more interested in the effects of ubiquitous and empowered data upon culture? And then the recursion of that effect as it snowballs? It seems they understand/remember the reflex of mass culture since their tools can affect the zeitgeist.

Wow. I've never read about Plato's Problem from Chomsky's POV. But don't they want to throw out elegant theories for elegant (dasein) AI? Therefore brute-forcing the gap between knowledge and experience through crowdsourcing by means of self-enhancing weak AI.
http://en.wikipedia.org/wiki/Plato%27s_Problem


Also, don't they want to further organize the data-building-upon-data and create beauty from the web's mutation? I do admit the article has an aggressive stance.

Dr. F. Alias said...

Is this the same Fernando Pereira who did C-Prolog?

dullhunk said...

I enjoyed reading this opinion piece; there is plenty to agree and disagree with here.

Perhaps it's more about the unreasonable effectiveness of Google, rather than just the unreasonable effectiveness of data?

Joe Colannino said...

I tend to agree. Statistically improbable phrases give, on average, ten times less information overload and more than double the confidence of retrieving relevant data as compared to Boolean keyword searching. If the Semantic Web is to become a reality, it will require automated algorithms such as TF x IDF operating on oodles of data. I hope to publish some derivative articles sometime soon. In the meantime, please see

http://docs.google.com/fileview?id=F.185065c2-95e9-4cdc-8840-3259521f6f51&hl=en

and

http://docs.google.com/fileview?id=F.a00ab83a-b88a-4201-93af-be34c456935b&hl=en