Predicting the Present with Google Trends

Thursday, April 02, 2009 at 02:10:00 PM



Can Google queries help predict economic activity?

The answer depends on what you mean by "predict." Google Trends and Google Insights for Search provide real-time reports on query volume, while economic data is typically released several days after the close of the month. Given this time lag, it is not implausible that Google queries in a category like "Automotive/Vehicle Shopping" during the first few weeks of March may help predict what actual March automotive sales will be when the official data is released halfway through April.
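
The timing argument can be made concrete with a small helper. This is a minimal sketch, assuming the query index arrives as weekly values keyed by each week's start date (a hypothetical format, not the actual Google Trends export). It averages whichever weeks of a month have been observed so far into a provisional monthly value. The helper name `partial_month_average` and the index values are made up for illustration.

```python
from datetime import date
from statistics import mean


def partial_month_average(weekly, year, month, as_of):
    """Average the weekly query index over the weeks of (year, month) seen by `as_of`.

    `weekly` maps each week's start date to its index value (hypothetical format).
    A week is assigned to the month in which it starts, which is a simplification.
    """
    values = [
        value
        for week_start, value in weekly.items()
        if week_start.year == year and week_start.month == month and week_start <= as_of
    ]
    if not values:
        raise ValueError("no weekly observations for that month yet")
    return mean(values)


# Made-up weekly index values for a category such as "Automotive/Vehicle Shopping".
weekly_index = {
    date(2009, 3, 1): 62.0,
    date(2009, 3, 8): 64.0,
    date(2009, 3, 15): 66.0,
    date(2009, 3, 22): 65.0,
    date(2009, 3, 29): 67.0,
}

# Three weeks of March are already available on March 21,
# long before the official March sales figure is published.
print(partial_month_average(weekly_index, 2009, 3, as_of=date(2009, 3, 21)))  # 64.0
```

The provisional value can then stand in for the month's search activity while the official number is still weeks away.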

That famous economist Yogi Berra once said "It's tough to make predictions, especially about the future." This inspired our approach: let us lower the bar and just try to predict the present.

Our work to date is summarized in a paper called Predicting the Present with Google Trends. We find that Google Trends data can help improve forecasts of the current level of activity for a number of different economic time series, including automobile sales, home sales, retail sales, and travel behavior.
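
To illustrate the kind of comparison involved, here is a minimal sketch. It assumes a seasonal autoregressive baseline for a monthly series in logs, log y_t = b0 + b1 log y_{t-1} + b12 log y_{t-12} + e_t, and an augmented version in which a search index x_t enters linearly. The paper's exact specification may differ. The helpers `design` and `fit` are hypothetical, and the data are synthetic, generated so that x_t does matter.

```python
import numpy as np


def design(y, x=None):
    """Build regressors for log y_t: intercept, log y_{t-1}, log y_{t-12}, optionally x_t."""
    log_y = np.log(y)
    t = np.arange(12, len(y))  # the first 12 months serve only as lags
    cols = [np.ones(len(t)), log_y[t - 1], log_y[t - 12]]
    if x is not None:
        cols.append(x[t])
    return np.column_stack(cols), log_y[t]


def fit(y, x=None):
    """Ordinary least squares fit; returns coefficients and in-sample RMSE of log y."""
    X, target = design(y, x)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    return beta, float(np.sqrt(np.mean(resid ** 2)))


# Synthetic monthly data: six years with a seasonal pattern plus a search-driven term.
rng = np.random.default_rng(0)
n = 72
x = rng.normal(size=n)  # stand-in for a centered query index
season = 0.1 * np.sin(2 * np.pi * np.arange(n) / 12)
y = np.exp(5 + season + 0.1 * x + rng.normal(scale=0.02, size=n))

beta_base, rmse_base = fit(y)
beta_trends, rmse_trends = fit(y, x)
print(f"baseline RMSE {rmse_base:.4f}, with search index {rmse_trends:.4f}")
```

Because the baseline is nested inside the augmented model, the in-sample fit can only improve when x_t is added. Whether the index genuinely helps has to be judged on forecasts made out of sample.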

Even predicting the present is useful, since it may help identify "turning points" in economic time series. If people start doing significantly more searches for "Real Estate Agents" in a certain location, it is tempting to think that house sales might increase in that area in the near future.

Our paper outlines one approach to short-term economic prediction, but we expect that there are several other interesting ideas out there. So we suggest that forecasting wannabes download some Google Trends data and try to relate it to other economic time series. If you find an interesting pattern, post your findings on a website and send a link to econ-forecast@google.com. We'll report on the most interesting results in a later blog post.
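
For anyone who downloads the data, here is a minimal sketch of one way to check whether a query index actually adds forecasting power. It re-estimates a seasonal autoregressive model on an expanding window and compares one-step-ahead errors with and without the index. This is a standard out-of-sample comparison, not necessarily the procedure used in the paper. The helpers `regressors` and `rolling_mae` and the synthetic data are hypothetical. Each forecast of log y_t uses x_t, on the assumption that the month's query data is in hand before the official figure is released.

```python
import numpy as np


def regressors(log_y, x, t, use_x):
    """Regressor row for predicting log_y[t]: intercept, lag 1, lag 12, optionally x[t]."""
    row = [1.0, log_y[t - 1], log_y[t - 12]]
    if use_x:
        row.append(x[t])
    return row


def rolling_mae(y, x, use_x, first_test):
    """Mean absolute one-step-ahead error of log y over months t >= first_test.

    Coefficients for month t are estimated only on months 12..t-1 (an expanding
    window), so the target month never enters its own estimation sample.
    """
    log_y = np.log(y)
    errors = []
    for t in range(first_test, len(y)):
        X = np.array([regressors(log_y, x, s, use_x) for s in range(12, t)])
        beta, *_ = np.linalg.lstsq(X, log_y[12:t], rcond=None)
        prediction = np.array(regressors(log_y, x, t, use_x)) @ beta
        errors.append(abs(log_y[t] - prediction))
    return float(np.mean(errors))


# Synthetic data: eight years of monthly values in which the index really matters.
rng = np.random.default_rng(1)
n = 96
x = rng.normal(size=n)  # stand-in for a centered query index
season = 0.1 * np.sin(2 * np.pi * np.arange(n) / 12)
y = np.exp(5 + season + 0.1 * x + rng.normal(scale=0.02, size=n))

mae_base = rolling_mae(y, x, use_x=False, first_test=36)
mae_trends = rolling_mae(y, x, use_x=True, first_test=36)
print(f"out-of-sample MAE: baseline {mae_base:.4f}, with search index {mae_trends:.4f}")
```

With real data, the interesting question is whether the gap persists, and in particular whether it widens around turning points.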

It has been said that if you put a million monkeys in front of a million computers, you would eventually produce an accurate economic forecast. Let's see how well that theory works ...

29 comments:

commander spaz said...

Hmm, good idea, but I don't think people like to be called monkeys :-?

Michael Head said...

I'd love to be able to pull the raw/live data into R directly using a CRAN library (instead of going through an intermediate CSV file).

Also, is there full R source for all the plots beyond what's in the appendix?

jeffjarvis said...

And this is why Twitter would be valuable, eh?

Jim Jansen said...

Your comment about time series analysis caught my eye.

Have done research on search logs and time series analysis.

Zhang, Y., Jansen, B. J., Spink, A. (2009) Time Series Analysis of a Web Search Engine Transaction Log, Information Processing & Management. 45(2), 230-245.

http://ist.psu.edu/faculty_pages/jjansen/academic/jansen_time_series_analysis.pdf

BlogMaster said...

This is a really good idea. Seems so simple.

Johanna said...

Maybe it could be a new KPI for the stock exchange.

Santiago said...

Insert adsense, gain a few bucks, spank the monkeys and pop the corn to watch the experiment...
Interesting way of thinking.

jfleming said...

Million monkeys posting on Twitter are wannabes. Good work, Google. I get the point, and I am not in economic forecasting, merely a consumer in the UK.

jhl said...

the reduction of lag between actions and analysis.... the development of a new metric... neither is in the prediction business... but both move the analysis closer to prediction, J

T. Bishop, E.I. said...

OK, the graphs look similar when comparing the company sales data to the number of searches on that company, but does this take into account the fact that many people may be searching for a company because it is going bankrupt and in the news, not because they intend to buy?

ercandiyoki said...

The quote, "Prediction is very difficult, especially about the future" actually belongs to Niels Bohr.

Shrewd Steward said...

This is an excellent article. You have what appears to be a spelling error on page 21 of your report, though. "Simple autoregressive models due remarkably well in extrapolating smooth trends." "Due" needs to be "do."

Other than that, this research will prove immensely useful to organizations, researchers and investors. This is one of the ways that we can harness and collate the overload of information coming our way in this day and age.

Robert said...

My first thought on reading this was to wonder how well Google searches really correlate with actual behavior. As other commenters have observed, there are a lot of reasons why people might search for a subject besides actual buying. I've only skimmed the paper, but it looks like the model contains some fittable parameters, and the phrase "cross-validation" doesn't appear anywhere in the text. That isn't a good sign, but I will have to read the procedure more carefully before I make up my mind on that score.

My second thought was, how easily could these statistics be manipulated? If people come to rely on Google Trends and its models for economic forecasting, then the operators of, say, a large bot-net could generate a bunch of bogus searches to create the appearance of a fake recovery in, say, retail sales. Stocks of retailers would presumably surge, creating an opportunity for the perpetrators to profit using short sales or judicious options purchases.

How much thought have you given to quality controlling the data? QC can be a real headache even in cases where the data source is well understood. For example, in meteorology bad ASOS and radiosonde observations sometimes make it into models, and good ones are sometimes erroneously rejected. Both types of error have been known to compromise forecasts, and I would aver that human users are even less predictable than weather instruments. Therefore, I suspect that the QC problem will be a deal-breaker for using Google Trends data as a significant economic indicator.

Ajay Shah said...

Very interesting! I had written a blog post (April 2008) on a different angle: on using google trends to tell us something about the pressures of political economy on alternative corners of the impossible trinity.

Michael F. Martin said...

Dr. Varian, it's really great that Google is making this info available, but there are some format issues that make some uses of it difficult.

For example, I think it would be interesting to see the Fourier transform of some of these time series, since that might help reveal the point of diminishing marginal utility. But with the normalization of the number of searches, it is impossible to put units on a frequency spectrum.

Can I suggest that the raw numbers be made available as a new feature?

damon said...

It's a really powerful tool (or rather, data source) for studying human behavior, especially that of Internet users. 1/6.8 billion people use the Internet today.
At the same time, I find it quite scary. What if everything eventually becomes predictable, what if everything is statistics? Will it hinder our creativity?

iTbay said...

Please check out http://itbaycanada.blogspot.com/
with the paper entitled "Using Google Trends for predicting a new technology paradigm - a Canadian perspective". An unedited report with over a year of work involved.

Scott said...

I think it's important this is not 'predicting the future' as one post stated, but rather a tool to more quickly digest data. It may help 'predict' a monthly sales report that would come out a few weeks after the end of the month, but I don't think we need to worry about predicting the future and the fall of human civilization as we know it.

WJS said...

See this posting on cornering the textbook market online using Google Trends to measure indicators of the market.

http://ideaclearinghouse.blogspot.com/2009/04/cornering-textbook-market-online.html

Rowan said...

It was the physicist Niels Bohr who said "Prediction is very difficult, especially if it's about the future."

Economists still haven't grappled with the fact their so-called scientific discipline has no predictive value of any real usefulness.

Read Nassim Taleb if you disagree.

Bertil Hatt said...

I'm a bit surprised by the tone of the comments: I used to take part in what was described as the harshest econ seminar in town, and it wasn't close to that.

Anyway: great initiative (dreaming about what this could do, making the draft and data available, calling for public insights), but I guess not everyone can handle such a large database. I fought to get access to significant computing facilities, and that's not even half of the problem. If you really care about monkeys typing, why don't you develop a powerful (think R), easy-to-use (think Google Spreadsheets) system and foot the computation cost? You might even save energy by caching the most frequent queries.

guppy23 said...

Seems like a good idea. Let's see:

I talk to my peeps in the "east" (Eastern Europe). I need to rent their "botnet" for a couple of months to do millions and millions of queries to google about vacation travel.
Oops, I lost my train of thought. What were we trying to predict again?

Joe said...

I'm no wizard, but this seems far too simplistic... c'mon! About 1 in every 1,000 of my own searches results in a purchase. How many times do you search without the intention of buying? This is garbage. I wish I could get paid to create nonsense models that have no validity.

Thomas said...

Market Prediction has already been done..

mario said...

Good idea! I find this tool very useful. Availability of some custom alerts would be helpful. It may become an interesting source of data for researchers in various fields.

Gord Burtch said...

I guess if trends can be used to predict the spread of the flu, then why not everything else.

Jim Peeke said...

Elliott Wave Theory, as practiced by Robert Prechter, suggests that the future can be predicted by applying Elliott Wave Theory to the volatility of the stock market.

In much the same way, your search inquiries could be the forward looking indicator, instead of the stock market, to which Elliott Wave Theory could be applied.

Brian said...

What about behavioral research? Teen pregnancy rates increased in 2006 for the first time since the 1990s. If you look at the trends for teen sex and birth control, you see an increase in teen sex around the same time, after which it goes back down while birth control stays the same.

huttneab said...

What about using this to find new market niches by trending most searched terms and ranking by least returned results. As in, what are a lot of people looking for but not finding? I know the Search Insight tool doesn't currently do this, but I feel Google could modify it easily.