It’s tough to make predictions, especially about the future
Monday, July 26th, 2010 by Mattia Mialich
I was thinking of a title to introduce today’s topic, and I was reminded of baseball player Yogi Berra’s ironic phrase. It is certainly a challenge to imagine what will happen down the road, but there are those who try. Every day, journalists, marketers, researchers and analysts work to tell us what is to come, describing the outlook through predictions about the economic and social trends that affect us all. Their daily job consists of aggregating and organizing past and present events into a network of data linked to a topic, in order to predict its future scenario. Predictive analysis, as this activity is called, refers to a set of techniques drawn from disciplines such as statistics and game theory that make better use of past and current data to arrive at plausible estimates of future events. In particular, it is used to forecast future observations and to predict how particular entities will develop in areas such as the economy, finance, marketing, society and insurance.
Now users have their own toy for open source intelligence analysis. It’s a browser-based temporal analytics tool, developed by a team of computer scientists, statisticians and linguists, for analyzing large amounts of time-based data from around the web: people, markets, locations and more. Recorded Future, a data analytics company headquartered in Boston, is the interesting start-up that received investment from Google Ventures a few months ago to develop this project. Yes, there was still something unknowable for Google: the future. Just over a year ago, in fact, Google was far from predicting it. However, the Big G claimed to predict the present through Google Trends, which, together with Google Insights for Search, provides a daily insight into what Google users are searching for by showing the relative volume of search traffic for any query.
An understanding of web search queries has interesting implications for advertisers, marketers, economists, scholars, and anyone else interested in knowing more about their object of study. Obviously, some search queries and categories have trends that are quite seasonal, with repeated patterns: see the search trends for “gift” at Christmas time. Many other search trends, however, are quite irregular and hard to predict: see the search trends for “Facebook”. In the landmark 2009 study Predicting the Present with Google Trends, Hal Varian, Google’s chief economist, and Hyunyoung Choi describe forecasting models that learn basic seasonality and general trend, and show how aggregated search trends for Google categories can be used as extra indicators to improve several US econometric prediction models. In the paper they use the frequency of certain search terms to forecast retail, automotive and home sales, as well as travel behavior. The improvement in forecast accuracy ranges from a few percent to 18 percent in the “Motor Vehicles and Parts” case.
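To make the idea of search volume as an "extra indicator" concrete, here is a minimal sketch. It assumes a seasonal autoregressive baseline (this month's sales explained by last month's and by the same month a year earlier) and then adds the current search volume as one more regressor. The exact model form, the synthetic `sales` and `search` series, and every function name are assumptions made for illustration; nothing here reproduces the paper's data or results, and it only compares in-sample fit rather than real forecasts.

```python
import math
import random

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, targets):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)

def sse(rows, targets, coef):
    """Sum of squared errors of the fitted linear model."""
    return sum(
        (t - sum(c * v for c, v in zip(coef, r))) ** 2
        for r, t in zip(rows, targets)
    )

# Synthetic monthly data (an assumption, not real Google Trends or sales data):
# both series share a yearly cycle, and sales also depend on search volume.
random.seed(0)
months = 60
search = [50 + 10 * math.sin(2 * math.pi * t / 12) + random.gauss(0, 5)
          for t in range(months)]
sales = [100 + 20 * math.sin(2 * math.pi * t / 12) + 0.8 * search[t]
         + random.gauss(0, 3) for t in range(months)]

# Baseline: intercept, previous month, same month last year.
targets = [sales[t] for t in range(12, months)]
base_rows = [[1.0, sales[t - 1], sales[t - 12]] for t in range(12, months)]
# Extended: the same regressors plus the current month's search volume.
ext_rows = [row + [search[t]] for row, t in zip(base_rows, range(12, months))]

coef_base = ols(base_rows, targets)
coef_ext = ols(ext_rows, targets)
sse_base = sse(base_rows, targets, coef_base)
sse_ext = sse(ext_rows, targets, coef_ext)

print(f"baseline SSE:           {sse_base:.1f}")
print(f"with search volume SSE: {sse_ext:.1f}")
```

Because the extended model contains the baseline as a special case, its in-sample error can never be worse; the interesting question, which this sketch does not answer, is how much of that gain survives when predicting months the model has not seen.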
After that small digression, back to Recorded Future. The way the engine works is simple in principle. It continually scans thousands of web sources (blogs, reviews, government sites, etc.) using sampling methods and data mining algorithms, looking at the nature and frequency of references to a given occurrence, and computes what the company calls a “momentum value” for each entity in the database. This means extracting information from text, including events, entities and the times at which they occur, and measuring momentum, as well as sentiment, for each item in the index.
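Recorded Future has not published how its momentum value is calculated, so the following is only a toy illustration of the general idea of scoring an entity by how often, and how recently, it is mentioned. The exponential decay, the `half_life_days` parameter, the function name and the entity names are all hypothetical.

```python
from collections import defaultdict
from datetime import date

def momentum(mentions, today, half_life_days=7.0):
    """Score each entity by its mentions, weighting recent ones more heavily.

    `mentions` is an iterable of (entity, date_seen) pairs. A mention seen
    today counts 1.0, and its weight halves every `half_life_days` days.
    This formula is a hypothetical stand-in, not Recorded Future's method.
    """
    scores = defaultdict(float)
    for entity, seen_on in mentions:
        age_days = (today - seen_on).days
        if age_days < 0:
            continue  # ignore records dated after `today`
        scores[entity] += 0.5 ** (age_days / half_life_days)
    return dict(scores)

# Hypothetical mentions extracted from scanned sources.
sample_mentions = [
    ("Acme Corp", date(2010, 7, 26)),
    ("Acme Corp", date(2010, 7, 19)),
    ("Globex", date(2010, 7, 12)),
]
sample_scores = momentum(sample_mentions, today=date(2010, 7, 26))
for entity, score in sorted(sample_scores.items(), key=lambda kv: -kv[1]):
    print(f"{entity}: {score:.2f}")
```

A shorter half-life makes the score react faster to bursts of attention, while a longer one favors entities that are mentioned steadily over time.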
All of this could be useful, for example, to a web manager who wants to identify recurring patterns in accesses or requests for certain pages, obtaining useful information about their future trends and getting answers in real time. It is also useful for counterterrorism analysis, and this is already happening: the system extracts terrorism-related events from the public web, as well as information from structured sources such as the Institute for the Study of Violent Groups.
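As a small illustration of the web manager's case, recurring patterns in page requests can be spotted with a plain autocorrelation check. This is a generic statistical technique, not a description of how Recorded Future works, and the daily hit counts below are invented for the example.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0  # a constant series has no pattern to detect
    return sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom

def dominant_period(series, max_lag):
    """Return the lag (1..max_lag) with the highest autocorrelation."""
    return max(range(1, max_lag + 1),
               key=lambda lag: autocorrelation(series, lag))

# Invented daily hit counts for one page over four weeks:
# traffic spikes on every seventh day.
daily_hits = [300 if day % 7 == 0 else 100 for day in range(28)]

period = dominant_period(daily_hits, max_lag=14)
print(f"strongest recurring pattern: every {period} days")
```

Real access logs are noisier than this, so in practice the peak is less sharp and several candidate periods (daily, weekly, monthly) may need to be compared.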