Early prediction of movie box office success based on Wikipedia activity big data

PLoS One. 2013 Aug 21;8(8):e71226. doi: 10.1371/journal.pone.0071226. eCollection 2013.


Use of socially generated "big data" to access information about collective states of mind in human societies has become a new paradigm in the emerging field of computational social science. A natural application is predicting a society's reaction to a new product in terms of popularity and adoption rate. However, bridging the gap between "real-time monitoring" and "early prediction" remains a major challenge. Here we report on an effort to build a minimalistic predictive model of the financial success of movies based on the collective activity data of online users. We show that the popularity of a movie can be predicted well before its release by measuring and analyzing the activity levels of editors and viewers of the movie's entry in Wikipedia, the well-known online encyclopedia.
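The abstract does not specify the model itself. As a hedged illustration only, assume (hypothetically) that the "minimalistic predictive model" is an ordinary least-squares linear model (the record's MeSH terms do list Linear Models) fitted on log-transformed pre-release Wikipedia activity counts, such as page views and number of editors, with a log-scaled box office figure as the target. The function names `fit_linear_model`, `predict` and `log_features`, the `log10(1 + x)` transform, and the choice of two features are all assumptions for this sketch, not the authors' specification.

```python
import math
from typing import List, Sequence


def fit_linear_model(features: Sequence[Sequence[float]],
                     targets: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, coef_1, ..., coef_k]. Hypothetical helper for illustration.
    """
    if not features or len(features) != len(targets):
        raise ValueError("need one non-empty feature row per target")
    # Prepend a constant 1.0 to every row so the first coefficient is the intercept.
    rows = [[1.0, *map(float, row)] for row in features]
    k = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * y for r, y in zip(rows, targets))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # After full reduction the last column holds the solution vector.
    return [aug[i][k] for i in range(k)]


def predict(coefs: Sequence[float], row: Sequence[float]) -> float:
    """Apply fitted coefficients to one feature row."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


def log_features(page_views: float, editors: float) -> List[float]:
    """Hypothetical feature map: log10(1 + count) keeps zero counts finite."""
    return [math.log10(1 + page_views), math.log10(1 + editors)]
```

In use, each movie would contribute one row such as `log_features(views, editors)` measured some fixed time before release, paired with its log-scaled revenue. The normal equations are solved directly to keep the sketch dependency-free; for real data a numerically sturdier solver (QR decomposition, as used by standard statistics libraries) would be preferable.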

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Behavior
  • Data Collection
  • Forecasting*
  • Humans
  • Internet*
  • Linear Models
  • Mass Media*
  • Models, Statistical
  • Motion Pictures / trends*
  • Software
  • Time Factors

Grant support

Partial financial support from the EU's 7th Framework Program's FET-Open to the ICTeCollective project (no. 238597), from the Academy of Finland through the Finnish Center of Excellence program (project no. 129670), and from TEKES (FiDiPro) is gratefully acknowledged. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.