The Detection of Emerging Trends Using Wikipedia Traffic Data and Context Networks

PLoS One. 2015 Dec 31;10(12):e0141892. doi: 10.1371/journal.pone.0141892. eCollection 2015.

Abstract

Can online media predict new and emerging trends, given that trends in society are reflected in online systems? While several recent studies have used Google Trends as the leading online information source to address such questions, we focus on the online encyclopedia Wikipedia, which is often used for deeper topical reading. Wikipedia grants open access to all traffic data and, beyond single keywords, provides a wealth of additional (semantic) information in a context network. Specifically, we suggest and study context-normalized and time-dependent measures for a topic's importance, based on page-view time series of Wikipedia articles in different languages and of the articles related to them by internal links. As an example, we present a study of the recently emerging Big Data market with a focus on the Hadoop ecosystem, and compare the capabilities of Wikipedia and Google in predicting its popularity and life cycles. To support further applications, we have developed an open web platform for sharing the results of Wikipedia analytics, providing context-rich and language-independent relevance measures for emerging trends.
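To illustrate what a context-normalized, time-dependent measure could look like, the following is a minimal sketch, not the definition used in the paper. It assumes that a topic's relevance at each time step is the share of page views received by the central article relative to the total views of that article plus the articles linked to it. The function name `context_normalized_relevance`, the article names, and all numbers are hypothetical.

```python
from typing import Dict, List


def context_normalized_relevance(
    central: List[float],
    neighbors: Dict[str, List[float]],
) -> List[float]:
    """Return, per time step, the central article's share of all page views
    in its context (central article plus linked articles).

    Assumption: this ratio is only one plausible normalization; the paper's
    actual measures may be defined differently.
    """
    for name, series in neighbors.items():
        if len(series) != len(central):
            raise ValueError(f"series for {name!r} has a different length")

    relevance = []
    for t, views in enumerate(central):
        # Total traffic in the context network at time step t.
        context_total = views + sum(series[t] for series in neighbors.values())
        # Guard against time steps with no traffic at all.
        relevance.append(views / context_total if context_total > 0 else 0.0)
    return relevance


# Made-up page-view counts, for illustration only.
central_views = [100.0, 150.0, 300.0]
neighbor_views = {
    "Neighbor_A": [400.0, 400.0, 400.0],
    "Neighbor_B": [500.0, 450.0, 300.0],
}
relevance = context_normalized_relevance(central_views, neighbor_views)
```

In this toy example the total context traffic stays constant while the central article's share grows, so the normalized measure rises even though overall interest in the surrounding topic area does not.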

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Internet*
  • Models, Theoretical*

Grants and funding

MK and JK are thankful to the European Union (FP7 ICT project SOCIONICAL, grant 231288) and the German Research Society (DFG, grant KA 1676/4) for financial support. DYK is thankful to the Office of Naval Research (ONR Grant N000141410738) for financial support.