Understanding Editing Behaviors in Multilingual Wikipedia

PLoS One. 2016 May 12;11(5):e0155305. doi: 10.1371/journal.pone.0155305. eCollection 2016.

Abstract

Multilingualism is common offline, but we have a more limited understanding of the ways multilingualism is displayed online and the roles that multilinguals play in the spread of content between speakers of different languages. We take a computational approach to studying multilingualism using one of the largest user-generated content platforms, Wikipedia. We study multilingualism by collecting and analyzing a large dataset of the content written by multilingual editors of the English, German, and Spanish editions of Wikipedia. This dataset contains over two million paragraphs edited by over 15,000 multilingual users from July 8 to August 9, 2013. We analyze these multilingual editors in terms of their engagement, interests, and language proficiency in their primary and non-primary (secondary) languages and find that the English edition of Wikipedia displays different dynamics from the Spanish and German editions. Users primarily editing the Spanish and German editions make more complex edits than users who edit these editions as a second language. In contrast, users editing the English edition as a second language make edits that are just as complex as the edits by users who primarily edit the English edition. In this way, English serves a special role bringing together content written by multilinguals from many language editions. Nonetheless, language remains a formidable hurdle to the spread of content: we find evidence for a complexity barrier whereby editors are less likely to edit complex content in a second language. In addition, we find that multilinguals are less engaged and show lower levels of language proficiency in their second languages. We also examine the topical interests of multilingual editors and find that there is no significant difference between primary and non-primary editors in each language.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Humans
  • Internet*
  • Language
  • Multilingualism*
  • Writing*

Associated data

  • figshare/10.6084/m9.figshare.3218728.v1

Grant support

This work was supported by the Korea Ministry of Science, ICT and Future Planning, Grant 10041313, UX-oriented Mobile Software Platform; the John Fell Oxford University Press (OUP) Research Fund (https://www.admin.ox.ac.uk/pras/jff/); and the University of Oxford’s Economic and Social Research Council Impact Acceleration Account and Higher Education Innovation Fund (HEIF) (http://www.esrc.ac.uk/collaboration/knowledge-exchange/opportunities/ImpactAccelerationAccounts.aspx), reference: IAA/HEIF-DIA-013. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.