A computational approach to qualitative analysis in large textual datasets

PLoS One. 2014 Feb 3;9(2):e87908. doi: 10.1371/journal.pone.0087908. eCollection 2014.

Abstract

In this paper I introduce computational techniques to extend qualitative analysis into the study of large textual datasets. I demonstrate these techniques by using probabilistic topic modeling to analyze a broad sample of 14,952 documents published in major American newspapers from 1980 through 2012. I show how computational data mining techniques can identify and evaluate the significance of qualitatively distinct subjects of discussion across a wide range of public discourse. I also show how examining large textual datasets with computational methods can overcome methodological limitations of conventional qualitative methods, such as how to measure the impact of particular cases on broader discourse, how to validate substantive inferences from small samples of textual data, and how to determine if identified cases are part of a consistent temporal pattern.
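As a rough illustration of what probabilistic topic modeling involves, the sketch below fits latent Dirichlet allocation (LDA) with collapsed Gibbs sampling and then averages each topic's share by publication year. The abstract does not name a specific model or implementation, so the choice of LDA, the function names fit_lda and yearly_topic_share, the hyperparameters, and the toy corpus are all assumptions made for illustration, not the paper's method or data.

```python
import random
from collections import Counter, defaultdict


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized documents.

    Returns (theta, topic_word): per-document topic proportions and
    per-topic word counts. Hyperparameters are illustrative defaults.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                        # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed topic proportions for each document.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word


def yearly_topic_share(theta, years):
    """Average the topic proportions of all documents published in each year."""
    grouped = defaultdict(list)
    for row, year in zip(theta, years):
        grouped[year].append(row)
    return {
        year: [sum(col) / len(rows) for col in zip(*rows)]
        for year, rows in grouped.items()
    }


# Hypothetical toy corpus (not the paper's newspaper sample).
docs = [
    "court ruling judge evidence trial".split(),
    "judge trial jury evidence verdict".split(),
    "vaccine study trial patients evidence".split(),
    "patients study vaccine health doctors".split(),
]
years = [1985, 1985, 2010, 2010]

theta, topic_word = fit_lda(docs, num_topics=2, iterations=100, seed=1)
shares = yearly_topic_share(theta, years)
for year in sorted(shares):
    print(year, [round(s, 2) for s in shares[year]])
for t, counts in enumerate(topic_word):
    print("topic", t, [w for w, _ in counts.most_common(3)])
```

On a corpus this small the recovered topics are unstable and depend on the random seed. A study at the scale the abstract describes would more plausibly rely on an established topic modeling package, together with qualitative reading of the documents that load most heavily on each topic.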

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Books*
  • Computational Biology / methods*
  • Data Mining / methods
  • Databases, Factual*
  • Evaluation Studies as Topic*
  • Humans
  • Models, Statistical*

Grant support

Research support was provided by the Neukom Institute for Computational Science. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.