A computational approach to qualitative analysis in large textual datasets

Michael S Evans

doi:10.1371/journal.pone.0087908

PLoS One. 2014 Feb 3;9(2):e87908. doi: 10.1371/journal.pone.0087908. eCollection 2014.

Author

Michael S Evans¹

Affiliation

¹ Neukom Institute for Computational Science and Department of Film & Media Studies, Dartmouth College, Hanover, New Hampshire, United States of America.

Abstract

In this paper I introduce computational techniques to extend qualitative analysis into the study of large textual datasets. I demonstrate these techniques by using probabilistic topic modeling to analyze a broad sample of 14,952 documents published in major American newspapers from 1980 through 2012. I show how computational data mining techniques can identify and evaluate the significance of qualitatively distinct subjects of discussion across a wide range of public discourse. I also show how examining large textual datasets with computational methods can overcome methodological limitations of conventional qualitative methods, such as how to measure the impact of particular cases on broader discourse, how to validate substantive inferences from small samples of textual data, and how to determine if identified cases are part of a consistent temporal pattern.
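To make "probabilistic topic modeling" concrete, the following is a minimal sketch that assumes latent Dirichlet allocation (LDA) fit by collapsed Gibbs sampling, one common form of topic model. The abstract names neither the specific model nor the software used, so the function `fit_lda`, its hyperparameters, and the four-document toy corpus are illustrative assumptions, not the procedure applied to the 14,952 newspaper documents.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents (illustrative)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                # Resample the topic and restore the counts.
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic proportions for each document.
    theta = [
        [(c + alpha) / (len(doc) + alpha * num_topics) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # Most frequent words currently assigned to each topic.
    top_words = [
        [w for w, c in tw.most_common(3) if c > 0] for tw in topic_word
    ]
    return theta, top_words


# Hypothetical toy corpus; a real study would tokenize full newspaper articles.
toy_docs = [
    "court ruling judge law court".split(),
    "judge law appeal court ruling".split(),
    "team game season coach game".split(),
    "coach season team player game".split(),
]
theta, top_words = fit_lda(toy_docs)
for d, proportions in enumerate(theta):
    print(d, [round(p, 2) for p in proportions])
print(top_words)
```

Each row of `theta` gives one document's estimated mix of topics, and `top_words` lists the words most often assigned to each topic. In a large corpus, per-document proportions of this kind are what allow the prominence of a subject to be compared across documents and over time.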

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Books*
Computational Biology / methods*
Data Mining / methods
Databases, Factual*
Evaluation Studies as Topic*
Humans
Models, Statistical*

Grants and funding

Research support was provided by the Neukom Institute for Computational Science. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.