Seeing the "Big" Picture: Big Data Methods for Exploring Relationships Between Usage, Language, and Outcome in Internet Intervention Data

J Med Internet Res. 2016 Aug 31;18(8):e241. doi: 10.2196/jmir.5725.


Background: Assessing the efficacy of Internet interventions that are already on the market introduces both challenges and opportunities. While vast, often unprecedented amounts of data may be available (hundreds of thousands and sometimes millions of participants, with many assessed variables), the data are observational in nature, are partly unstructured (eg, free text, images, sensor data), lack a natural control group for comparison, and typically exhibit high attrition rates. New approaches are therefore needed to use these existing data and derive new insights that can augment traditional, smaller randomized controlled trials.

Objective: Our objective was to demonstrate how emerging big data approaches can help explore questions about the effectiveness and process of an Internet well-being intervention.

Methods: We drew data from the user base of a well-being website and app called Happify. To explore effectiveness, we used multilevel models focusing on within-person variation to examine whether greater usage predicted higher well-being in a sample of 152,747 users. In addition, to explore the underlying processes that accompany improvement, we analyzed language from 10,818 users who had a sufficient volume of free-text responses and a sufficient timespan of platform usage. A topic model constructed from this free text yielded language-based correlates of individual users' improvement in outcome measures, offering insight into the beneficial processes users experienced.
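The within-person logic described in the Methods can be illustrated with a minimal sketch. It assumes one (user, usage, well-being) record per measurement period and substitutes a simple person-mean-centered (fixed-effects) slope for the full multilevel model; the record layout, the function name `within_person_slope`, and the toy data are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import fmean


def within_person_slope(records):
    """Pooled within-person slope of well-being on usage.

    `records` is an iterable of (user_id, usage, wellbeing) tuples, one per
    hypothetical measurement period. Each user's own means are subtracted
    from both variables, so only within-person variation contributes.
    """
    by_user = defaultdict(list)
    for user, usage, wellbeing in records:
        by_user[user].append((usage, wellbeing))

    num = 0.0  # sum of (centered usage * centered well-being)
    den = 0.0  # sum of (centered usage squared)
    for obs in by_user.values():
        mean_u = fmean(u for u, _ in obs)
        mean_w = fmean(w for _, w in obs)
        for u, w in obs:
            du, dw = u - mean_u, w - mean_w
            num += du * dw
            den += du * du

    if den == 0:
        raise ValueError("no within-person variation in usage")
    return num / den


# Toy data: user B uses the app more and is happier overall, but within
# each user one extra unit of usage goes with +0.5 points of well-being.
data = [
    ("A", 0, 50), ("A", 2, 51), ("A", 4, 52),
    ("B", 10, 80), ("B", 12, 81),
]
print(within_person_slope(data))  # 0.5
```

Because each user's own averages are subtracted, a user who simply uses the app more and is happier overall does not inflate the slope; only period-to-period co-variation within the same person counts. A multilevel model would additionally estimate random effects and can model time trends, which this sketch omits.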

Results: On a measure of positive emotion, the average user improved 1.38 points per week (SE 0.01, t(122,455)=113.60, P<.001, 95% CI 1.36-1.41), about an 11% increase over 8 weeks. Within a given individual user, more usage predicted more positive emotion and less usage predicted less positive emotion (estimate 0.09, SE 0.01, t(6047)=9.15, P=.001, 95% CI 0.07-0.12). This estimate predicted that a given user would report positive emotion 1.26 points (or 1.26%) higher after a 2-week period in which they used Happify daily than after a 2-week period in which they did not use it at all. Among highly engaged users, 200 automatically clustered topics showed a significant (corrected P<.001) effect on change in well-being over time, illustrating which topics may be more beneficial than others when engaging with the interventions. In particular, topics related to addressing negative thoughts and feelings were correlated with improvement over time.
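The effect sizes in the Results can be checked with simple arithmetic. The sketch below assumes that the 0.09 estimate applies per day of use (implied by 0.09 × 14 = 1.26) and that positive emotion is scored on a 0-100 scale, so that points and percentage points coincide, as the "(or 1.26%)" wording suggests; the function names are illustrative only.

```python
# Values as reported in the Results.
WEEKLY_GAIN = 1.38     # average improvement in positive emotion, points per week
USAGE_ESTIMATE = 0.09  # within-person usage coefficient


def gain_over_weeks(weeks: float, weekly_gain: float = WEEKLY_GAIN) -> float:
    """Cumulative improvement implied by a constant weekly slope."""
    return weekly_gain * weeks


def usage_contrast(days_used: int, estimate: float = USAGE_ESTIMATE) -> float:
    """Predicted difference between a period with `days_used` days of use and
    a period with none. Assumes (not stated in the abstract) that the
    estimate is per day of use."""
    return estimate * days_used


print(f"{gain_over_weeks(8):.2f}")  # 11.04 points, about 11% on an assumed 0-100 scale
print(f"{usage_contrast(14):.2f}")  # 1.26 points for 14 days of daily use
```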

Conclusions: Using observational analyses of naturalistic big data, we can explore the relationship between usage and well-being among people using an Internet well-being intervention and gain new insights into the underlying mechanisms that accompany improvement. By leveraging big data to power these new types of analyses, we can examine how an intervention works from new angles and feed the resulting insights back into the intervention to improve it further in the future.

Keywords: big data; linguistic analysis; multilevel modeling; qualitative analysis; well-being intervention; word cloud.
