Using alcohol consumption diary data from an internet intervention for outcome and predictive modeling: a validation and machine learning study

BMC Med Res Methodol. 2020 May 11;20(1):111. doi: 10.1186/s12874-020-00995-z.


Background: Alcohol use disorder (AUD) is highly prevalent and presents a large treatment gap. Self-help internet interventions are an attractive approach to lowering thresholds for seeking help and disseminating evidence-based programs at scale. Internet interventions for AUD, however, suffer from high attrition, and since continuous outcome measurements are uncommon, little is known about trajectories and processes. The current study investigates whether data from a non-mandatory alcohol consumption diary, common in internet interventions for AUD, approximate drinks reported at follow-up, and whether data from the first half of the intervention predict treatment success.

Methods: N = 607 participants enrolled in a trial of online self-help for AUD, made an entry in the non-mandatory consumption diary (total of 9117 entries), and completed the follow-up assessment. Using multiple regression and a subset of calendar data overlapping with the follow-up, scaling factors were derived to account for missing entries per participant and week. Generalized estimating equations with an inverse time predictor were then used to calculate point-estimates of drinks per week at follow-up, the confidence intervals of which were compared to those from the measurement at follow-up. Next, calendar data from the first half of the intervention were retained and summary functions were used to create 18 predictors for random forest machine learning models, the classification accuracies of which were ultimately estimated using nested cross-validation.
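
As a rough illustration of the "summary functions" step, the sketch below reduces one participant's diary entries from the first half of an intervention to a handful of per-participant predictors. The abstract does not list the 18 predictors actually used, so the function name `summarize_first_half`, the feature names, the `(week, drinks)` entry format, and the 8-week length are all assumptions made for the example, not the study's definitions.

```python
from statistics import mean, pstdev


def summarize_first_half(entries, n_weeks_total):
    """Summarize diary entries from the first half of an intervention.

    entries: list of (week_index, drinks) tuples, week_index starting at 1.
    n_weeks_total: assumed intervention length in weeks (hypothetical).
    Returns a dict of illustrative predictors, or None if no entries qualify.
    """
    cutoff = n_weeks_total // 2
    # Keep only first-half entries, ordered by week (stable for ties).
    kept = sorted((e for e in entries if e[0] <= cutoff), key=lambda e: e[0])
    if not kept:
        return None
    drinks = [d for _, d in kept]
    return {
        "n_entries": len(drinks),                      # diary engagement
        "weeks_with_entry": len({w for w, _ in kept}), # coverage across weeks
        "mean_drinks": mean(drinks),
        "max_drinks": max(drinks),
        "sd_drinks": pstdev(drinks),
        "last_minus_first": drinks[-1] - drinks[0],    # crude trend indicator
    }


# Made-up entries for one hypothetical participant (not study data).
example_entries = [(1, 6), (1, 4), (2, 5), (4, 3), (6, 2), (7, 1)]
features = summarize_first_half(example_entries, n_weeks_total=8)
```

Here only the four entries from weeks 1 to 4 contribute; the later entries are discarded, which mirrors the idea of predicting outcome from early data only. A feature table built this way, one row per participant, could then be passed to a classifier such as a random forest.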

Results: While the raw calendar data substantially underestimated drinks reported at follow-up, the confidence interval of the trajectory-derived point-estimate from the adjusted data overlapped with the confidence interval of drinks reported at follow-up. Machine learning models achieved prediction accuracies of 64% (predicting non-hazardous drinking) and 48% (predicting AUD severity decrease), in both cases with higher sensitivity than specificity.
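
To make the reported metrics concrete, the following minimal sketch computes accuracy, sensitivity, and specificity from binary predictions. The labels are invented for illustration and do not reproduce the study's results; the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of actual positives that were predicted positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of actual negatives that were predicted negative.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Made-up labels purely for illustration (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

In this toy case sensitivity exceeds specificity, the same qualitative pattern the abstract reports: the classifier identifies positives more reliably than it rules out negatives.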

Conclusions: Data from a non-mandatory alcohol consumption diary, adjusted for missing entries, approximate follow-up data at a group level, suggesting that such data can be used to reveal trajectories and processes during treatment and possibly to impute missing follow-up data. At an individual level, however, calendar data from the first half of the intervention did not have high predictive accuracy, presumably due to a high rate of missing data and unclear missingness mechanisms.

Keywords: Alcohol; Calendar; Classification; Diary; Machine learning; Measurement; Prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alcohol Drinking
  • Alcoholism* / diagnosis
  • Alcoholism* / therapy
  • Health Behavior
  • Humans
  • Internet
  • Internet-Based Intervention*
  • Machine Learning