Can a Single Variable Predict Early Dropout From Digital Health Interventions? Comparison of Predictive Models From Two Large Randomized Trials

J Med Internet Res. 2023 Jan 20;25:e43629. doi: 10.2196/43629.


Background: A single generalizable metric that accurately predicts early dropout from digital health interventions has the potential to readily inform intervention targets and treatment augmentations that could boost retention and intervention outcomes. We recently identified a type of early dropout from digital health interventions for smoking cessation, specifically, users who logged in during the first week of the intervention and had little to no activity thereafter. These users also had a substantially lower smoking cessation rate with our iCanQuit smoking cessation app compared with users who used the app for longer periods.

Objective: This study aimed to explore whether log-in count data, using standard statistical methods, can precisely predict whether an individual will become an iCanQuit early dropout while validating the approach using other statistical methods and randomized trial data from 3 other digital interventions for smoking cessation (combined randomized N=4529).

Methods: Standard logistic regression models were used to predict early dropouts for individuals receiving the iCanQuit smoking cessation intervention app, the National Cancer Institute QuitGuide smoking cessation intervention app, and 2 smoking cessation intervention websites. The main predictors were the daily log-in counts, that is, the number of times a participant logged in on each of the first 7 days following randomization. The area under the receiver operating characteristic curve (AUC) assessed the performance of the logistic regression models, which were compared with decision tree, support vector machine, and neural network models. We also examined whether 13 baseline variables that included a variety of demographics (eg, race and ethnicity, gender, and age) and smoking characteristics (eg, use of e-cigarettes and confidence in being smoke free) might improve this prediction.
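As a rough illustration of the modeling setup described in the Methods, the following Python sketch fits a logistic regression on 7 daily log-in counts and scores it with the AUC on held-out users. It is not the authors' code: the data are synthetic, and the function names (`fit_logistic`, `predict_proba`, `auc`), the gradient descent settings, and the simulated log-in ranges are all assumptions made only for this example.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.05, epochs=300):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    """Predicted probability of early dropout for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic, hypothetical data: 7 daily log-in counts per user.
# Assumption for illustration only: early dropouts log in 0-1 times per day, others 0-3 times.
random.seed(0)
X, y = [], []
for _ in range(400):
    dropout = 1 if random.random() < 0.5 else 0
    max_logins = 1 if dropout else 3
    X.append([random.randint(0, max_logins) for _ in range(7)])
    y.append(dropout)

# Fit on the first 300 users and evaluate on the remaining 100.
w, b = fit_logistic(X[:300], y[:300])
test_auc = auc(y[300:], predict_proba(X[300:], w, b))
print(f"Held-out AUC on synthetic data: {test_auc:.2f}")
```

The held-out split here is a design choice for the sketch; the abstract does not state how the AUC or its confidence intervals were estimated.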

Results: The AUC for each logistic regression model using only the first 7 days of log-in count variables was 0.94 (95% CI 0.90-0.97) for iCanQuit, 0.88 (95% CI 0.83-0.93) for QuitGuide, 0.85 (95% CI 0.80-0.88) for one smoking cessation website, and 0.60 (95% CI 0.54-0.66) for the other. Replacing logistic regression models with more complex decision tree, support vector machine, or neural network models did not significantly increase the AUC, nor did including additional baseline variables as predictors. The sensitivity and specificity were generally good, and they were excellent for iCanQuit (ie, 0.91 and 0.85, respectively, at the 0.5 classification threshold).
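To make the sensitivity and specificity figures concrete, the short Python sketch below shows how both are computed from predicted dropout probabilities at a 0.5 classification threshold. The labels and probabilities are made up for illustration, the function name `sensitivity_specificity` is hypothetical, and treating a probability exactly equal to the threshold as a predicted dropout is an assumption.

```python
def sensitivity_specificity(y_true, probs, threshold=0.5):
    """Return (sensitivity, specificity) when probs >= threshold is classified as dropout (1)."""
    tp = fn = tn = fp = 0
    for truth, p in zip(y_true, probs):
        predicted = 1 if p >= threshold else 0  # assumption: ties go to the dropout class
        if truth == 1 and predicted == 1:
            tp += 1  # dropout correctly flagged
        elif truth == 1:
            fn += 1  # dropout missed
        elif predicted == 0:
            tn += 1  # non-dropout correctly cleared
        else:
            fp += 1  # non-dropout wrongly flagged
    # Assumes both classes are present; otherwise a denominator would be zero.
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical example: 1 = early dropout, 0 = not an early dropout.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
probabilities = [0.9, 0.8, 0.6, 0.3, 0.4, 0.2, 0.7, 0.1]
sens, spec = sensitivity_specificity(labels, probabilities)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity is the share of true early dropouts that the model flags, and specificity is the share of other users that it correctly leaves unflagged, so lowering the threshold trades specificity for sensitivity.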

Conclusions: Logistic regression models using only the first 7 days of log-in count data were generally good at predicting early dropouts. These models performed well when using simple, automated, and readily available log-in count data, whereas including self-reported baseline variables did not improve the prediction. The results will inform the early identification of people at risk of early dropout from digital health interventions with the goal of intervening further by providing them with augmented treatments to increase their retention and, ultimately, their intervention outcomes.

Keywords: ACT; QuitGuide; acceptance and commitment therapy; attrition; digital interventions; dropout; eHealth; engagement; iCanQuit; mHealth; mobile health; mobile phone; smartphone apps; smoking; tobacco; trajectories.

MeSH terms

  • Electronic Nicotine Delivery Systems*
  • Humans
  • Mobile Applications*
  • Randomized Controlled Trials as Topic
  • Self Report
  • Smoking Cessation* / methods