An Algorithm for Creating Virtual Controls Using Integrated and Harmonized Longitudinal Data

Eval Health Prof. 2018 Jun;41(2):183-215. doi: 10.1177/0163278718772882. Epub 2018 May 3.

Abstract

We introduce a strategy for creating virtual control groups: cases generated through computer algorithms that, when aggregated, may serve as experimental comparators where live controls are difficult to recruit, such as when programs are widely disseminated and randomization is not feasible. We integrated and harmonized data from eight archived longitudinal adolescent-focused data sets spanning the decades from 1980 to 2010. Collectively, these studies examined numerous psychosocial variables and assessed past 30-day alcohol, cigarette, and marijuana use. Additional treatment and control group data from two archived randomized controlled trials (RCTs) were used to test the virtual control algorithm. Both RCTs assessed intentions, normative beliefs, and values as well as past 30-day alcohol, cigarette, and marijuana use. We developed an algorithm that used percentile scores from the integrated data set to create age- and gender-specific latent psychosocial scores. The algorithm matched each treatment case's observed psychosocial score at pretest to create a virtual control case that figuratively "matured" based on age-related changes, holding the virtual case's percentile constant. Virtual controls matched treatment case occurrence, eliminating differential attrition as a threat to validity. Virtual case substance use was estimated from the virtual case's latent psychosocial score using logistic regression coefficients derived from analyzing the treatment group. Averaging across virtual cases created group estimates of prevalence. Two criteria were established to evaluate the adequacy of virtual control cases: (1) virtual control group pretest drug prevalence rates should match those of the treatment group and (2) virtual control group patterns of drug prevalence over time should match live controls. The algorithm successfully matched pretest prevalence for both RCTs. Increases in prevalence were observed, although there were discrepancies between live and virtual control outcomes. This study provides an initial framework for creating virtual controls using a step-by-step procedure that can now be revised and validated using other prevention trial data.
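
The percentile-matching and "maturing" steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The reference norms, the nearest-rank percentile rule, the logistic coefficients, and all function names (percentile_rank, score_at_percentile, mature_virtual_case, use_probability, group_prevalence) are assumptions invented for illustration; the abstract does not give these details.

```python
from bisect import bisect_right
from math import exp

# Hypothetical norms: sorted latent psychosocial scores from an integrated
# data set, keyed by (age, gender). All values are invented for illustration.
NORMS = {
    (12, "F"): [0.0, 1.0, 2.0, 3.0, 4.0],
    (13, "F"): [1.0, 2.0, 3.0, 4.0, 5.0],
    (14, "F"): [2.0, 3.0, 4.0, 5.0, 6.0],
}


def percentile_rank(score, ref):
    """Fraction of reference scores at or below `score` (between 0 and 1)."""
    return bisect_right(ref, score) / len(ref)


def score_at_percentile(p, ref):
    """Reference score at percentile p, using an assumed nearest-rank rule."""
    idx = round(p * len(ref)) - 1
    idx = max(0, min(len(ref) - 1, idx))  # clamp to valid indices
    return ref[idx]


def mature_virtual_case(pretest_score, pretest_age, gender, ages):
    """Match the pretest score to a percentile, then hold that percentile
    constant and read off the age-specific score at each requested age."""
    p = percentile_rank(pretest_score, NORMS[(pretest_age, gender)])
    return {age: score_at_percentile(p, NORMS[(age, gender)]) for age in ages}


def use_probability(score, b0=-3.0, b1=0.5):
    """Logistic model for past 30-day use. b0 and b1 are placeholder
    coefficients; the study derived its coefficients from treatment data."""
    return 1.0 / (1.0 + exp(-(b0 + b1 * score)))


def group_prevalence(scores, b0=-3.0, b1=0.5):
    """Average the per-case probabilities to get a group prevalence estimate."""
    probs = [use_probability(s, b0, b1) for s in scores]
    return sum(probs) / len(probs)


if __name__ == "__main__":
    trajectory = mature_virtual_case(2.0, 12, "F", [12, 13, 14])
    print(trajectory)
    print(group_prevalence(list(trajectory.values())))
```

Because a virtual case is generated for every occasion on which its treatment case is observed, the virtual group in a sketch like this has the same size as the treatment group at each wave, which is how differential attrition is avoided.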

Keywords: adolescents; alcohol; cigarettes; control groups; harmonization; integrated data analysis; marijuana; missing data imputation; psychosocial mediators.

Publication types

  • Randomized Controlled Trial
  • Research Support, N.I.H., Extramural

MeSH terms

  • Adolescent
  • Alcoholism / psychology
  • Algorithms*
  • Child
  • Computer Simulation*
  • Data Interpretation, Statistical
  • Female
  • Humans
  • Longitudinal Studies
  • Male
  • Marijuana Abuse / psychology
  • Research Design*
  • Substance-Related Disorders / psychology*
  • Tobacco Use Disorder / psychology