Development and validation of an automated HIV prediction algorithm to identify candidates for pre-exposure prophylaxis: a modelling study

Lancet HIV. 2019 Oct;6(10):e696-e704. doi: 10.1016/S2352-3018(19)30139-0. Epub 2019 Jul 5.


Background: HIV pre-exposure prophylaxis (PrEP) is effective but underused, in part because clinicians do not have the tools to identify PrEP candidates. We developed and validated an automated prediction algorithm that uses electronic health record (EHR) data to identify individuals at increased risk for HIV acquisition.

Methods: We used machine learning algorithms to predict incident HIV infections with 180 potential predictors of HIV risk drawn from EHR data from 2007-15 at Atrius Health, an ambulatory group practice in Massachusetts, USA. We included EHRs of all patients aged 15 years or older with at least one clinical encounter during 2007-15. We used ten-fold cross-validated area under the receiver operating characteristic curve (cv-AUC) with 95% CIs to assess the model's performance at identifying individuals with incident HIV and patients independently prescribed PrEP by clinicians. The best-performing model was validated prospectively with 2016 data from Atrius Health and externally with 2011-16 data from Fenway Health, a community health centre specialising in sexual health care in Boston (MA, USA). We calculated HIV risk scores (ie, probability of an incident HIV diagnosis) for every HIV-uninfected patient not on PrEP during 2007-15 at Atrius Health and assessed the distribution of scores for thresholds to determine possible candidates for PrEP in the three study cohorts.
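The cross-validated AUC described above can be illustrated with a minimal sketch in plain Python. It assumes the held-out risk scores for each fold are already available, and that cv-AUC is the mean of the per-fold AUCs. The abstract does not state how folds were combined or how the 95% CIs were derived, so neither is reproduced here. The function names and toy data are hypothetical.

```python
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney identity.

    Equals the probability that a randomly chosen case (label 1) scores
    higher than a randomly chosen non-case (label 0); ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cv_auc(folds):
    """Mean of per-fold AUCs (an assumption, not the paper's stated method).

    Each fold is a (labels, held_out_scores) pair, where the scores come
    from a model fitted on the other folds.
    """
    return mean(auc(labels, scores) for labels, scores in folds)


# Toy held-out predictions for two folds (hypothetical values).
toy_folds = [
    ([0, 0, 1, 0], [0.1, 0.2, 0.9, 0.3]),
    ([0, 1, 0, 0], [0.2, 0.4, 0.5, 0.1]),
]
print(round(cv_auc(toy_folds), 3))
```

The pairwise loop is quadratic and only suitable for small examples; with roughly a million records, a rank-based computation would be the practical choice.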

Findings: We included 1 155 966 Atrius Health patients from 2007-15 (150 [<0·1%] patients with incident HIV) in our development cohort, 537 257 Atrius Health patients in 2016 (16 [<0·1%] with incident HIV) in our prospective validation cohort, and 33 404 Fenway Health patients from 2011-16 (423 [1·3%] with incident HIV) in our external validation cohort. The best-performing algorithm was obtained with least absolute shrinkage and selection operator (LASSO) and had a cv-AUC of 0·86 (95% CI 0·82-0·90) for identification of incident HIV infections in the development cohort, 0·91 (0·81-1·00) on prospective validation, and 0·77 (0·74-0·79) on external validation. The LASSO model successfully identified patients independently prescribed PrEP by clinicians at Atrius Health in 2016 (cv-AUC 0·93, 95% CI 0·90-0·96) and at Fenway Health (0·79, 0·78-0·80). HIV risk scores increased steeply at the 98th percentile. Using the score at this percentile as a threshold, we prospectively identified 9515 (1·8%) of 536 384 patients at Atrius Health in 2016 and 4385 (15·3%) of 28 702 Fenway Health patients as potential PrEP candidates.
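The percentile-based threshold can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes the cutoff is the 98th-percentile score in one cohort (nearest-rank definition), that the same score cutoff is then applied unchanged to other cohorts, and that patients scoring strictly above it are flagged. All names and scores below are hypothetical.

```python
import math


def percentile_cutoff(scores, pct):
    """Nearest-rank percentile: the smallest score such that at least
    pct percent of all scores are at or below it."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def flag_candidates(risk_scores, cutoff):
    """Return IDs of patients whose risk score is strictly above the cutoff.

    Strict inequality is an assumption; the abstract does not say whether
    scores equal to the threshold were included.
    """
    return [pid for pid, score in risk_scores.items() if score > cutoff]


# Hypothetical development-cohort scores: 0.001, 0.002, ..., 0.100.
dev_scores = [i / 1000 for i in range(1, 101)]
cutoff = percentile_cutoff(dev_scores, 98)

# Hypothetical second cohort scored with the same model and the same cutoff.
other_cohort = {"a": 0.02, "b": 0.15, "c": 0.099, "d": 0.30, "e": 0.098}
print(cutoff, flag_candidates(other_cohort, cutoff))
```

Because the cutoff is a fixed score rather than a fixed share, a cohort with higher underlying risk has more than 2% of its patients above it, which is consistent with 15·3% of Fenway Health patients being flagged versus 1·8% at Atrius Health.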

Interpretation: Automated algorithms can efficiently identify patients at increased risk for HIV acquisition. Integrating these models into EHRs to alert providers about patients who might benefit from PrEP could improve prescribing and prevent new HIV infections.

Funding: Harvard University Center for AIDS Research, Providence/Boston Center for AIDS Research, Rhode Island IDeA-CTR, the National Institute of Mental Health, and the US Centers for Disease Control and Prevention.

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Algorithms*
  • Anti-HIV Agents / therapeutic use
  • Electronic Health Records
  • Female
  • HIV Infections / prevention & control*
  • Humans
  • Male
  • Middle Aged
  • Pre-Exposure Prophylaxis / methods*
  • Prospective Studies
  • Young Adult


Substances

  • Anti-HIV Agents