A comparative study of logistic regression based machine learning techniques for prediction of early virological suppression in antiretroviral initiating HIV patients

BMC Med Inform Decis Mak. 2018 Sep 4;18(1):77. doi: 10.1186/s12911-018-0659-x.


Background: Treatment with effective antiretroviral therapy (ART) lowers morbidity and mortality among HIV-positive individuals. Effective highly active antiretroviral therapy (HAART) should lead to an undetectable viral load within 6 months of initiation of therapy. Failure to achieve and maintain viral suppression may lead to the development of resistance and increase the risk of viral transmission. In this paper, three logistic regression-based machine learning approaches are developed to predict early virological outcomes using easily measurable baseline demographic and clinical variables (age, body weight, sex, TB disease status, ART regimen, viral load, CD4 count). The predictive performance and generalizability of the approaches are compared.

Methods: The multitask temporal logistic regression (MTLR), patient-specific survival prediction (PSSP) and simple logistic regression (SLR) models were developed and validated using the IDI research cohort data, and their predictive performance was tested on an external dataset from the EFV cohort. Model calibration and discrimination plots, discriminatory measures (AUROC, F1) and overall predictive performance (Brier score) were assessed.
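The discriminatory and overall performance measures named in the Methods can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the 0.5 classification threshold and the toy labels and probabilities are all assumptions made for illustration, and only the Python standard library is used.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def auroc(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in pos
        for b in neg
    )
    return wins / (len(pos) * len(neg))


def f1_score(y_true: Sequence[int], p_pred: Sequence[float],
             threshold: float = 0.5) -> float:
    """Harmonic mean of precision and recall at an assumed threshold."""
    y_hat = [1 if p >= threshold else 0 for p in p_pred]
    tp = sum(1 for y, h in zip(y_true, y_hat) if y == 1 and h == 1)
    fp = sum(1 for y, h in zip(y_true, y_hat) if y == 0 and h == 1)
    fn = sum(1 for y, h in zip(y_true, y_hat) if y == 1 and h == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical, not from the IDI or EFV cohorts):
# 1 = virologically suppressed, 0 = not suppressed.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
p_pred = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]

print(f"Brier score = {brier_score(y_true, p_pred):.4f}")
print(f"AUROC       = {auroc(y_true, p_pred):.4f}")
print(f"F1          = {f1_score(y_true, p_pred):.4f}")
```

A lower Brier score indicates better overall probabilistic accuracy, whereas higher AUROC and F1 indicate better discrimination.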

Results: The MTLR model outperformed the PSSP and SLR models in terms of goodness of fit (RMSE = 0.053, 0.1 and 0.14 respectively), discrimination (AUROC = 0.92, 0.75 and 0.53 respectively) and general predictive performance (Brier score = 0.08, 0.19 and 0.11 respectively). The predictive importance of variables varied with time after initiation of ART. The final MTLR model predicted outcomes in the external (EFV cohort) dataset with 92.9% accuracy, satisfactory discrimination (0.878) and a low false positive rate (6.9%).
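For readers less familiar with the accuracy and false positive rate quoted for the external dataset, the sketch below shows how both quantities follow from confusion-matrix counts. The counts are hypothetical placeholders chosen for illustration and are not the EFV cohort results; the function names are likewise assumptions.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all predictions that match the observed outcome."""
    return (tp + tn) / (tp + fp + tn + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of true negatives that were wrongly predicted positive."""
    return fp / (fp + tn)


# Hypothetical confusion-matrix counts (not taken from the study).
tp, fp, tn, fn = 50, 5, 40, 5

print(f"accuracy            = {accuracy(tp, fp, tn, fn):.3f}")
print(f"false positive rate = {false_positive_rate(fp, tn):.3f}")
```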

Conclusion: Multitask logistic regression-based models are capable of accurately predicting early virological suppression using readily available baseline demographic and clinical variables and could be used to derive a risk score for use in resource-limited settings.
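As a rough illustration of how a logistic model turns baseline variables into a risk score, the sketch below fits a plain L2-regularized logistic regression by gradient descent. It is a simplified stand-in and not the MTLR or PSSP models from the study. The single standardized covariate, the toy data, the learning rate, the iteration count and the penalty strength are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_l2_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                    l2: float = 0.01, lr: float = 0.5,
                    n_iter: int = 500) -> Tuple[List[float], float]:
    """Batch gradient descent on mean log-loss plus (l2 / 2) * ||w||^2.

    The intercept b is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def risk_score(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Predicted probability of the positive outcome for one patient."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data: one hypothetical standardized baseline covariate per patient.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = fit_l2_logistic(X, y)
print(f"risk at x = +1.5: {risk_score([1.5], w, b):.3f}")
print(f"risk at x = -1.5: {risk_score([-1.5], w, b):.3f}")
```

In practice the predicted probability would be computed from several baseline variables at once, and the penalty strength would be chosen by cross-validation rather than fixed in advance.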

Keywords: L2-regularization; Logistic regression; Machine learning; Multitask temporal logistic regression; Patient specific survival prediction; Prediction; Viral suppression.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Anti-HIV Agents / therapeutic use*
  • Antiretroviral Therapy, Highly Active
  • CD4 Lymphocyte Count
  • Cohort Studies
  • Female
  • HIV Infections / drug therapy*
  • Humans
  • Logistic Models*
  • Machine Learning*
  • Male
  • Predictive Value of Tests
  • Treatment Outcome
  • Viral Load


Substances

  • Anti-HIV Agents