Objectives: The Veterans Affairs (VA) Health Care System is among the largest integrated health systems in the United States. Many VA enrollees are dual users of Medicare, and little research has examined which methods most accurately predict which veterans will rely mostly on VA services in the future. This study examined whether machine learning methods predict future reliance on VA primary care better than traditional statistical methods.
Study design: Observational study of 83,143 VA patients dually enrolled in fee-for-service Medicare using VA and Medicare administrative databases and the 2012 Survey of Healthcare Experiences of Patients.
Methods: The primary outcome was a dichotomous measure denoting whether patients obtained more than 50% of all primary care visits (VA + Medicare) from VA. We compared the performance of 6 candidate models (logistic regression, elastic net regression, decision trees, random forest, gradient boosting machine, and neural network) in predicting 2013 reliance as a function of 61 patient characteristics observed in 2012. We measured performance using the cross-validated area under the receiver operating characteristic curve (AUROC).
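As an illustration only, the outcome definition and performance metric described in the Methods can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the study's code: the function names (`mostly_va_reliant`, `auroc`, `mean_cv_auroc`) are hypothetical, the handling of patients with zero primary care visits is an assumption, and averaging per-fold AUROCs is one common way to summarize cross-validated performance that may differ from the procedure the authors used.

```python
from typing import Sequence


def mostly_va_reliant(va_visits: int, medicare_visits: int) -> bool:
    """Dichotomous outcome: True when more than 50% of all primary care
    visits (VA + Medicare) were obtained from VA."""
    total = va_visits + medicare_visits
    if total == 0:
        # Assumption: reliance is treated as undefined without any visits.
        raise ValueError("reliance is undefined with no primary care visits")
    return va_visits / total > 0.5


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC computed as the probability that a randomly chosen positive
    case receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_cv_auroc(
    fold_labels: Sequence[Sequence[int]],
    fold_scores: Sequence[Sequence[float]],
) -> float:
    """Average the AUROC over held-out folds (assumed summary statistic)."""
    per_fold = [auroc(y, s) for y, s in zip(fold_labels, fold_scores)]
    return sum(per_fold) / len(per_fold)
```

In this sketch, a patient with 3 VA visits and 2 Medicare visits counts as mostly VA reliant, whereas a patient with 2 of each does not, because the threshold is strictly greater than 50%.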
Results: Overall, 72.9% and 74.5% of veterans were mostly VA reliant in 2012 and 2013, respectively. All models had similar average AUROCs, ranging from 0.873 to 0.892. The best-performing model was the gradient boosting machine, which exhibited modestly higher AUROC and similar variance compared with standard logistic regression.
Conclusions: The modest gains in performance from the best-performing model, gradient boosting machine, are unlikely to outweigh inherent drawbacks, including computational complexity and limited interpretability compared with traditional logistic regression.