Leveraging electronic health records data to predict multiple sclerosis disease activity

Ann Clin Transl Neurol. 2021 Apr;8(4):800-810. doi: 10.1002/acn3.51324. Epub 2021 Feb 24.


Objective: No relapse risk prediction tool is currently available to guide treatment selection for multiple sclerosis (MS). Leveraging electronic health record (EHR) data readily available at the point of care, we developed a clinical tool for predicting MS relapse risk.

Methods: Using data from a clinic-based research registry and a linked EHR system between 2006 and 2016, we developed models predicting relapse events from the registry in a training set (n = 1435) and tested model performance in an independent validation set of MS patients (n = 186). This iterative process identified prior 1-year relapse history as a key predictor of future relapse, but ascertaining relapse history through labor-intensive chart review is impractical. We therefore pursued two-stage algorithm development: (1) L1-regularized logistic regression (LASSO) to phenotype past 1-year relapse status from contemporaneous EHR data, and (2) LASSO to predict future 1-year relapse risk using imputed prior 1-year relapse status and other algorithm-selected features.
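The two-stage design described above can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic cohort, the feature names (`relapse_codes`, `steroid_rx`), the effect sizes, the penalty strength, and the choice to pass the stage 1 predicted probability (rather than a dichotomized status) into stage 2 are all assumptions made for illustration. The L1-regularized logistic regression is fitted here by plain proximal gradient descent so that the example needs only the Python standard library.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.01, lr=0.5, n_iter=300):
    """Fit L1-penalized logistic regression by proximal gradient descent.

    Returns (intercept, weights). The intercept is left unpenalized.
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return b, w


def predict_proba(model, X):
    """Predicted probability of the positive class for each row of X."""
    b, w = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def simulate_patients(n, seed=0):
    """Hypothetical synthetic cohort; all effect sizes are invented."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age, duration = rng.gauss(0, 1), rng.gauss(0, 1)  # standardized
        prior = int(rng.random() < sigmoid(-1.0 - 0.5 * age - 0.5 * duration))
        # Hypothetical EHR proxies for a relapse in the past year.
        relapse_codes = 1.5 * prior + rng.gauss(0, 1)
        steroid_rx = float(rng.random() < (0.7 if prior else 0.1))
        p_future = sigmoid(-1.5 + 1.2 * prior - 0.4 * age - 0.4 * duration)
        future = int(rng.random() < p_future)
        rows.append({"age": age, "duration": duration, "prior": prior,
                     "ehr": [relapse_codes, steroid_rx], "future": future})
    return rows


def run_two_stage(train, held_out):
    # Stage 1: phenotype past 1-year relapse status from EHR features.
    stage1 = fit_l1_logistic([r["ehr"] for r in train],
                             [r["prior"] for r in train])

    def stage2_features(rows):
        imputed = predict_proba(stage1, [r["ehr"] for r in rows])
        return [[r["age"], r["duration"], p] for r, p in zip(rows, imputed)]

    # Stage 2: predict future 1-year relapse from age, disease duration,
    # and the imputed (not chart-reviewed) prior relapse history.
    stage2 = fit_l1_logistic(stage2_features(train),
                             [r["future"] for r in train])
    return stage1, stage2, predict_proba(stage2, stage2_features(held_out))


if __name__ == "__main__":
    cohort = simulate_patients(400, seed=0)
    stage1, stage2, risk = run_two_stage(cohort[:300], cohort[300:])
    print("stage 1 weights:", stage1[1])
    print("stage 2 weights:", stage2[1])
    print("first predicted risks:", [round(p, 3) for p in risk[:5]])
```

The point of the design is that stage 2 never sees the chart-reviewed relapse history at prediction time; it relies only on what stage 1 can infer from EHR data available at the point of care.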

Results: The final model, comprising age, disease duration, and imputed prior 1-year relapse history, achieved a predictive AUC and F score of 0.707 and 0.307, respectively. The performance was significantly better than the baseline model (age, sex, race/ethnicity, and disease duration) and noninferior to a model containing actual prior 1-year relapse history. The predicted risk probability declined with disease duration and age.
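For readers unfamiliar with the two reported metrics, the following sketch shows how an AUC and an F score can be computed from predicted risks. It assumes that "F score" means the F1 score (the harmonic mean of precision and recall) and that risks are dichotomized at a threshold of 0.5; the abstract states neither, and the four data points are invented for illustration.

```python
def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as one half (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, scores, threshold=0.5):
    """F1 at a given risk threshold (the threshold is an assumption)."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy data: 1 = relapse within the following year.
relapsed = [0, 0, 1, 1]
predicted_risk = [0.1, 0.4, 0.35, 0.8]
print(auc(relapsed, predicted_risk))                 # 0.75
print(round(f1_score(relapsed, predicted_risk), 3))  # 0.667
```

AUC is threshold-free and measures ranking quality, whereas the F score depends on the chosen cutoff and on how rare the outcome is, so the two numbers can differ substantially for the same model.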

Conclusion: Our novel machine-learning algorithm predicts 1-year MS relapse with accuracy comparable to other clinical prediction tools and has applicability at the point of care. This EHR-based two-stage approach of outcome prediction may have application to neurological disease beyond MS.

Publication types

  • Research Support, N.I.H., Extramural