Validation of a machine learning approach to estimate Systemic Lupus Erythematosus Disease Activity Index score categories and application in a real-world dataset

RMD Open. 2021 May;7(2):e001586. doi: 10.1136/rmdopen-2021-001586.

Abstract

Objective: Use of the Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) in routine clinical practice is inconsistent, and availability of clinician-recorded SLEDAI scores in real-world datasets is limited. This study aimed to validate a machine learning model to estimate SLEDAI score categories using clinical notes and to apply the model to a large, real-world dataset to generate estimated score categories for use in future research studies.

Methods: A machine learning model was developed to estimate an individual patient's SLEDAI score category (no activity, mild activity, moderate activity or high/very high activity) for a specific encounter date using clinical notes. A training cohort of 3504 encounters and a separate validation cohort of 1576 encounters were created from the OM1 SLE Registry. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), calculated on a binarised version of the outcome in which the positive class comprised records with clinician-recorded SLEDAI scores >5 and the negative class comprised records with scores ≤5. Model performance was also evaluated by categorising the scores into the four disease activity categories and calculating the Spearman's R value and Pearson's R value.
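The binarised AUC described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names (`binarise`, `roc_auc`) and the toy data are hypothetical, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation rather than any particular library.

```python
def binarise(sledai_scores, threshold=5):
    """Positive class (1) for recorded scores > threshold, negative class (0) otherwise."""
    return [1 if s > threshold else 0 for s in sledai_scores]


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive record receives
    a higher model score than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative record")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: clinician-recorded SLEDAI scores and model outputs.
recorded_sledai = [0, 2, 4, 5, 6, 8, 12, 20]
model_scores = [0.05, 0.10, 0.40, 0.35, 0.30, 0.80, 0.90, 0.95]

labels = binarise(recorded_sledai)   # scores >5 become the positive class
auc = roc_auc(labels, model_scores)
print(f"AUC on toy data: {auc:.3f}")
```

The pairwise form is quadratic in the number of records, which is acceptable for illustration; a rank-based implementation would be preferable for cohorts of several thousand encounters.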

Results: For the binarised outcome (scores >5 vs ≤5), the AUC was 0.93 in the development cohort and 0.91 in the validation cohort. Across the four disease activity categories, the model had a Spearman's R value of 0.7 and a Pearson's R value of 0.7.
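As a rough illustration of how correlation coefficients over four ordinal categories might be computed, the sketch below maps scores to category codes and compares recorded with estimated categories. The category cut-offs (0; 1 to 5; 6 to 10; above 10) are an assumption based on commonly used SLEDAI bands and are not stated in this abstract; the function names and toy data are likewise hypothetical.

```python
from math import sqrt


def sledai_category(score):
    """Map a SLEDAI score to an ordinal category code.
    Cut-offs are ASSUMED here (not given in the abstract):
    0 = no activity, 1 = mild, 2 = moderate, 3 = high/very high."""
    if score == 0:
        return 0
    if score <= 5:
        return 1
    if score <= 10:
        return 2
    return 3


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical toy data: recorded scores and model-estimated category codes.
recorded = [sledai_category(s) for s in [0, 2, 4, 6, 8, 12, 20, 0]]
estimated = [0, 1, 2, 2, 1, 3, 3, 1]

print("Spearman:", round(spearman_r(recorded, estimated), 3))
print("Pearson: ", round(pearson_r(recorded, estimated), 3))
```

With only four distinct levels, ties are frequent, so the Spearman calculation relies on tie-averaged ranks rather than the simplified difference-of-ranks formula.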

Conclusion: The model performs well when estimating SLEDAI score categories using unstructured clinical notes.

Keywords: epidemiology; healthcare; lupus erythematosus; outcome assessment; systemic.

MeSH terms

  • Cohort Studies
  • Humans
  • Lupus Erythematosus, Systemic* / diagnosis
  • Lupus Erythematosus, Systemic* / epidemiology
  • Machine Learning
  • ROC Curve
  • Severity of Illness Index