Systemic lupus erythematosus with high disease activity identification based on machine learning

Inflamm Res. 2023 Sep;72(9):1909-1918. doi: 10.1007/s00011-023-01793-1. Epub 2023 Sep 19.

Abstract

Objective: Clinical evaluation of systemic lupus erythematosus (SLE) disease activity is limited and inconsistent, and high disease activity seriously affects SLE patients. This study aims to develop a machine learning model to identify SLE patients with high disease activity.

Method: A total of 1014 SLE patients with low disease activity and 453 SLE patients with high disease activity were included. A total of 94 clinical and laboratory variables and 17 meteorological indicators were collected. After data preprocessing, mutual information and MultiSURF were used to score feature importance and select features. The selected features were used for machine learning modeling. Model performance was evaluated and verified with a series of binary classification metrics.
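As an illustration of the mutual information step only, the sketch below scores discrete features against a binary activity label and ranks them. It is not the authors' pipeline: the MultiSURF score is not reproduced, the function names `mutual_information` and `rank_features` are hypothetical, and the toy values are made up rather than taken from the study.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def rank_features(table, labels):
    """Return (feature, score) pairs sorted from most to least informative."""
    scores = {name: mutual_information(col, labels) for name, col in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up toy data: 1 = high disease activity, 0 = low disease activity.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_features = {
    "proteinuria": [1, 1, 1, 1, 0, 0, 0, 0],  # tracks the label exactly
    "hematuria": [1, 1, 1, 0, 0, 0, 0, 1],    # agrees with the label in 6 of 8 rows
    "noise": [1, 0, 1, 0, 1, 0, 1, 0],        # independent of the label
}

ranking = rank_features(toy_features, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A feature that is independent of the label scores zero, so a ranking like this gives a simple filter for dropping uninformative variables before modeling. Continuous variables would need binning or a dedicated estimator, which this sketch does not cover.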

Results: Integrated feature selection identified hematuria, proteinuria, pyuria, low complement, precipitation, sunlight and other features for model construction. After hyperparameter optimization, the LGB model performed best (ROC: AUC = 0.930; PRC: AUC = 0.911, APS = 0.913; balanced accuracy: 0.856), and the naive Bayes model performed worst (ROC: AUC = 0.849; PRC: AUC = 0.719, APS = 0.714; balanced accuracy: 0.705). Finally, the selected features showed good consistency in the composite feature importance bar plot.
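The reported metrics can be made concrete with a small sketch. It assumes that APS stands for average precision score and uses the standard textbook definitions, which may differ in detail from the library the authors used. The scores below are made up, and the helper names are hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean precision at the rank of each positive; assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Made-up predicted probabilities for six patients (1 = high disease activity).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(roc_auc(y_true, scores))
print(average_precision(y_true, scores))
print(balanced_accuracy(y_true, y_pred))

# With the cohort sizes from the abstract (1014 low, 453 high), always
# predicting "low" is right for about 69% of patients, yet its balanced
# accuracy is only 0.5.
cohort_true = [0] * 1014 + [1] * 453
always_low = [0] * len(cohort_true)
print(balanced_accuracy(cohort_true, always_low))
```

The last lines show why balanced accuracy suits an imbalanced cohort like this one: a model that ignores the minority class gets no credit for it.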

Conclusion: SLE patients with high disease activity can be identified with a simple machine learning pipeline, in particular the LGB model built on proteinuria, hematuria, pyuria and other features screened out by collective feature selection.

Keywords: High disease activity; Meteorological data; SLE; SLEDAI.

MeSH terms

  • Bayes Theorem
  • Hematuria
  • Humans
  • Lupus Erythematosus, Systemic* / diagnosis
  • Machine Learning
  • Proteinuria
  • Pyuria*