Identification of clinical phenotypes and disease trajectories in SLE using AI through a natural language processing framework

Rheumatology (Oxford). 2026 Feb 4;65(2):keag035. doi: 10.1093/rheumatology/keag035.

Abstract

Objectives: Electronic health records (EHRs) contain a wealth of unstructured patient data that can be leveraged using artificial intelligence (AI). This study aimed to develop a natural language processing (NLP) pipeline to identify clinical phenotypes and disease trajectories in patients with systemic lupus erythematosus (SLE) from EHRs.

Methods: EHR data from SLE patients were included. A standardized stepwise framework combining AI and human intelligence (HI) was designed. Ontology-based definitions were developed for clinical domains, flares and disease complexity phenotypes (low, medium, high) at the first contact, and corresponding data were extracted using an NLP-based pipeline.
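As a rough illustration of what ontology-based extraction of clinical domains from free text can look like, the following minimal Python sketch tags a note with domains by dictionary lookup. It is not the study's pipeline: the term lists, the `DOMAIN_TERMS` name and the `tag_domains` function are hypothetical, and a real system would also need to handle negation, abbreviations and context.

```python
from __future__ import annotations

import re

# Hypothetical, illustrative term lists; these are NOT the study's ontology.
DOMAIN_TERMS: dict[str, list[str]] = {
    "hematological": ["leukopenia", "thrombocytopenia", "anemia"],
    "articular": ["arthritis", "arthralgia"],
    "cutaneous": ["malar rash", "photosensitivity"],
    "renal": ["nephritis", "proteinuria"],
}


def tag_domains(note: str) -> set[str]:
    """Return the clinical domains whose terms appear in a free-text note.

    Matching is case-insensitive and anchored on word boundaries.
    Negation ("no arthritis") is deliberately not handled in this sketch.
    """
    found: set[str] = set()
    for domain, terms in DOMAIN_TERMS.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, note, flags=re.IGNORECASE):
                found.add(domain)
                break  # one matching term is enough for this domain
    return found
```

For example, `tag_domains("Lupus nephritis with malar rash.")` would tag the note with the renal and cutaneous domains under these assumed term lists.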

Results: Of 1,000 extracted patients, 262 met the inclusion criteria (≥1 hospitalization, ≥1 outpatient visit and follow-up ≥1.5 years). Among these, 88% were female, the median age was 43 years and the median follow-up was 6 years. At first contact, the most frequently involved clinical domains were hematological (64%), articular (47%), cutaneous (59%) and renal (58%). At first contact, 43% of patients presented with a high-complexity phenotype, 35% with a medium-complexity phenotype and 22% with a low-complexity phenotype; the high-complexity group experienced more flares over time than the medium- and low-complexity groups (5 vs 3 and 3, P < 0.001). Patients with low- and medium-complexity phenotypes showed a greater increase in new clinical domains and in the use of conventional immunosuppressants, biologics and glucocorticoids during follow-up.
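The inclusion criteria stated above (≥1 hospitalization, ≥1 outpatient visit, follow-up ≥1.5 years) can be expressed as a simple cohort filter. The sketch below is only an illustration under assumed data structures: `PatientRecord`, its field names and `select_cohort` are hypothetical and do not come from the study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PatientRecord:
    # Hypothetical per-patient summary; field names are assumptions.
    patient_id: str
    hospitalizations: int
    outpatient_visits: int
    follow_up_years: float


def meets_inclusion(p: PatientRecord) -> bool:
    """Apply the three inclusion criteria reported in the abstract."""
    return (
        p.hospitalizations >= 1
        and p.outpatient_visits >= 1
        and p.follow_up_years >= 1.5
    )


def select_cohort(records: Iterable[PatientRecord]) -> List[PatientRecord]:
    """Keep only the patients who satisfy all inclusion criteria."""
    return [p for p in records if meets_inclusion(p)]
```

Applied to the 1,000 extracted patients, a filter of this kind would correspond to the step that yields the 262-patient study cohort.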

Conclusion: This novel framework, based on real-world data, enables longitudinal phenotype characterization of SLE patients. It demonstrates promise as a feasible tool to study the heterogeneity of SLE and its progression over time, offering insights into potential applications in clinical research and patient management.

Keywords: SLE; artificial intelligence (AI); disease trajectories; electronic health records (EHRs); flares; natural language processing (NLP); therapy.

MeSH terms

  • Adult
  • Artificial Intelligence*
  • Disease Progression
  • Electronic Health Records*
  • Female
  • Humans
  • Lupus Erythematosus, Systemic* / diagnosis
  • Lupus Erythematosus, Systemic* / physiopathology
  • Male
  • Middle Aged
  • Natural Language Processing*
  • Phenotype

Grants and funding