Objectives: Electronic health records (EHRs) contain a wealth of unstructured patient data that can be leveraged using artificial intelligence (AI). This study aimed to develop a natural language processing (NLP) pipeline to identify clinical phenotypes and disease trajectories in patients with systemic lupus erythematosus (SLE) from EHRs.
Methods: EHR data from SLE patients were included. A standardized stepwise framework combining AI and human intelligence (HI) was designed. Ontology-based definitions were developed for clinical domains, flares, and disease complexity phenotypes (low, medium, high) at first contact, and the corresponding data were extracted using an NLP-based pipeline.
Results: Of 1,000 extracted patients, 262 met the inclusion criteria of ≥1 hospitalization, ≥1 outpatient visit, and a follow-up of ≥1.5 years. Among these, 88% were female, median age was 43 years, and median follow-up was 6 years. At first contact, the most frequently involved clinical domains were hematological (64%), cutaneous (59%), renal (58%) and articular (47%). At first contact, 43% of patients presented with a high-complexity phenotype, 35% with a medium-complexity phenotype, and 22% with a low-complexity phenotype; the high-complexity group experienced more flares over time (5 vs 3 and 3, P < 0.001). Patients with low- and medium-complexity phenotypes showed a greater increase in new clinical domains and in the use of conventional immunosuppressants, biologics and glucocorticoids during follow-up.
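The inclusion criteria stated above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `PatientSummary` record, its field names, and the helper functions are hypothetical, and it assumes that per-patient counts and follow-up duration have already been derived from the EHR.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PatientSummary:
    """Hypothetical per-patient summary; field names are assumptions."""
    patient_id: str
    hospitalizations: int
    outpatient_visits: int
    follow_up_years: float


def meets_inclusion(p: PatientSummary, min_follow_up_years: float = 1.5) -> bool:
    """Apply the three criteria from the abstract:
    >=1 hospitalization, >=1 outpatient visit, follow-up >=1.5 years."""
    return (
        p.hospitalizations >= 1
        and p.outpatient_visits >= 1
        and p.follow_up_years >= min_follow_up_years
    )


def select_cohort(patients: Iterable[PatientSummary]) -> List[PatientSummary]:
    """Keep only the patients who satisfy every inclusion criterion."""
    return [p for p in patients if meets_inclusion(p)]
```

Keeping the follow-up threshold as a parameter makes it straightforward to run a sensitivity analysis with a different minimum follow-up.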
Conclusion: This novel framework, based on real-world data, enables longitudinal phenotype characterization of SLE patients. It demonstrates promise as a feasible tool to study the heterogeneity of SLE and its progression over time, offering insights into potential applications in clinical research and patient management.
Keywords: SLE; artificial intelligence (AI); disease trajectories; electronic health records (EHRs); flares; natural language processing (NLP); therapy.
© The Author(s) 2026. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved.