Objective: Electronic health records (EHRs) represent powerful tools to study rare diseases. Our objective was to develop and validate EHR algorithms to identify systemic lupus erythematosus (SLE) births across centers.
Methods: We developed algorithms in a training set using an EHR with over 3 million subjects and validated the algorithms at 2 other centers. Subjects at all 3 centers were selected using ≥1 International Classification of Diseases, Ninth Revision (ICD-9) or International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Clinical Modification (ICD-10-CM) code for SLE and ≥1 ICD-9 or ICD-10-CM delivery code. A subject was classified as a case if a rheumatologist had diagnosed SLE and a birth was documented. We tested algorithms that combined SLE ICD-9 or ICD-10-CM codes, antimalarial use, a positive antinuclear antibody titer ≥1:160, and whether double-stranded DNA or complement had ever been checked, using both rule-based and machine learning methods. Positive predictive values (PPVs) and sensitivities were calculated. We assessed the impact of case definition, coding provider, and subject race on algorithm performance.
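As an illustration of how a rule-based algorithm of this kind might be scored against chart review, the following is a minimal sketch, not the study's implementation. The `Subject` fields, the `min_codes` threshold, and the toy cohort are hypothetical and chosen only to show how PPV and sensitivity are computed.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    # All fields are hypothetical stand-ins for EHR-derived variables.
    sle_code_count: int       # number of SLE ICD-9/ICD-10-CM codes
    antimalarial_ever: bool   # any recorded antimalarial use
    is_case: bool             # reference standard from chart review


def rule_based_flag(s, min_codes=3, require_antimalarial=False):
    """Flag a subject as a presumed SLE birth under an assumed rule."""
    if s.sle_code_count < min_codes:
        return False
    if require_antimalarial and not s.antimalarial_ever:
        return False
    return True


def ppv_and_sensitivity(subjects, flag):
    """Return (PPV, sensitivity) of `flag` against the reference standard."""
    tp = sum(1 for s in subjects if flag(s) and s.is_case)
    fp = sum(1 for s in subjects if flag(s) and not s.is_case)
    fn = sum(1 for s in subjects if not flag(s) and s.is_case)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sens


# Toy cohort (fabricated for illustration only, not study data).
cohort = [
    Subject(5, True, True),
    Subject(4, False, True),
    Subject(1, True, True),
    Subject(3, False, False),
    Subject(1, False, False),
    Subject(2, False, False),
]

codes_only = ppv_and_sensitivity(cohort, rule_based_flag)
codes_plus_drug = ppv_and_sensitivity(
    cohort, lambda s: rule_based_flag(s, require_antimalarial=True)
)
```

In this toy cohort, adding the antimalarial requirement raises PPV while lowering sensitivity, which is the general trade-off that stricter rules tend to produce.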
Results: Algorithms performed similarly across all 3 centers. Increasing the number of SLE codes, adding clinical data, and requiring an SLE code entered by a rheumatologist all increased the likelihood of identifying true SLE patients. All algorithms had higher PPVs for African American than for White SLE births. In the machine learning models, the total number of SLE codes and an SLE code from a rheumatologist were the most important variables for predicting SLE case status.
Conclusion: We developed and validated algorithms that use multiple types of data to identify SLE births in the EHR. The algorithms had higher PPVs in African American mothers than in White mothers.
© 2020 American College of Rheumatology.