Objective: To accelerate the use of outcome measures in rheumatology, we developed and evaluated a natural language processing (NLP) pipeline for extracting these measures from free-text outpatient rheumatology notes within the American College of Rheumatology's Rheumatology Informatics System for Effectiveness (RISE) registry.
Methods: We included all patients in RISE (2015-2018). The NLP pipeline extracted scores corresponding to 8 measures of rheumatoid arthritis (RA) disease activity (DA) and functional status (FS) documented in outpatient rheumatology notes. We evaluated score extraction performance by chart review and assessed agreement with scores documented in structured data. We externally validated the pipeline on rheumatology notes from an academic medical center that is not part of the RISE registry.
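The abstract does not describe how the pipeline locates scores in free text. As a minimal illustration only, the sketch below assumes a simple rule-based approach: one regular expression per measure that captures a numeric score following the measure's abbreviation. The measure list (CDAI, DAS28, RAPID3, HAQ), `MEASURE_PATTERNS`, and `extract_scores` are hypothetical; they are not necessarily the 8 measures or the method the authors used.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, illustrative measure abbreviations (not the study's exact list).
# Each pattern matches the abbreviation, an optional "score", an optional
# separator (":", "=", "is", "of"), and then captures a numeric score.
MEASURE_PATTERNS: Dict[str, re.Pattern] = {
    name: re.compile(
        rf"\b{re.escape(name)}\b\s*(?:score)?\s*(?:[:=]|is|of)?\s*(\d+(?:\.\d+)?)",
        re.IGNORECASE,
    )
    for name in ("CDAI", "DAS28", "RAPID3", "HAQ")
}


def extract_scores(note: str) -> List[Tuple[str, float]]:
    """Return (measure, score) pairs found in one free-text note."""
    hits: List[Tuple[str, float]] = []
    for name, pattern in MEASURE_PATTERNS.items():
        for match in pattern.finditer(note):
            hits.append((name, float(match.group(1))))
    return hits


print(extract_scores("Today CDAI: 12.5, HAQ score of 0.75."))
# [('CDAI', 12.5), ('HAQ', 0.75)]
```

A real pipeline would also have to handle negation, dates, score ranges, and templated tables, none of which this sketch attempts.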
Results: We processed over 34 million notes from 854,628 patients, 158 practices, and 24 electronic health record (EHR) systems in RISE. Manual chart review showed a sensitivity, positive predictive value (PPV), and F1 score of 95%, 87%, and 91%, respectively. Agreement between scores extracted from RISE notes and scores derived from structured data was moderate to substantial for DA measures (κ = 0.43-0.68) and almost perfect for FS measures (κ = 0.86-0.98). In the external validation, sensitivity, PPV, and F1 score were 92%, 69%, and 79%, respectively.
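Because F1 is the harmonic mean of sensitivity (recall) and PPV (precision), the reported F1 values can be checked directly from the other two metrics. The sketch below reproduces both F1 values and, for the agreement statistic, computes unweighted Cohen's kappa on hypothetical toy labels (not study data). `f1_score` and `cohens_kappa` are illustrative helpers, not part of the pipeline, and the study's agreement analysis may have used a different kappa variant.

```python
from collections import Counter
from typing import Sequence


def f1_score(sensitivity: float, ppv: float) -> float:
    """Harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two sources labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each source's marginal label frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


print(round(f1_score(0.95, 0.87), 2))  # internal validation: 0.91
print(round(f1_score(0.92, 0.69), 2))  # external validation: 0.79

# Hypothetical toy categories: note-extracted vs structured-data values.
from_notes = ["low", "low", "high", "high"]
from_structured = ["low", "high", "high", "high"]
print(round(cohens_kappa(from_notes, from_structured), 2))  # 0.5
```

On the commonly used Landis and Koch scale, κ of 0.41-0.60 is read as moderate, 0.61-0.80 as substantial, and above 0.80 as almost perfect agreement.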
Conclusion: We developed an NLP pipeline that extracts RA outcome measures from notes in a national registry spanning multiple EHR systems, and found it to have good internal and external validity. This pipeline can facilitate measurement of clinical- and patient-reported outcomes for research and quality measurement.
© 2022 American College of Rheumatology.