Development and validation of algorithms to build an electronic health record based cohort of patients with systemic sclerosis

PLoS One. 2023 Apr 13;18(4):e0283775. doi: 10.1371/journal.pone.0283775. eCollection 2023.

Abstract

Objectives: To evaluate methods of identifying patients with systemic sclerosis (SSc) using International Classification of Diseases, Tenth Revision (ICD-10) codes (M34*), electronic health record (EHR) databases and organ involvement keywords, that result in a validated cohort comprised of true cases with high disease burden.

Methods: We retrospectively studied patients in a healthcare system likely to have SSc. Using structured EHR data from January 2016 to June 2021, we identified 955 adult patients with M34* documented 2 or more times during the study period. A random subset of 100 patients was selected to validate the ICD-10 code for its positive predictive value (PPV). The dataset was then divided into a training and validation sets for unstructured text processing (UTP) search algorithms, two of which were created using keywords for Raynaud's syndrome, and esophageal involvement/symptoms.

Results: Among 955 patients, the average age was 60. Most patients (84%) were female; 75% of patients were White, and 5.2% were Black. There were approximately 175 patients per year with the code newly documented, overall 24% had an ICD-10 code for esophageal disease, and 13.4% for pulmonary hypertension. The baseline PPV was 78%, which improved to 84% with UTP, identifying 788 patients likely to have SSc. After the ICD-10 code was placed, 63% of patients had a rheumatology office visit. Patients identified by the UTP search algorithm were more likely to have increased healthcare utilization (ICD-10 codes 4 or more times 84.1% vs 61.7%, p < .001), organ involvement (pulmonary hypertension 12.7% vs 6% p = .011) and medication use (mycophenolate use 28.7% vs 11.4%, p < .001) than those identified by the ICD codes alone.

Conclusion: EHRs can be used to identify patients with SSc. Using unstructured text processing keyword searches for SSc clinical manifestations improved the PPV of ICD-10 codes alone and identified a group of patients most likely to have SSc and increased healthcare needs.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adult
  • Algorithms
  • Databases, Factual
  • Electronic Health Records
  • Female
  • Humans
  • Hypertension, Pulmonary*
  • International Classification of Diseases
  • Male
  • Middle Aged
  • Reproducibility of Results
  • Retrospective Studies
  • Scleroderma, Systemic* / epidemiology
  • Uridine Triphosphate

Substances

  • Uridine Triphosphate