Machine Learning of Patient Characteristics to Predict Admission Outcomes in the Undiagnosed Diseases Network

JAMA Netw Open. 2021 Feb 1;4(2):e2036220. doi: 10.1001/jamanetworkopen.2020.36220.


Importance: The Undiagnosed Diseases Network (UDN) is a national network that evaluates individual patients whose signs and symptoms have been refractory to diagnosis. Providing reliable estimates of admission outcomes may help clinical evaluators distinguish, prioritize, and accelerate admission to the UDN for patients with undiagnosed diseases.

Objective: To develop computational models that effectively predict admission outcomes for applicants seeking UDN evaluation and to rank the applications based on the likelihood of patient admission to the UDN.

Design, setting, and participants: This prognostic study included all applications submitted to the UDN from July 2014 to June 2019, with 1209 applications accepted and 1212 applications not accepted. The main inclusion criterion was an undiagnosed condition despite thorough evaluation by a health care professional; the main exclusion criteria were a diagnosis that explained the objective findings or a review of the records that suggested a diagnosis. A classifier was trained using information extracted from application forms, referral letters from health care professionals, and semantic similarity between referral letters and textual descriptions of known mendelian disorders. The admission labels were provided by the case review committee of the UDN. In addition to retrospective analysis, the classifier was prospectively tested on another 288 applications that had not been evaluated at the time of classifier development.

Main outcomes and measures: The primary outcomes were whether a patient was accepted or not accepted to the UDN and the order of applications when ranked by likelihood of admission. The performance of the classifier was assessed by comparing its predictions against the UDN admission outcomes and by measuring improvement in the mean processing time for accepted applications.
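As an illustration of how predictions can be compared against admission outcomes, the following minimal sketch computes sensitivity, specificity, and the area under the receiver operating characteristic curve in plain Python. The labels, scores, and the 0.5 decision threshold are hypothetical toy values chosen for the example; they are not UDN data and do not reproduce the study's evaluation procedure.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = accepted)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the probability that a random accepted application
    scores higher than a random not accepted one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = accepted, 0 = not accepted.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

# Assumed threshold of 0.5 to turn scores into accept/reject predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} auc={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the area under the curve summarizes ranking quality across all thresholds, which is why it is also informative for the ranking outcome.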

Results: The best classifier achieved a sensitivity of 0.843, a specificity of 0.738, and an area under the receiver operating characteristic curve of 0.844 for predicting admission outcomes among 1212 accepted and 1210 not accepted applications. In addition, by ordering applications based on their likelihood of acceptance, the classifier can decrease the current mean (SD) UDN processing time for accepted applications from 3.29 (3.17) months to 1.05 (3.82) months (68% improvement).
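The reported 68% improvement follows from the two mean processing times as a relative reduction. The short sketch below checks that arithmetic and shows, under stated assumptions, what ordering applications by predicted likelihood of acceptance looks like. The application identifiers and probabilities are hypothetical, and the sketch does not model how the study itself estimated processing time.

```python
def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


def rank_by_likelihood(applications):
    """Sort (application_id, predicted_probability) pairs so that the
    applications most likely to be accepted are reviewed first."""
    return sorted(applications, key=lambda item: item[1], reverse=True)


# Mean processing times in months, taken from the reported results.
improvement = relative_reduction(3.29, 1.05)
print(f"relative improvement: {improvement:.1%}")

# Hypothetical queue in arrival order: (id, predicted probability).
queue = [("A", 0.20), ("B", 0.85), ("C", 0.55), ("D", 0.91)]
ranked_ids = [app_id for app_id, _ in rank_by_likelihood(queue)]
print(ranked_ids)
```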

Conclusions and relevance: A classification system was developed that may help clinical evaluators distinguish, prioritize, and accelerate admission to the UDN for patients with undiagnosed diseases. Accelerating the admission process may improve the diagnostic journeys for these patients and serve as a model for partial automation of triaging or referral for other resource-constrained applications. Such classification models make explicit some of the considerations that currently inform the use of whole-genome sequencing for undiagnosed disease and thereby invite a broader discussion in the clinical genetics community.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adolescent
  • Adult
  • Area Under Curve
  • Child
  • Child, Preschool
  • Computer Simulation
  • Female
  • Humans
  • Infant
  • Infant, Newborn
  • Machine Learning*
  • Male
  • Middle Aged
  • Patient Admission
  • Patient Selection*
  • Prospective Studies
  • ROC Curve
  • Rare Diseases / diagnosis*
  • Rare Diseases / genetics
  • Referral and Consultation*
  • Reproducibility of Results
  • Retrospective Studies
  • Triage
  • Undiagnosed Diseases / diagnosis*
  • Undiagnosed Diseases / genetics
  • Whole Genome Sequencing
  • Young Adult