Phenotypic signatures in clinical data enable systematic identification of patients for genetic testing

Nat Med. 2021 Jun;27(6):1097-1104. doi: 10.1038/s41591-021-01356-z. Epub 2021 Jun 3.


Around 5% of the population is affected by a rare genetic disease, yet most affected individuals endure years of uncertainty before receiving a genetic test. A common feature of genetic diseases is the presence of multiple rare phenotypes that often span organ systems. Here, we use diagnostic billing information from longitudinal clinical data in the electronic health records (EHRs) of 2,286 patients who received a chromosomal microarray test, and 9,144 matched controls, to build a model to predict who should receive a genetic test. The model achieved high prediction accuracies in a held-out test sample (area under the receiver operating characteristic curve (AUROC), 0.97; area under the precision-recall curve (AUPRC), 0.92), in an independent hospital system (AUROC, 0.95; AUPRC, 0.62), and in an independent set of 172,265 patients in which cases were broadly defined as having an interaction with a genetics provider (AUROC, 0.9; AUPRC, 0.63). Patients carrying a putative pathogenic copy number variant were also accurately identified by the model. Compared with current approaches for genetic test determination, our model could identify more patients for testing while also increasing the proportion of those tested who have a genetic disease. We demonstrate that phenotypic patterns representative of a wide range of genetic diseases can be captured from EHRs to systematize decision-making for genetic testing, with the potential to speed up diagnosis, improve care and reduce costs.
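
The two metrics reported above can be illustrated with a minimal sketch. This is not the authors' code: the function names `auroc` and `average_precision` and the toy labels and scores are hypothetical, chosen only to show how the quantities are computed. AUPRC is approximated here by average precision, a common step-wise estimate. The only numbers taken from the abstract are the case and control counts, which imply a 1:4 matching ratio, so a model that ranks patients at random would be expected to reach an AUPRC of about 0.2 in a sample with that mix.

```python
def auroc(labels, scores):
    """Probability that a random case scores above a random control (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    Assumes no tied scores; ties would need to be grouped before ranking.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank  # precision at each recalled case
    return total / n_pos


# Counts from the abstract: cases and matched controls in the training cohort.
n_cases, n_controls = 2286, 9144
prevalence = n_cases / (n_cases + n_controls)  # expected AUPRC of a random ranking

# Hypothetical toy data (1 = received a test, 0 = control), for illustration only.
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

print(f"prevalence baseline: {prevalence:.2f}")
print(f"toy AUROC: {auroc(toy_labels, toy_scores):.3f}")
print(f"toy average precision: {average_precision(toy_labels, toy_scores):.3f}")
```

Because the AUPRC baseline tracks case prevalence while the AUROC baseline stays at 0.5, AUPRC values from cohorts with different case proportions are not directly comparable, which is worth keeping in mind when reading the three evaluation settings side by side.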

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Child
  • Child, Preschool
  • DNA Copy Number Variations / genetics*
  • Electronic Health Records
  • Female
  • Genetic Diseases, Inborn / diagnosis*
  • Genetic Diseases, Inborn / pathology
  • Genetic Testing*
  • Humans
  • Infant
  • Male
  • Microarray Analysis
  • Phenotype
  • Rare Diseases / diagnosis*
  • Rare Diseases / genetics
  • Rare Diseases / pathology