Evaluation of an automated genome interpretation model for rare disease routinely used in a clinical genetic laboratory

Genet Med. 2023 Jun;25(6):100830. doi: 10.1016/j.gim.2023.100830. Epub 2023 Mar 16.


Purpose: The analysis of exome and genome sequencing data for the diagnosis of rare diseases is challenging and time-consuming. In this study, we evaluated a machine learning-based artificial intelligence model that automates variant prioritization for diagnosing rare genetic diseases in the Baylor Genetics clinical laboratory.

Methods: The automated analysis model was developed using a supervised learning approach based on thousands of manually curated variants. The model was evaluated on 2 cohorts. Model accuracy was determined using a retrospective cohort comprising 180 randomly selected exome cases (57 singletons, 123 trios), all of which had previously been diagnosed and solved through manual interpretation. Diagnostic yield with the modified workflow was estimated using a prospective "production" cohort of 334 consecutive clinical cases.

Results: The model accurately pinpointed all manually reported variants as candidates. The reported variants were ranked among the top 10 candidate variants in 98.4% (121/123) of trio cases, in 93.0% (53/57) of single-proband cases, and in 96.7% (174/180) of all cases. The accuracy of the model was reduced in some cases because of incomplete variant calling (eg, copy number variants) or incomplete phenotypic description.
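
The top-10 rates above follow directly from the reported counts. As a minimal sketch (not the authors' code; the names COUNTS, top_k_rate, and pooled are hypothetical and introduced only for illustration), the arithmetic can be checked as follows:

```python
# Counts from the Results paragraph:
# (cases with the reported variant ranked in the top 10, total cases).
# The dictionary name and keys are illustrative, not from the study.
COUNTS = {
    "trio": (121, 123),
    "singleton": (53, 57),
}


def top_k_rate(hits: int, total: int) -> float:
    """Percentage of cases whose reported variant ranked within the top k."""
    return round(100 * hits / total, 1)


def pooled(counts: dict) -> tuple:
    """Sum hits and totals across case types to get the all-cases counts."""
    hits = sum(h for h, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return hits, total


trio_rate = top_k_rate(*COUNTS["trio"])            # 121/123
singleton_rate = top_k_rate(*COUNTS["singleton"])  # 53/57
all_hits, all_total = pooled(COUNTS)               # 174/180
all_rate = top_k_rate(all_hits, all_total)

print(trio_rate, singleton_rate, all_rate)
```

Pooling the two case types gives 174 of 180 cases, which matches the overall figure reported in the Results.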

Conclusion: The automated model for case analysis assists clinical genetic laboratories in prioritizing candidate variants effectively. The use of such technology may facilitate the interpretation of genomic data for a large number of patients in the era of precision medicine.

Keywords: Clinical genomics; Machine learning methods.

MeSH terms

  • Artificial Intelligence
  • Exome / genetics
  • Humans
  • Laboratories
  • Laboratories, Clinical*
  • Prospective Studies
  • Rare Diseases* / diagnosis
  • Rare Diseases* / genetics
  • Retrospective Studies