PhenoApt leverages clinical expertise to prioritize candidate genes via machine learning

Am J Hum Genet. 2022 Feb 3;109(2):270-281. doi: 10.1016/j.ajhg.2021.12.008. Epub 2022 Jan 20.


In recent years, exome sequencing (ES) has shown great utility in the diagnosis of Mendelian disorders. However, even after rigorous filtering, a typical ES analysis still involves the interpretation of hundreds of variants, which greatly hinders the rapid identification of causative genes. Because interpreting ES data requires comprehensive clinical analysis, taking clinical expertise into consideration can speed the molecular diagnosis of Mendelian disorders. To leverage clinical expertise to prioritize candidate genes, we developed PhenoApt, a machine-learning-based, phenotype-driven gene prioritization tool that allows users to assign a customized weight to each phenotype. Using the ability to rank causative genes in top-10 lists as the evaluation metric, baseline analysis demonstrated that PhenoApt outperformed previous phenotype-driven gene prioritization tools by a relative increase of 22.7%-140.0% in three independent, real-world, multi-center cohorts (cohort 1, n = 185; cohort 2, n = 784; and cohort 3, n = 208). Additional trials showed that adding weights to clinical indications, which should be explained by the causative gene, improved PhenoApt performance by a relative increase of 37.3% in cohort 2 (n = 471) and 21.4% in cohort 3 (n = 208). Moreover, PhenoApt could assign an intrinsic weight to each phenotype, based on the likelihood of its being a Mendelian trait, using term frequency-inverse document frequency techniques. When clinical indications were assigned intrinsic weights, PhenoApt performance improved by a relative increase of 23.7% in cohort 2 and 15.5% in cohort 3. To integrate PhenoApt into clinical practice, we developed a user-friendly website and a command-line tool.
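The evaluation metric described in the abstract (whether the causative gene appears in a top-10 list) and the reported "relative increase" can be illustrated with a minimal sketch. This is not PhenoApt's code: the function names `top_k_hit_rate` and `relative_increase`, the case identifiers, and the gene labels are hypothetical, and the toy rankings are invented purely to show the arithmetic.

```python
from typing import Dict, List


def top_k_hit_rate(ranked_genes: Dict[str, List[str]],
                   causative_gene: Dict[str, str],
                   k: int = 10) -> float:
    """Fraction of cases whose causative gene appears in the top-k ranked list."""
    if not causative_gene:
        raise ValueError("no cases to evaluate")
    hits = sum(
        1
        for case_id, gene in causative_gene.items()
        if gene in ranked_genes.get(case_id, [])[:k]
    )
    return hits / len(causative_gene)


def relative_increase(new: float, baseline: float) -> float:
    """Relative increase of `new` over `baseline`, expressed in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


# Hypothetical toy data: four cases, each with one known causative gene.
causative = {
    "case1": "GENE_A",
    "case2": "GENE_B",
    "case3": "GENE_C",
    "case4": "GENE_D",
}

# Hypothetical rankings from an unweighted run and a weighted run.
baseline_ranks = {
    "case1": ["GENE_A", "GENE_X"],
    "case2": ["GENE_X", "GENE_Y"],
    "case3": ["GENE_C"],
    "case4": ["GENE_X"],
}
weighted_ranks = {
    "case1": ["GENE_A"],
    "case2": ["GENE_B", "GENE_X"],
    "case3": ["GENE_X", "GENE_C"],
    "case4": ["GENE_Y"],
}

baseline_rate = top_k_hit_rate(baseline_ranks, causative, k=10)
weighted_rate = top_k_hit_rate(weighted_ranks, causative, k=10)
gain = relative_increase(weighted_rate, baseline_rate)

print(f"baseline top-10 rate: {baseline_rate:.2f}")
print(f"weighted top-10 rate: {weighted_rate:.2f}")
print(f"relative increase: {gain:.1f}%")
```

Under this reading, a relative increase compares two top-10 hit rates as a ratio rather than as a difference in percentage points, so the same absolute gain looks larger when the baseline rate is low.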

Keywords: Mendelian disorders; data analysis; exome sequencing; gene prioritization; machine learning; phenotype-driven analysis.

Publication types

  • Multicenter Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cohort Studies
  • Computational Biology
  • Databases, Genetic
  • Exome
  • Exome Sequencing
  • Genetic Diseases, Inborn / diagnosis
  • Genetic Diseases, Inborn / genetics*
  • Genetic Diseases, Inborn / pathology
  • Genetic Testing
  • Genotype
  • Hearing Loss, Sensorineural / diagnosis
  • Hearing Loss, Sensorineural / genetics*
  • Hearing Loss, Sensorineural / pathology
  • Humans
  • Intellectual Disability / diagnosis
  • Intellectual Disability / genetics*
  • Intellectual Disability / pathology
  • Machine Learning*
  • Microcephaly / diagnosis
  • Microcephaly / genetics*
  • Microcephaly / pathology
  • Nystagmus, Congenital / diagnosis
  • Nystagmus, Congenital / genetics*
  • Nystagmus, Congenital / pathology
  • Phenotype
  • Scoliosis / diagnosis
  • Scoliosis / genetics*
  • Scoliosis / pathology
  • Software