Clinical diagnostics in human genetics with semantic similarity searches in ontologies

Am J Hum Genet. 2009 Oct;85(4):457-64. doi: 10.1016/j.ajhg.2009.09.003.

The differential diagnostic process attempts to identify candidate diseases that best explain a set of clinical features. This process can be complicated by the fact that the features can have varying degrees of specificity, as well as by the presence of features unrelated to the disease itself. Depending on the experience of the physician and the availability of laboratory tests, clinical abnormalities may be described in greater or lesser detail. We have adapted semantic similarity metrics to measure phenotypic similarity between queries and hereditary diseases annotated with the use of the Human Phenotype Ontology (HPO) and have developed a statistical model to assign p values to the resulting similarity scores, which can be used to rank the candidate diseases. We show that our approach outperforms simpler term-matching approaches that do not take the semantic interrelationships between terms into account. The advantage of our approach was greater for queries containing phenotypic noise or imprecise clinical descriptions. The semantic network defined by the HPO can be used to refine the differential diagnosis by suggesting clinical features that, if present, best differentiate among the candidate diagnoses. Thus, semantic similarity searches in ontologies represent a useful way of harnessing the semantic structure of human phenotypic abnormalities to help with the differential diagnosis. We have implemented our methods in a freely available web application for the field of human Mendelian disorders.
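
The ranking idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity metric, so the code assumes a Resnik-style measure (information content of the most informative common ancestor, averaged over query terms) and estimates p values by Monte Carlo sampling of random queries of the same size. The toy ontology, the disease names, and all function names are hypothetical and are not real HPO content.

```python
import math
import random

# Hypothetical toy ontology (child -> parents); these are NOT real HPO terms.
PARENTS = {
    "root": [],
    "skeletal": ["root"],
    "cardiac": ["root"],
    "limb": ["skeletal"],
    "long_fingers": ["limb"],
    "short_fingers": ["limb"],
    "septal_defect": ["cardiac"],
}

# Hypothetical disease annotations, for illustration only.
DISEASES = {
    "disease_A": {"long_fingers", "septal_defect"},
    "disease_B": {"short_fingers"},
    "disease_C": {"septal_defect"},
    "disease_D": {"cardiac", "limb"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log(fraction of diseases annotated to t or to a descendant of t)."""
    counts = {t: 0 for t in PARENTS}
    for terms in DISEASES.values():
        # Annotations propagate upward: a disease annotated to a term
        # is implicitly annotated to every ancestor of that term.
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] += 1
    n = len(DISEASES)
    # Terms never used by any disease get 0.0 here; they can never be a
    # common ancestor of a disease term, so the value is not used in scoring.
    return {t: -math.log(c / n) if c else 0.0 for t, c in counts.items()}


def similarity(query, disease_terms, ic):
    """Average, over query terms, of the best-matching disease term's score."""
    total = 0.0
    for q in query:
        best = 0.0
        for d in disease_terms:
            common = ancestors(q) & ancestors(d)  # always contains "root"
            best = max(best, max(ic[t] for t in common))
        total += best
    return total / len(query)


def p_value(query, disease_terms, ic, n_samples=2000, seed=0):
    """Monte Carlo estimate: share of random same-size queries scoring >= observed."""
    rng = random.Random(seed)
    observed = similarity(query, disease_terms, ic)
    terms = sorted(PARENTS)
    hits = sum(
        similarity(rng.sample(terms, len(query)), disease_terms, ic) >= observed
        for _ in range(n_samples)
    )
    return hits / n_samples


def rank_diseases(query, n_samples=2000):
    """Return (disease, score, p) tuples ordered by p value, then by score."""
    ic = information_content()
    rows = []
    for name, terms in DISEASES.items():
        score = similarity(query, terms, ic)
        rows.append((p_value(query, terms, ic, n_samples), -score, name))
    return [(name, -neg_score, p) for p, neg_score, name in sorted(rows)]
```

Calling `rank_diseases(["long_fingers", "cardiac"])` orders the toy diseases for a query that mixes one specific and one imprecise feature. Because scores are taken from shared ancestors, the imprecise term "cardiac" still contributes to the match against a disease annotated with the more specific "septal_defect", which is the behaviour that plain term matching would miss.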

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology
  • Databases, Genetic
  • Diagnosis, Differential
  • Genetic Diseases, Inborn / genetics*
  • Genome, Human*
  • Genomics / methods
  • Humans
  • Internet
  • Models, Genetic
  • Models, Statistical
  • Monte Carlo Method
  • Pattern Recognition, Automated / methods
  • Phenotype
  • Software
  • Vocabulary, Controlled