Predicting phenotype from patterns of annotation

Bioinformatics. 2003;19 Suppl 1:i183-9. doi: 10.1093/bioinformatics/btg1024.


Motivation: Predicting the outcome of specific experiments (such as the growth of a particular mutant strain in a particular medium) has the potential to allow researchers to devote resources to experiments with higher expected numbers of 'hits'.

Results: We use decision trees to predict phenotypes associated with Saccharomyces cerevisiae genes on the basis of Gene Ontology (GO) functional annotations from the Saccharomyces Genome Database (SGD) and other phenotypic annotations from the Yeast Phenotype Catalog at the Munich Information Center for Protein Sequences (MIPS). We assess the methodology in three ways: (1) we cross-validate on the phenotypic annotations listed in MIPS and show ROC curves indicating the tradeoff between true-positive rate and false-positive rate; (2) we search the literature for 100 of the predicted gene-phenotype associations that are not listed in MIPS and find evidence for 43 of them; (3) we use deletion strains to experimentally assess 61 predicted gene-phenotype associations not listed in MIPS, and significantly more of these deletion strains show abnormal growth than would be expected by chance.
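As an illustration of the ROC assessment in (1), the following minimal sketch shows how a curve of true-positive rate against false-positive rate can be traced from held-out prediction scores. It is not the authors' code: the function names `roc_points` and `auc`, and the toy scores and labels, are hypothetical and chosen only for the example. In practice the scores would come from a classifier (such as a decision tree) applied to the held-out folds of a cross-validation.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (false-positive rate, true-positive rate) pairs, one per score threshold.

    scores: predicted confidence that a gene has the phenotype (higher = more likely).
    labels: 1 if the gene-phenotype association is annotated, 0 otherwise.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Rank examples from most to least confident prediction.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical held-out scores and labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
for fpr, tpr in curve:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
print(f"AUC={auc(curve):.3f}")
```

Each point corresponds to one choice of score threshold: lowering the threshold recovers more true associations at the cost of more false positives, which is the tradeoff the ROC curves display.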

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Databases, Genetic
  • Documentation*
  • Gene Expression Profiling / methods*
  • Pattern Recognition, Automated
  • Phenotype*
  • Saccharomyces cerevisiae / genetics*
  • Saccharomyces cerevisiae / metabolism
  • Saccharomycetales / classification
  • Saccharomycetales / genetics*
  • Saccharomycetales / metabolism