Missing gene identification using functional coherence scores

Sci Rep. 2016 Aug 24;6:31725. doi: 10.1038/srep31725.


Reconstructing metabolic and signaling pathways is an effective way of interpreting a genome sequence. A challenge in pathway reconstruction is that genes in a pathway often cannot be easily found, reflecting the currently imperfect information about the target organism. In this work, we developed a new method for finding missing genes that integrates multiple features, including gene expression, phylogenetic profile, and function association scores. In particular, to consider the function association between candidate genes and the proteins neighboring the target missing gene in the network, we used the Co-occurrence Association Score (CAS) and the PubMed Association Score (PAS), which are designed to capture the functional coherence of proteins. We showed that adding CAS and PAS substantially improves the accuracy of identifying missing genes in the yeast enzyme-enzyme network compared with using only the conventional features, gene expression and phylogenetic profile. Finally, we also demonstrated that accuracy improves when indirect neighbors of the target enzyme position in the network are considered using a proper network-topology-based weighting scheme.
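
The general idea of scoring candidates against the network neighbors of a missing enzyme position can be illustrated with a minimal Python sketch. It is not the published method: the abstract does not state how the features are combined or how neighbors are weighted, so the unweighted feature mean, the exponential distance decay (`decay`), the cutoff `max_dist`, and all function and parameter names below are assumptions made purely for illustration.

```python
from collections import deque


def bfs_distances(graph, source, max_dist):
    """Return shortest-path distances from source to nodes within max_dist."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue  # do not expand beyond the distance cutoff
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    del dist[source]  # the missing position itself is not a neighbor
    return dist


def rank_candidates(graph, target, pair_scores, candidates,
                    max_dist=2, decay=0.5):
    """Rank candidate genes for the missing position `target`.

    pair_scores maps (candidate, neighbor) to a dict of feature scores,
    e.g. {"expr": ..., "phylo": ..., "cas": ..., "pas": ...}.
    ASSUMPTION: features are averaged with equal weight, and a neighbor
    at distance d is weighted by decay ** (d - 1).
    """
    distances = bfs_distances(graph, target, max_dist)
    ranking = []
    for cand in candidates:
        total = 0.0
        weight_sum = 0.0
        for nbr, d in distances.items():
            feats = pair_scores.get((cand, nbr))
            if not feats:
                continue  # no evidence for this candidate-neighbor pair
            weight = decay ** (d - 1)
            total += weight * sum(feats.values()) / len(feats)
            weight_sum += weight
        score = total / weight_sum if weight_sum else 0.0
        ranking.append((cand, score))
    # Highest combined score first
    ranking.sort(key=lambda item: item[1], reverse=True)
    return ranking
```

Here `graph` is an adjacency dictionary of the enzyme-enzyme network, and setting `max_dist=1` restricts scoring to direct neighbors, which allows a comparison with the indirect-neighbor setting under these assumptions.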

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Chromosome Mapping
  • Computational Biology / methods*
  • Computer Simulation
  • Enzymes / chemistry
  • Fungal Proteins / chemistry
  • Gene Expression Profiling*
  • Gene Expression Regulation, Fungal*
  • Gene Regulatory Networks*
  • Genomics / methods*
  • Models, Statistical
  • Phylogeny
  • Probability
  • Reproducibility of Results
  • Saccharomyces cerevisiae / enzymology
  • Saccharomyces cerevisiae / genetics*


Substances

  • Enzymes
  • Fungal Proteins