Function prediction of uncharacterized proteins

J Bioinform Comput Biol. 2007 Feb;5(1):1-30. doi: 10.1142/s0219720007002503.


Function prediction of uncharacterized protein sequences generated by genome projects has emerged as an important focus for computational biology. We have categorized several approaches beyond traditional sequence similarity that use the very large amounts of available data for computational function prediction, including structure-, association (genomic context)-, interaction (cellular context)-, process (metabolic context)-, and proteomics-experiment-based methods. Because these approaches incorporate structural and experimental data that are not used in sequence-based methods, they can improve the accuracy and reliability of protein function prediction. Here, we first review the definition of protein function. We then introduce recent developments in these methods, with special focus on the types of predictions that can be made. We emphasize the need for further development of comprehensive systems biology techniques that can use the ever-increasing data produced by the genomics and proteomics communities. For the readers' convenience, tables of useful online resources in each category are included. The role of computational scientists in the near future of biological research and the interplay between computational and experimental biology are also addressed.

Publication types

  • Research Support, N.I.H., Extramural
  • Review

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Binding Sites
  • Chromosome Mapping / methods*
  • Molecular Sequence Data
  • Protein Binding
  • Protein Interaction Mapping / methods*
  • Proteins / chemistry*
  • Proteins / genetics
  • Proteins / metabolism*
  • Sequence Analysis, Protein / methods*
  • Signal Transduction / physiology*
  • Structure-Activity Relationship*


Substances

  • Proteins