Genomic classification of protein-coding gene families

WormBook. 2005 Sep 23;1-23. doi: 10.1895/wormbook.1.29.1.


This chapter reviews analytical tools currently in use for protein classification, and gives an overview of the C. elegans proteome. Computational analysis of proteins relies heavily on hidden Markov models of protein families. Proteins can also be classified by predicted secondary or tertiary structures, hydrophobic profiles, compositional biases, or size ranges. Strictly orthologous protein families remain difficult to identify, except by skilled human labor. The InterPro and NCBI KOG classifications encompass 79% of C. elegans protein-coding genes; in both classifications, a small number of protein families account for a disproportionately large number of genes. C. elegans protein-coding genes include at least approximately 12,000 orthologs of C. briggsae genes, and at least approximately 4,400 orthologs of non-nematode eukaryotic genes. Some metazoan proteins conserved in other nematodes are absent from C. elegans. Conversely, 9% of C. elegans protein-coding genes are conserved among all metazoa or eukaryotes, yet have no known functions.

Publication types

  • Research Support, N.I.H., Extramural
  • Review

MeSH terms

  • Animals
  • Caenorhabditis elegans / genetics*
  • Caenorhabditis elegans Proteins / classification
  • Caenorhabditis elegans Proteins / genetics*
  • Caenorhabditis elegans Proteins / physiology
  • Evolution, Molecular
  • Genes, Helminth*
  • Humans
  • Multigene Family*
  • Proteome / genetics*


Substances

  • Caenorhabditis elegans Proteins
  • Proteome