Multi-domain proteins in the three kingdoms of life: orphan domains and other unassigned regions

J Mol Biol. 2005 Apr 22;348(1):231-43. doi: 10.1016/j.jmb.2005.02.007.

Abstract

Comparative studies of the proteomes from different organisms have provided valuable information about protein domain distribution in the kingdoms of life. Earlier studies have been limited by the fact that only about 50% of the proteomes could be matched to a domain. Here, we have extended these studies by including less well-defined domain definitions (Pfam-B and clustered domains, MAS) in addition to Pfam-A and SCOP domains. We found that a significant fraction of these domain families are homologous to Pfam-A or SCOP domains. Further, we show that all regions that do not match a Pfam-A or SCOP domain contain a significantly higher fraction of disordered structure. These unstructured regions may be contained within orphan domains or function as linkers between structured domains. Using several different definitions, we have re-estimated the number of multi-domain proteins in different organisms. The methods agree that approximately 65% of eukaryotic proteins are multi-domain proteins, compared with approximately 40% of prokaryotic proteins. However, these numbers are strongly dependent on the exact choice of cut-off for domains in unassigned regions. In conclusion, all eukaryotes have similar fractions of multi-domain proteins and disorder, whereas a high fraction of repeating domains is found only in multicellular eukaryotes. This implies a role for repeats in cell-cell contacts, while the other two features are important for intracellular functions.
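The dependence of the multi-domain fraction on the cut-off can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 1-based inclusive coordinates, the toy proteome, and the cut-off values are all assumptions made for illustration. The sketch treats every unassigned region at least `cutoff` residues long as one additional (orphan) domain and calls a protein multi-domain when it has two or more domains in total.

```python
from typing import List, Tuple

Region = Tuple[int, int]            # 1-based inclusive (start, end); an assumed convention
Protein = Tuple[int, List[Region]]  # (sequence length, assigned domain regions)


def unassigned_regions(length: int, domains: List[Region]) -> List[Region]:
    """Return the stretches of a protein not covered by any assigned domain."""
    gaps: List[Region] = []
    pos = 1  # first residue not yet covered
    for start, end in sorted(domains):
        if start > pos:
            gaps.append((pos, start - 1))
        pos = max(pos, end + 1)
    if pos <= length:
        gaps.append((pos, length))
    return gaps


def count_domains(length: int, domains: List[Region], cutoff: int) -> int:
    """Assigned domains plus one hypothetical orphan domain per long unassigned region."""
    orphans = sum(
        1 for start, end in unassigned_regions(length, domains)
        if end - start + 1 >= cutoff
    )
    return len(domains) + orphans


def multi_domain_fraction(proteome: List[Protein], cutoff: int) -> float:
    """Fraction of proteins with at least two domains under the given cut-off."""
    if not proteome:
        return 0.0
    multi = sum(
        1 for length, domains in proteome
        if count_domains(length, domains, cutoff) >= 2
    )
    return multi / len(proteome)


# Toy data, invented for illustration only.
example_proteome: List[Protein] = [
    (300, [(1, 100)]),              # one domain plus a 200-residue unassigned tail
    (120, [(10, 110)]),             # one domain, only short flanking regions
    (400, [(1, 150), (160, 390)]),  # two assigned domains
]

for cutoff in (100, 250):
    print(cutoff, multi_domain_fraction(example_proteome, cutoff))
```

On this toy data, two of the three proteins count as multi-domain with a cut-off of 100 residues, but only one does with a cut-off of 250, because the 200-residue unassigned tail of the first protein no longer qualifies as a domain.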

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Archaeal Proteins / chemistry
  • Bacterial Proteins / chemistry
  • Databases, Protein
  • Humans
  • Molecular Sequence Data
  • Protein Structure, Tertiary*
  • Proteins / chemistry*
  • Proteome*
  • Sequence Homology, Amino Acid

Substances

  • Archaeal Proteins
  • Bacterial Proteins
  • Proteins
  • Proteome