Intrinsically disordered loops inserted into the structural domains of human proteins

J Mol Biol. 2006 Jan 27;355(4):845-57. doi: 10.1016/j.jmb.2005.10.037. Epub 2005 Nov 8.

Abstract

Much attention has been paid recently to proteins with partially or fully disordered structures, which are found mostly in eukaryotes and are involved mainly in pivotal cellular processes such as transcriptional regulation, translation and cellular signal transduction. Long disordered sequences are sometimes inserted within single structural domains of proteins, forming loops that protrude from the molecular surface. Such intrinsically disordered loops (IDLs) are either invisible in X-ray crystallography or hamper protein crystallization itself owing to their great flexibility. Perhaps because of this, such long disordered sequences have not been characterized adequately. Here, we propose an informational method that stringently identifies IDLs in the structural domains of proteins using the amino acid sequence alone. A genome-wide survey of human proteins conducted with the method identified 50 IDL-containing proteins, several of which have experimentally determined 3D structures. Similar searches in other fully sequenced organisms revealed that IDLs are prevalent in eukaryotes but much less so in prokaryotes. As there is a statistically significant coincidence between the boundaries of IDLs and those of exons, we suggest that IDLs were produced mainly by exon addition in eukaryotes. IDLs are almost always located at the surface of proteins and are enriched in hydrophilic residues, and IDL-containing proteins tend to be intracellular. Some of the well-characterized proteins with IDLs illustrate that IDLs play pivotal roles in the switching of intracellular signaling or regulatory functions, suggesting that IDL insertion is an effective way to create functionally different domain variants.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Computational Biology
  • Genome
  • Humans
  • Models, Molecular
  • Molecular Sequence Data
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Proteins / genetics
  • Proteins / metabolism*

Substances

  • Proteins