Non-globular domains in protein sequences: automated segmentation using complexity measures

Comput Chem. 1994 Sep;18(3):269-85. doi: 10.1016/0097-8485(94)85023-2.


Computational methods based on mathematically defined measures of compositional complexity have been developed to distinguish globular and non-globular regions of protein sequences. Compact globular structures in protein molecules are shown to be determined by amino acid sequences of high informational complexity. Sequences of known crystal structure in the Brookhaven Protein Data Bank differ only slightly from randomly shuffled sequences in the distribution of statistical properties such as local compositional complexity. In contrast, in the much larger body of deduced sequences in the SWISS-PROT database, approximately one quarter of the residues occur in segments of non-randomly low complexity and approximately half of the entries contain at least one such segment. Sequences of proteins with known, physicochemically defined non-globular regions have been analyzed, including collagens, different classes of coiled-coil proteins, elastins, histones, non-histone proteins, mucins, proteoglycan core proteins and proteins containing long single solvent-exposed alpha-helices. The SEG algorithm provides an effective general method for partitioning the globular and non-globular regions of these sequences fully automatically. This method is also facilitating the discovery of new classes of long, non-globular sequence segments, as illustrated by the example of the human CAN gene product involved in tumor induction.
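The idea of local compositional complexity can be illustrated with a short sketch. The code below is not the SEG algorithm itself, whose exact complexity measures and optimization steps are not given in this abstract. It is a simplified stand-in that scores each sliding window by the Shannon entropy of its residue composition and merges overlapping low-scoring windows. The function names, the window length of 12 residues, and the cutoff of 2.2 bits are assumptions chosen for illustration.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition of a window."""
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_segments(seq: str, window: int = 12, cutoff: float = 2.2):
    """Return merged half-open (start, end) intervals covered by low-entropy windows.

    Hypothetical helper: `window` and `cutoff` are illustrative assumptions,
    not parameters taken from the abstract.
    """
    segments = []
    for i in range(len(seq) - window + 1):
        if window_entropy(seq[i:i + window]) <= cutoff:
            start, end = i, i + window
            if segments and start <= segments[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                segments[-1] = (segments[-1][0], max(segments[-1][1], end))
            else:
                segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Toy sequence: a 20-residue glutamine run between two diverse flanks.
    toy = "MKVLAAGIVRDE" + "Q" * 20 + "TWFPHCNYSDLK"
    for start, end in low_complexity_segments(toy):
        print(start, end, toy[start:end])
```

In this sketch the reported interval extends somewhat into the flanking residues, because any window that is mostly homopolymer still scores below the cutoff. A real segmentation method would need an additional step to trim such boundaries.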

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Animals
  • Biometry
  • Crystallization
  • Databases, Factual
  • Humans
  • Molecular Sequence Data
  • Molecular Structure
  • Nuclear Pore Complex Proteins*
  • Nuclear Proteins / chemistry
  • Nuclear Proteins / genetics
  • Protein Structure, Secondary
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Proteins / genetics


Substances

  • NUP214 protein, human
  • Nuclear Pore Complex Proteins
  • Nuclear Proteins
  • Proteins