Identification of structural motifs from protein coordinate data: secondary structure and first-level supersecondary structure

Proteins. 1988;3(2):71-84. doi: 10.1002/prot.340030202.


A computer program is described that produces a description of the secondary structure and supersecondary structure of a polypeptide chain using the list of alpha carbon coordinates as input. Restricting the term "secondary structure" to the conformation of contiguous segments of the chain, the program determines the initial and final residues in helices, extended strands, sharp turns, and omega loops. This is accomplished through the use of difference distance matrices. The distances in idealized models of the segments are compared with the actual structure, and the differences are evaluated for agreement within preset limits. The program assigns 90-95% of the residues in most proteins to at least one type of secondary element. In a second step the now-defined helices and strands are idealized as straight line segments, and the axial directions and locations are compiled from the input C alpha coordinate list. These data are used to check for moderate curvature in strands and helices, and the secondary structure list is corrected where necessary. The geometric relations between these line segments are then calculated and output as the first level of supersecondary structure. A maximum of six parameters are required for a complete description of the relations between each pair. Frequently a less complete description will suffice, for example just the interaxial separation and angle. Both the secondary structure and one aspect of the supersecondary structure can be displayed in a character matrix analogous to the distance matrix format. This allows a quite accurate two-dimensional display of the three-dimensional structure, and several examples are presented. A procedure for searching for arbitrary substructures in proteins using distance matrices is also described. A search for the DNA binding helix-turn-helix motif in the Protein Data Bank serves as an example. 
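The difference-distance-matrix comparison described above can be illustrated with a minimal sketch. It is not the published program. The idealized helix geometry (2.3 Å radius, 1.5 Å rise, 100° per residue), the 8-residue window, the 1.0 Å tolerance, and all function names are assumptions chosen for illustration. The same scan could in principle take any template coordinate set, which is the idea behind the substructure search.

```python
import math

def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """C-alpha positions of an idealized alpha helix along +z (assumed geometry)."""
    return [(radius * math.cos(math.radians(turn_deg * i)),
             radius * math.sin(math.radians(turn_deg * i)),
             rise * i) for i in range(n)]

def distance_matrix(coords):
    """All pairwise C-alpha distances for a list of (x, y, z) tuples."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def max_difference(dm_a, dm_b):
    """Largest absolute element of the difference distance matrix."""
    return max(abs(x - y)
               for row_a, row_b in zip(dm_a, dm_b)
               for x, y in zip(row_a, row_b))

def scan(coords, template, tol=1.0):
    """Return start indices of windows whose distances match the template within tol."""
    width = len(template)
    ref = distance_matrix(template)
    hits = []
    for start in range(len(coords) - width + 1):
        window = distance_matrix(coords[start:start + width])
        if max_difference(window, ref) <= tol:
            hits.append(start)
    return hits

def merge_windows(hits, width):
    """Merge overlapping or adjacent matching windows into (first, last) residue ranges."""
    segments = []
    for start in hits:
        end = start + width - 1
        if segments and start <= segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments

# Toy chain: 12 helical residues followed by 8 residues on a straight line
# (3.8 A spacing), standing in for a non-helical stretch.
helix = ideal_helix(12)
x0, y0, z0 = helix[-1]
tail = [(x0 + 3.8 * k, y0, z0) for k in range(1, 9)]
chain = helix + tail

hits = scan(chain, ideal_helix(8), tol=1.0)
segments = merge_windows(hits, 8)
```

Taking the maximum element of the difference matrix is only one possible agreement criterion; the abstract says only that differences are evaluated "within preset limits".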
A further abstraction of the above data can be made in the form of a metamatrix where each diagonal element represents an entire secondary segment rather than a single atom, and the off-diagonal elements contain all the parameters describing their interrelations. Such matrices can be used in a straightforward search for higher levels of supersecondary structure or used in toto as a representation of the entire tertiary structure of the polypeptide chain.
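A small sketch may make the metamatrix idea concrete. It reduces each segment to a straight axis and stores two of the pairwise parameters named in the abstract, the interaxial separation and angle, in the off-diagonal cells. The axis estimate (centroids of the first and last four Cα atoms), the definition of separation as the closest approach of the two infinite axis lines, and all names are assumptions for illustration, not the authors' procedure.

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def sub(a, b):
    return tuple(a[k] - b[k] for k in range(3))

def dot(a, b):
    return sum(a[k] * b[k] for k in range(3))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def segment_axis(ca, window=4):
    """Approximate a segment by the line joining its end centroids (assumption)."""
    return centroid(ca[:window]), centroid(ca[-window:])

def interaxial_angle(axis_a, axis_b):
    """Angle in degrees between the two axis directions."""
    u = sub(axis_a[1], axis_a[0])
    v = sub(axis_b[1], axis_b[0])
    cos_angle = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def interaxial_separation(axis_a, axis_b):
    """Closest-approach distance between the two infinite axis lines."""
    u = sub(axis_a[1], axis_a[0])
    v = sub(axis_b[1], axis_b[0])
    w = sub(axis_b[0], axis_a[0])
    n = cross(u, v)
    if norm(n) < 1e-9:
        # Parallel axes: perpendicular distance from a point on b to line a.
        return norm(cross(w, u)) / norm(u)
    return abs(dot(w, n)) / norm(n)

def metamatrix(segments):
    """segments: list of (label, ca_coords).
    Diagonal cells hold the segment label, off-diagonal cells hold
    (separation, angle) for that pair of segments."""
    axes = [segment_axis(ca) for _, ca in segments]
    size = len(segments)
    matrix = [[None] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == j:
                matrix[i][j] = segments[i][0]
            else:
                matrix[i][j] = (interaxial_separation(axes[i], axes[j]),
                                interaxial_angle(axes[i], axes[j]))
    return matrix

# Toy input: two straight segments, one along z and one along x, offset by 10 A in y.
seg_a = [(0.0, 0.0, 1.5 * i) for i in range(10)]
seg_b = [(1.5 * i, 10.0, 0.0) for i in range(10)]
mm = metamatrix([("H1", seg_a), ("H2", seg_b)])
```

A fuller description would carry up to six parameters per pair, as the abstract notes; only two are stored here to keep the sketch short.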

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Computer Simulation
  • Models, Molecular
  • Protein Conformation*
  • Proteins*
  • Software

