Sequence variations within protein families are linearly related to structural variations

J Mol Biol. 2002 Oct 25;323(3):551-62. doi: 10.1016/s0022-2836(02)00971-3.

Abstract

It is commonly believed that similarities between the sequences of two proteins imply similarities between their structures. Sequence alignments reliably recognize pairs of proteins with similar structures, provided that the percentage sequence identity between the two sequences is sufficiently high. Such recognition, however, is statistically less reliable when the percentage sequence identity is lower than 30%, and little is then known about the detailed relationship between the two measures of similarity. Here, we investigate the inverse correlation between structural similarity and sequence similarity on 12 protein structure families. We define the structure similarity between two proteins as the coordinate root-mean-square (cRMS) distance between their structures. The sequence similarity for a pair of proteins is measured as the mean distance between the sequences in the subsets of sequence space compatible with their structures. We obtain an approximation of the sequence space compatible with a protein by designing a collection of protein sequences that are both stable in and specific to the structure of that protein. Using these measures of sequence and structure similarity, we find that structural changes within a protein family are linearly related to changes in sequence similarity.
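
As a rough illustration of the sequence similarity measure described above, the following sketch averages a pairwise distance over every pair of sequences drawn from two designed sets. The abstract does not state which pairwise sequence distance is used, so the fraction of mismatched positions between equal-length, pre-aligned sequences is an assumption here, and the function names `hamming_fraction` and `mean_set_distance` are hypothetical.

```python
from itertools import product


def hamming_fraction(seq_a: str, seq_b: str) -> float:
    """Fraction of positions at which two equal-length sequences differ.

    Assumption: the sequences are already aligned and have the same length.
    The abstract does not specify the pairwise distance it uses.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def mean_set_distance(set_a, set_b) -> float:
    """Mean pairwise distance between two collections of sequences.

    Each collection stands in for the sequences designed for one structure.
    """
    pairs = list(product(set_a, set_b))
    if not pairs:
        raise ValueError("both sequence sets must be non-empty")
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy sequences, invented purely for illustration.
    designed_for_x = ["AAAA", "AAAT"]
    designed_for_y = ["AAAA"]
    print(mean_set_distance(designed_for_x, designed_for_y))
```

Under these assumptions, a larger mean distance indicates that the two sets of compatible sequences overlap less, which is the quantity the abstract relates to the cRMS distance between the two structures.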

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Bacterial Proteins / chemistry
  • Bacterial Proteins / genetics
  • Humans
  • Models, Molecular
  • Molecular Sequence Data
  • Protein Conformation*
  • Protein Folding
  • Protein Structure, Secondary
  • Protein Structure, Tertiary*
  • Proteins / chemistry*
  • Proteins / genetics*
  • Sequence Alignment
  • Statistics as Topic
  • src Homology Domains

Substances

  • Bacterial Proteins
  • Ig L-binding protein, Peptostreptococcus
  • IgG Fc-binding protein, Streptococcus
  • Proteins

Associated data

  • PDB/1BVD
  • PDB/1LHT
  • PDB/1MYT
  • PDB/1PGB
  • PDB/1VXB
  • PDB/2GDM
  • PDB/2HBG
  • PDB/2PTL
  • PDB/3SDHA
  • PDB/5MBN