Variable gap penalty for protein sequence-structure alignment

Protein Eng Des Sel. 2006 Mar;19(3):129-33. doi: 10.1093/protein/gzj005. Epub 2006 Jan 19.


The penalty for inserting gaps into an alignment between two protein sequences is a major determinant of the alignment accuracy. Here, we present an algorithm for finding a globally optimal alignment by dynamic programming that can use a variable gap penalty (VGP) function of any form. We also describe a specific function that depends on the structural context of an insertion or deletion. It penalizes gaps that are introduced within regions of regular secondary structure, buried regions, straight segments and also between two spatially distant residues. The parameters of the penalty function were optimized on a set of 240 sequence pairs of known structure, spanning the sequence identity range of 20-40%. We then tested the algorithm on another set of 238 sequence pairs of known structures. The use of the VGP function increases the number of correctly aligned residues from 81.0 to 84.5% in comparison with the optimized affine gap penalty function; this difference is statistically significant according to Student's t-test. We estimate that the new algorithm allows us to produce comparative models with an additional approximately 7 million accurately modeled residues in the approximately 1.1 million proteins that are detectably related to a known structure.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Models, Molecular
  • Molecular Sequence Data
  • Proteins / chemistry*
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Sequence Homology, Amino Acid*
  • Software*


  • Proteins