SnapDRAGON: a method to delineate protein structural domains from sequence data

J Mol Biol. 2002 Feb 22;316(3):839-51. doi: 10.1006/jmbi.2001.5387.

Abstract

We describe a method to identify protein domain boundaries from sequence information alone, based on the assumption that hydrophobic residues cluster together in space. SnapDRAGON is a suite of programs developed to predict domain boundaries based on the consistency observed in a set of alternative ab initio three-dimensional (3D) models generated for a given protein multiple sequence alignment. This is achieved by running a distance geometry-based folding technique in conjunction with a 3D-domain assignment algorithm. The overall accuracy of our method in predicting the number of domains for a non-redundant data set of 414 multiple alignments, representing 185 single and 229 multiple-domain proteins, is 72.4 %. Using domain linker regions observed in the tertiary structures associated with each query alignment as the standard of truth, inter-domain boundary positions are delineated with an accuracy of 63.9 % for proteins comprising continuous domains only, and 35.4 % for proteins with discontinuous domains. Overall, domain boundaries are delineated with an accuracy of 51.8 %. The prediction accuracy values are independent of the pair-wise sequence similarities within each of the alignments. These results demonstrate the capability of our method to delineate domains in protein sequences associated with a wide variety of structural domain organisations.
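The abstract does not spell out how consistency across the alternative 3D models is turned into a boundary call. The following is a minimal sketch of one plausible consensus scheme, not the SnapDRAGON procedure. It assumes that each ab initio model has already been split into domains by a 3D domain assignment step, giving a list of boundary residue positions per model. Boundaries from all models are then grouped by single-linkage clustering within a hypothetical `window` of residues, and a cluster is accepted when at least a hypothetical `min_fraction` of the models contribute to it. The function name, both parameters, and the clustering rule are illustrative assumptions.

```python
from statistics import median_low


def consensus_boundaries(model_boundaries, window=10, min_fraction=0.5):
    """Hypothetical consensus of domain boundaries over alternative 3D models.

    model_boundaries: one list of boundary residue positions per model
    (assumed output of a 3D domain assignment step; not SnapDRAGON's API).
    window: max gap (residues) between neighbouring positions in one cluster.
    min_fraction: fraction of models that must support a cluster.
    """
    n_models = len(model_boundaries)
    # Pool all boundaries as (position, model index), sorted by position.
    points = sorted(
        (pos, m) for m, bounds in enumerate(model_boundaries) for pos in bounds
    )
    # Single-linkage clustering along the sequence: start a new cluster
    # whenever the gap to the previous position exceeds the window.
    clusters = []
    for pos, m in points:
        if clusters and pos - clusters[-1][-1][0] <= window:
            clusters[-1].append((pos, m))
        else:
            clusters.append([(pos, m)])
    consensus = []
    for cluster in clusters:
        # Count distinct models, so one model cannot vote twice.
        supporting = {m for _, m in cluster}
        if len(supporting) >= min_fraction * n_models:
            # median_low always returns an observed residue position.
            consensus.append(median_low(p for p, _ in cluster))
    return consensus


if __name__ == "__main__":
    models = [[98, 210], [101], [103, 205], [100, 300]]
    print(consensus_boundaries(models, window=10, min_fraction=0.5))
```

With the four toy models above, the boundaries near residue 100 (four models) and 205 (two models) pass the 50 % support threshold, while the one at 300 (one model) is discarded. For a protein made of continuous domains only, the predicted number of domains would then be the number of consensus boundaries plus one. Discontinuous domains would need a different counting rule.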

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Databases, Protein
  • Hydrophobic and Hydrophilic Interactions
  • Models, Molecular
  • Protein Folding
  • Protein Structure, Tertiary*
  • Proteins / chemistry*
  • Proteins / genetics*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment
  • Sequence Analysis
  • Software*
  • Statistics as Topic

Substances

  • Proteins