FIEFDom: a transparent domain boundary recognition system using a fuzzy mean operator

Nucleic Acids Res. 2009 Feb;37(2):452-62. doi: 10.1093/nar/gkn944. Epub 2008 Dec 4.

Abstract

Protein domain prediction is often the preliminary step in both experimental and computational protein research. Here we present a new method to predict the domain boundaries of a multidomain protein from its amino acid sequence using a fuzzy mean operator. Using the nr-sequence database together with a reference protein set (RPS) containing known domain boundaries, the operator is used to assign a likelihood value for each residue of the query sequence as belonging to a domain boundary. This procedure robustly identifies contiguous boundary regions. For a dataset with a maximum sequence identity of 30%, the average domain prediction accuracy of our method is 97% for one domain proteins and 58% for multidomain proteins. The presented model is capable of using new sequence/structure information without re-parameterization after each RPS update. When tested on a current database using a four year old RPS and on a database that contains different domain definitions than those used to train the models, our method consistently yielded the same accuracy while two other published methods did not. A comparison with other domain prediction methods used in the CASP7 competition indicates that our method performs better than existing sequence-based methods.

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Databases, Protein
  • Protein Structure, Tertiary*
  • Sequence Analysis, Protein / methods*
  • Sequence Homology, Amino Acid
  • Software*