On Filtering False Positive Transmembrane Protein Predictions

Protein Eng. 2002 Sep;15(9):745-52. doi: 10.1093/protein/15.9.745.

Abstract

While helical transmembrane (TM) region prediction tools achieve high (>90%) success rates for real integral membrane proteins, they produce a considerable number of false positive hits in query sequences known to be non-transmembrane. We propose a modification of the dense alignment surface (DAS) method that achieves a substantial decrease in the false positive error rate. Essentially, a sequence that includes possible transmembrane regions is compared in a second step with TM segments in a sequence library of documented transmembrane proteins. If the performance of the query sequence against this library is lower than an empirical threshold, the query is classified as a non-transmembrane protein. The probability of false positive prediction for trusted TM region hits is expressed in terms of E-values. The modified DAS method, the DAS-TMfilter algorithm, has an unchanged high sensitivity for TM segments (approximately 95% detected in a learning set of 128 documented transmembrane proteins). At the same time, the selectivity measured over a non-redundant set of 526 soluble proteins with known 3D structure is approximately 99%, mainly because a large number of falsely predicted single membrane-pass proteins are eliminated by the DAS-TMfilter algorithm.
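
The second-step filter described in the abstract can be illustrated with a minimal sketch. This is not the DAS-TMfilter implementation: the function names `classify_query` and `toy_score`, the use of the best library score as the query's "performance", and the identity-based scoring are all assumptions made for illustration. The abstract does not specify the actual scoring, the threshold value, or the E-value calculation.

```python
from typing import Callable, Sequence


def toy_score(a: str, b: str) -> float:
    """Placeholder similarity (an assumption, NOT the DAS score):
    fraction of identical residues over the shorter sequence length."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def classify_query(
    query: str,
    tm_library: Sequence[str],
    score: Callable[[str, str], float],
    threshold: float,
) -> str:
    """Keep a candidate TM prediction only if the query performs well
    enough against a library of documented TM segments.

    Assumption: "performance" is taken as the best score over the library.
    """
    if not tm_library:
        raise ValueError("tm_library must not be empty")
    best = max(score(query, segment) for segment in tm_library)
    # Below the empirical threshold, the query is treated as non-transmembrane.
    return "transmembrane" if best >= threshold else "non-transmembrane"


# Hypothetical sequences and threshold, chosen only to exercise both branches.
library = ["LLLVVAAIL", "AAAA"]
print(classify_query("LLLVVAAIL", library, toy_score, 0.5))
print(classify_query("DEKRDEKR", ["LLLVVAAIL"], toy_score, 0.5))
```

Passing the scoring function as a parameter keeps the thresholding logic separate from the choice of score, so a real alignment-based score could be substituted without changing the classification step.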

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology
  • Databases, Protein
  • Membrane Proteins / chemistry*
  • Membrane Proteins / genetics
  • Protein Engineering
  • Sequence Alignment / statistics & numerical data

Substances

  • Membrane Proteins