Predicting protein secondary structure and solvent accessibility with an improved multiple linear regression method

Proteins. 2005 Nov 15;61(3):473-80. doi: 10.1002/prot.20645.

Abstract

We have improved the multiple linear regression (MLR) algorithm for protein secondary structure prediction by combining it with the evolutionary information provided by PSI-BLAST multiple sequence alignments. On the CB513 dataset, the average three-state overall per-residue accuracy, Q(3), reached 76.4%, and the segment overlap accuracy, SOV99, reached 73.2%, using a rigorous jackknife procedure and the strictest reduction of the eight-state DSSP definition to three states. This represents an improvement of approximately 5% in overall per-residue accuracy over previous work. Relative solvent accessibility prediction also benefited from this combination of methods. The system achieved an average jackknifed accuracy of 77.7% for two-state prediction based on a 25% relative solvent accessibility threshold, with a Matthews correlation coefficient of 0.548. The improved MLR secondary structure and relative solvent accessibility prediction server is available at http://spg.biosci.tsinghua.edu.cn/.
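The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The abstract does not spell out the eight-to-three state mapping, so the mapping below (H, G, I to H; E, B to E; everything else to C) is an assumption, as are all function names and the choice to treat a residue as exposed only when its relative solvent accessibility is strictly above the 25% cutoff.

```python
from math import sqrt

# Assumed 8-to-3 reduction of DSSP states; the abstract does not give
# the exact mapping, so this table is illustrative only.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(states8):
    """Map a string of DSSP states to three states (H, E, C)."""
    return "".join(DSSP_TO_3.get(s, "C") for s in states8)


def q3(observed, predicted):
    """Per-residue three-state accuracy, in percent."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def binarize_rsa(rsa_percent, threshold=25.0):
    """Two-state labels: True (exposed) if RSA is above the threshold.

    Using a strict '>' comparison is an assumption of this sketch.
    """
    return [rsa > threshold for rsa in rsa_percent]


def matthews_cc(observed, predicted):
    """Matthews correlation coefficient for two boolean label lists."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)
    tn = sum(1 for o, p in pairs if not o and not p)
    fp = sum(1 for o, p in pairs if not o and p)
    fn = sum(1 for o, p in pairs if o and not p)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when the coefficient is undefined (an empty row or column).
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    observed = reduce_dssp("HGIEBTS-")
    print(observed, q3(observed, "HHHEECCH"))
    exposed = binarize_rsa([10.0, 25.0, 40.0, 60.0])
    print(matthews_cc(exposed, [False, False, True, False]))
```

In a jackknife evaluation such as the one described, these measures would be computed on held-out proteins only and then averaged; the details of the split are not given in the abstract and are not modelled here.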

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Linear Models
  • Protein Structure, Secondary
  • Proteins / chemistry*
  • Regression Analysis
  • Reproducibility of Results
  • Solvents / chemistry*

Substances

  • Proteins
  • Solvents