Discovering the Ultimate Limits of Protein Secondary Structure Prediction

Biomolecules. 2021 Nov 3;11(11):1627. doi: 10.3390/biom11111627.

Abstract

Secondary structure prediction (SSP) of proteins is an important structural biology technique with many applications. About 300 algorithms have been published over the past seven decades, with fierce competition in accuracy. In the first 60 years, the accuracy of three-state SSP rose from ~56% to 81%; since then, it has remained at 81-86%. In the 1990s, the theoretical limit of three-state SSP accuracy was estimated to be 88%. Thus, SSP is now generally considered either not challenging or too challenging to improve. However, we found that the limit of three-state SSP may have been underestimated. Moreover, there is still much room for improving segment-based and eight-state SSP, but the limits of these emerging topics have not been determined. This work performs large-scale sequence and structural analyses to estimate SSP accuracy limits and to assess state-of-the-art SSP methods. The limit of three-state SSP is re-estimated to be ~92%, 4-5% higher than previously expected, indicating that SSP is still challenging. The estimated limit of eight-state SSP is 84-87%. Based on our results, we make several proposals for improving future SSP algorithms. We hope that these findings will help advance the development of SSP and all its applications.
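
To make the three-state and eight-state accuracy figures above concrete, the following is a minimal sketch of how per-residue accuracy (often called Q3 and Q8) can be computed. It is not the authors' evaluation pipeline. The eight-to-three state reduction shown (H, G, I to helix; E, B to strand; everything else to coil) is one commonly used convention and is an assumption here, since the abstract does not state which reduction was used. The function names and the example strings are hypothetical and chosen only for illustration.

```python
# Assumed reduction of DSSP-style eight-state codes to three states.
# Other reductions exist; this one is a common convention, not taken
# from the abstract.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",             # helices -> H
    "E": "E", "B": "E",                       # strands and bridges -> E
    "T": "C", "S": "C", "C": "C", "-": "C",   # turn, bend, coil -> C
}


def reduce_to_three(ss8: str) -> str:
    """Map an eight-state string to a three-state string."""
    return "".join(EIGHT_TO_THREE[state] for state in ss8)


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 16-residue example (not real data).
observed_ss8 = "CHHHHGGTEEEEBSCC"
predicted_ss8 = "CHHHHHHTEEEECSCC"

q8 = per_residue_accuracy(predicted_ss8, observed_ss8)
q3 = per_residue_accuracy(
    reduce_to_three(predicted_ss8), reduce_to_three(observed_ss8)
)
print(f"Q8 = {q8:.4f}, Q3 = {q3:.4f}")  # Q8 = 0.8125, Q3 = 0.9375
```

In this toy example, two of the three eight-state errors (G predicted as H) disappear after reduction, because both states map to helix. This illustrates why three-state accuracy is generally higher than eight-state accuracy for the same prediction.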

Keywords: protein secondary structure prediction; protein sequence; protein sequence-based predictions; protein structure; structural biology.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology*
  • Protein Structure, Secondary
  • Proteins*

Substances

  • Proteins