Biophys J. 2021 Oct 19;120(20):4312-4319.
doi: 10.1016/j.bpj.2021.08.039. Epub 2021 Sep 2.

Metapredict: a fast, accurate, and easy-to-use predictor of consensus disorder and structure

Ryan J Emenecker et al. Biophys J.

Abstract

Intrinsically disordered proteins and protein regions make up a substantial fraction of many proteomes, in which they play a wide variety of essential roles. A critical first step in understanding the role of disordered protein regions in biological function is to identify those disordered regions correctly. Computational methods for disorder prediction have emerged as a core set of tools to guide experiments, interpret results, and develop hypotheses. Given the many predictors available, consensus scores have emerged as a popular approach to mitigate biases or limitations of any single method. Consensus scores integrate the outcome of multiple independent disorder predictors and provide a per-residue value that reflects the number of tools that predict a residue to be disordered. Although consensus scores help mitigate the inherent problems of using any single disorder predictor, they are computationally expensive to generate. They also necessitate the installation of multiple software tools, which can be prohibitively difficult. To address this challenge, we developed a deep-learning-based predictor of consensus disorder scores. Our predictor, metapredict, utilizes a bidirectional recurrent neural network trained on the consensus disorder scores from 12 proteomes. By benchmarking metapredict using two orthogonal approaches, we found that metapredict is among the most accurate disorder predictors currently available. Metapredict is also remarkably fast, enabling proteome-scale disorder prediction in minutes. Importantly, metapredict is fully open source and is distributed as a Python package, a collection of command-line tools, and a web server, maximizing the potential practical utility of the predictor. We believe metapredict offers a convenient, accessible, accurate, and high-performance predictor for single proteins and proteomes alike.
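
The idea of a per-residue consensus score described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `consensus_disorder`, the binary per-predictor calls, and the choice to report the fraction (rather than the raw count) of predictors that call a residue disordered are all assumptions made for illustration.

```python
from typing import List, Sequence


def consensus_disorder(calls: Sequence[Sequence[bool]]) -> List[float]:
    """Return, for each residue, the fraction of predictors calling it disordered.

    `calls` holds one sequence of binary calls per predictor (hypothetical input).
    """
    if not calls:
        raise ValueError("need at least one predictor")
    length = len(calls[0])
    if any(len(c) != length for c in calls):
        raise ValueError("all predictors must cover the same residues")
    n_predictors = len(calls)
    # Count the predictors voting 'disordered' at each position, then normalize.
    return [sum(c[i] for c in calls) / n_predictors for i in range(length)]


# Hypothetical calls from three predictors over a five-residue sequence.
calls = [
    [True, True, False, False, True],
    [True, False, False, False, True],
    [True, True, False, True, True],
]
scores = consensus_disorder(calls)
```

Producing such a score normally requires running every underlying predictor, which is the cost the abstract says metapredict avoids by training a network to predict the consensus value directly from sequence.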

Figures

Figure 1
Overview of metapredict. Consensus scores are taken from 420,660 proteins distributed across 12 proteomes. Metapredict was developed by training a bidirectional recurrent neural network (BRNN) on this data, leading to a set of network weights that allow the prediction of any possible consensus sequence score.
Figure 2
Evaluation of metapredict using CAID experiments. (A) F1-score for various predictors in examining their accuracy in predicting protein disorder from the DisProt dataset. (B) F1-scores for various predictors in examining their accuracy in predicting protein disorder from the DisProt-PDB dataset. (C) F1-scores for various predictors in predicting fully disordered proteins in the DisProt dataset. Values for all predictors in (A)–(C) with the exception of those for metapredict (orange bar) were obtained from (27).
Figure 3
Accuracy and performance of metapredict. (A) Rank order of predictors in terms of number of correct residues per 100, assessed using true positive and true negative only (DisProt-PDB dataset). (B) Relative execution time for all predictors as evaluated in CAID over 652 independent sequences. Metapredict emerges as the third fastest predictor with a relative average loss in accuracy of two residues per 100 compared with the state of the art (see also Fig. S11).
Figure 4
Metapredict also offers predicted structure confidence based on AlphaFold2. (A) Comparison of predicted pLDDT (blue) versus actual pLDDT for the translational termination factor Sup35 from Schizosaccharomyces pombe. This sequence was not used in the training data, and is provided as a simple illustrative example of the agreement between the metapredict-derived prediction and actual AlphaFold2 pLDDT values. (B) Comparison of disorder (red), predicted pLDDT (ppLDDT; divided by 100 to place it on the same scale), and known folded domains (gray and blue) with associated Protein Data Bank IDs shown for the human RNA-binding protein TDP-43. The C-terminal disordered region is an experimentally verified IDR (63). Disorder and ppLDDT scores are anticorrelated and correctly identify domain boundaries.
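
The two accuracy measures named in the captions for Figures 2 and 3 (F1-score and correct residues per 100) can be sketched as follows. This is a minimal illustration under stated assumptions, not the CAID evaluation code: the helper names and the toy labels are hypothetical, and reading "correct residues per 100" as 100 × (TP + TN) / N is an interpretation of the Figure 3 caption.

```python
from typing import Sequence, Tuple


def confusion_counts(truth: Sequence[int], pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary per-residue labels (1 = disordered)."""
    if len(truth) != len(pred):
        raise ValueError("truth and prediction must have the same length")
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(truth: Sequence[int], pred: Sequence[int]) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives at all."""
    tp, fp, _, fn = confusion_counts(truth, pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def correct_per_100(truth: Sequence[int], pred: Sequence[int]) -> float:
    """Correctly classified residues (TP + TN) per 100 residues (assumed reading)."""
    tp, _, tn, _ = confusion_counts(truth, pred)
    return 100 * (tp + tn) / len(truth)


# Hypothetical ground truth and prediction over ten residues.
truth = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
pred = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
f1 = f1_score(truth, pred)
per_100 = correct_per_100(truth, pred)
```

On this toy input there are four true positives, four true negatives, one false positive, and one false negative, so eight of ten residues are classified correctly.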

References

    1. Sormanni P., Piovesan D., Vendruscolo M. Simultaneous quantification of protein order and disorder. Nat. Chem. Biol. 2017;13:339–342.
    2. Bottaro S., Lindorff-Larsen K. Biophysical experiments and biomolecular simulations: a perfect match? Science. 2018;361:355–360.
    3. Henzler-Wildman K., Kern D. Dynamic personalities of proteins. Nature. 2007;450:964–972.
    4. van der Lee R., Buljan M., Babu M.M. Classification of intrinsically disordered regions and proteins. Chem. Rev. 2014;114:6589–6631.
    5. Wright P.E., Dyson H.J. Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J. Mol. Biol. 1999;293:321–331.
