Genomics Proteomics Bioinformatics. 2020 Feb;18(1):52-64.
doi: 10.1016/j.gpb.2019.08.002. Epub 2020 May 12.

Procleave: Predicting Protease-specific Substrate Cleavage Sites by Combining Sequence and Structural Information

Fuyi Li et al. Genomics Proteomics Bioinformatics. 2020 Feb.

Abstract

Proteases are enzymes that cleave and hydrolyse the peptide bonds between two specific amino acid residues of target substrate proteins. Protease-controlled proteolysis plays a key role in the degradation and recycling of proteins, which is essential for various physiological processes. Thus, solving the substrate identification problem will have important implications for the precise understanding of functions and physiological roles of proteases, as well as for therapeutic target identification and pharmaceutical applicability. Consequently, there is a great demand for bioinformatics methods that can predict novel substrate cleavage events with high accuracy by utilizing both sequence and structural information. In this study, we present Procleave, a novel bioinformatics approach for predicting protease-specific substrates and specific cleavage sites by taking into account both their sequence and 3D structural information. Structural features of known cleavage sites were represented by discrete values using a LOWESS data-smoothing optimization method, which turned out to be critical for the performance of Procleave. The optimal approximations of all structural parameter values were encoded in a conditional random field (CRF) computational framework, alongside sequence and chemical group-based features. Here, we demonstrate the outstanding performance of Procleave through extensive benchmarking and independent tests. Procleave is capable of correctly identifying most cleavage sites in the case study. Importantly, when applied to the human structural proteome encompassing 17,628 protein structures, Procleave suggests a number of potential novel target substrates and their corresponding cleavage sites of different proteases. Procleave is implemented as a webserver and is freely accessible at http://procleave.erc.monash.edu/.
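The abstract states that structural feature values were smoothed with LOWESS and then represented as discrete values. A minimal sketch of that idea follows, using only the Python standard library. It is an illustration under stated assumptions, not the authors' implementation: the function names `lowess` and `discretize`, the `frac` and `n_bins` parameters, the omission of robustness iterations, and the equal-width binning rule are all hypothetical choices that the abstract does not specify.

```python
def lowess(xs, ys, frac=0.5):
    """Simplified LOWESS: tricube-weighted local linear fit at each x.

    Assumption: no robustness iterations, unlike full LOWESS.
    """
    n = len(xs)
    k = max(2, int(round(frac * n)))  # neighbours used in each local fit
    fitted = []
    for x0 in xs:
        # The distance to the k-th nearest point sets the local bandwidth.
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12
        # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def discretize(values, n_bins=3):
    """Map continuous values to integer bin labels 0..n_bins-1.

    Assumption: equal-width bins over the observed range.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero on flat input
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical example: a noisy structural parameter along a protein chain.
positions = list(range(10))
raw = [0.10, 0.35, 0.20, 0.50, 0.45, 0.70, 0.60, 0.90, 0.80, 0.95]
smoothed = lowess(positions, raw, frac=0.5)
labels = discretize(smoothed, n_bins=3)
```

Discrete labels of this kind are the sort of input a CRF can consume as categorical observations, which is presumably why a discretization step is needed at all.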

Keywords: Cleavage site prediction; Conditional random field; Machine learning; Protease; Structural determinants.

Figures

Figure 1
The overall framework of Procleave. There are five major steps in the framework of Procleave: data pre-processing, feature extraction, model training and optimization, model testing and evaluation, and web server development.
Figure 2
Structural determinants of the substrate specificity of nine proteases across the P4-P4′ cleavage sites. A. Cathepsin D. B. Cathepsin E. C. HIV-1 retropepsin. D. Cathepsin B. E. Caspase-3. F. MMP-2. G. MMP-9. H. Granzyme B (human). I. Cathepsin G. MMP, matrix metallopeptidase. The secondary structure information was extracted from DSSP results. H, helix; E, strand; L, loop.
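In the standard Schechter and Berger nomenclature, the scissile bond lies between residues P1 and P1′, so the eight positions from P4 to P4′ are the four residues on each side of the bond. The sketch below extracts that window from a sequence string; the same function works on a per-residue secondary-structure string (H, E, L). The function names, the 0-based indexing, and the `-` padding character are assumptions made for illustration and do not come from the paper.

```python
def p4_p4prime_window(seq, p1_index, pad="-"):
    """Return the 8 symbols at P4, P3, P2, P1, P1', P2', P3', P4'.

    p1_index is the 0-based index of the P1 residue; the scissile bond
    sits between p1_index and p1_index + 1. Positions that fall outside
    the sequence are filled with the pad character.
    """
    return "".join(
        seq[i] if 0 <= i < len(seq) else pad
        for i in range(p1_index - 3, p1_index + 5)
    )


def all_candidate_windows(seq):
    """Yield (p1_index, window) for every peptide bond in the sequence."""
    for p1_index in range(len(seq) - 1):
        yield p1_index, p4_p4prime_window(seq, p1_index)


# Hypothetical sequence and matching secondary-structure string.
sequence = "ACDEFGHIKLMN"
secondary = "LLHHHHHLLEEE"
print(p4_p4prime_window(sequence, 5))   # residues around the G|H bond
print(p4_p4prime_window(secondary, 5))  # structure labels at the same positions
```

Each candidate bond then yields one fixed-length window of residues and one of structure labels, which is a convenient unit for per-site feature encoding.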
Figure 3
Performance comparison of CRF models trained using different feature combinations in terms of AUC values. A. Cathepsin D. B. Cathepsin E. C. HIV-1 retropepsin. D. Cathepsin B. E. Caspase-3. F. MMP-2. G. MMP-9. H. Granzyme B (human). I. Cathepsin G. The evaluation was based on 10 repetitions of 5-fold cross-validation on the training datasets.
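The evaluation protocol named in this caption, repeated 5-fold cross-validation scored by AUC, can be sketched with the standard library alone. This is a generic illustration rather than the authors' evaluation code: `repeated_kfold`, `auc`, and the `seed` parameter are hypothetical names, the folds are not stratified by class, and the AUC is computed with the pairwise (Mann-Whitney) definition, counting ties as one half.

```python
import random


def repeated_kfold(n_samples, n_splits=5, n_repeats=10, seed=0):
    """Yield (train_indices, test_indices) for each fold of each repeat.

    Assumption: plain shuffled folds, without class stratification.
    """
    rng = random.Random(seed)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh shuffle per repeat gives different folds
        for fold in range(n_splits):
            test = idx[fold::n_splits]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield train, test


def auc(scores, labels):
    """Area under the ROC curve: the fraction of (positive, negative)
    pairs in which the positive is scored higher, ties counting 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical usage: 10 x 5 = 50 train/test splits over 20 samples.
splits = list(repeated_kfold(20, n_splits=5, n_repeats=10, seed=0))
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Cleavage sites are typically far rarer than non-cleavage sites, so a real evaluation would likely want stratified folds so that every test fold contains positives; the sketch omits this for brevity.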
Figure 4
Comparison of cleavage site prediction performance of Procleave and other methods in terms of AUC values for five different proteases. A. Cathepsin E. B. Caspase-3. C. Caspase-6. D. MMP-2. E. Granzyme B. PoPS, PROSPER, and iProt-Sub cannot predict cleavage sites of cathepsin E; SitePrediction and PROSPER cannot predict cleavage sites of granzyme B. SVM and RF were included to test whether the conditional random field model employed in Procleave provides better performance.
Figure 5
Predicted cleavage sites of four substrate protein structures. A. Human αB-crystallin (PDB ID: 3L1G, chain A) cleaved by MMP-9. B. Human interferon β (PDB ID: 1AU1, chain A) cleaved by MMP-9. C. ATPase p97 mutant (PDB ID: 3HU2, chain A) cleaved by caspase-6. D. Human enolase 1 (PDB ID: 3B97, chain A) cleaved by meprin β.
