Software-aided workflow for predicting protease-specific cleavage sites using physicochemical properties of the natural and unnatural amino acids in peptide-based drug discovery

PLoS One. 2019 Jan 8;14(1):e0199270. doi: 10.1371/journal.pone.0199270. eCollection 2019.

Abstract

Peptide drugs have been used in the treatment of multiple pathologies. During peptide discovery, it is crucial to be able to map the sites at which proteases may cleave a peptide. This knowledge is then used to chemically modify the peptide drug and adapt it for therapeutic use, making it stable against individual proteases or in complex media. In other cases, the peptide needs to be made specifically unstable to certain proteases, since peptides can be used as systems for targeted drug delivery to specific tissues or cells. Information about proteases, their cleavage sites and their substrates is spread widely across publications and collected in databases such as MEROPS. It is therefore possible to develop models that improve the understanding of potential peptide drug proteolysis. We propose a new workflow to derive protease specificity rules and predict the potential scissile bonds in peptides for individual proteases. WebMetabase stores information from experimental or external sources in a chemically aware database in which each peptide and cleavage site is represented as a sequence of structural blocks connected by amide bonds and characterized by physicochemical properties described by VolSurf descriptors. This methodology can therefore also be applied to non-standard amino acids. A frequency analysis can be performed in WebMetabase to discover the most frequent cleavage sites. These results were used to train several models, using logistic regression, support vector machine and ensemble tree classifiers, to map cleavage sites for several human proteases from four different families (serine, cysteine, aspartic and matrix metalloproteases). Finally, we compared the predictive performance of the developed models with that of two other publicly available tools, PROSPERous and SitePrediction.
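The frequency analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the WebMetabase implementation, which the abstract does not describe in detail; it only assumes that each peptide is a list of building-block names (so non-standard residues such as Aib fit naturally) and that each observed cleavage is given as the index of the P1 residue. The function names `cleavage_window` and `position_frequencies` and the toy observations are hypothetical and chosen for illustration only.

```python
from collections import Counter


def cleavage_window(sequence, bond_index, flank=2):
    """Return the residues from P<flank> to P<flank>' around a scissile bond.

    bond_index is the 0-based index of the P1 residue; the cleaved bond lies
    between sequence[bond_index] and sequence[bond_index + 1].
    Positions that fall outside the peptide are padded with "-".
    """
    window = []
    for offset in range(-flank + 1, flank + 1):
        pos = bond_index + offset
        window.append(sequence[pos] if 0 <= pos < len(sequence) else "-")
    return window


def position_frequencies(observations, flank=2):
    """Relative frequency of each building block at each site position."""
    labels = [f"P{i}" for i in range(flank, 0, -1)]
    labels += [f"P{i}'" for i in range(1, flank + 1)]
    counts = {label: Counter() for label in labels}
    for sequence, bond_index in observations:
        window = cleavage_window(sequence, bond_index, flank)
        for label, residue in zip(labels, window):
            counts[label][residue] += 1
    total = len(observations)
    return {
        label: {residue: n / total for residue, n in counter.items()}
        for label, counter in counts.items()
    }


# Hypothetical toy observations (not real experimental data):
# each entry is (peptide as a list of blocks, index of the P1 residue).
observations = [
    (["Gly", "Ala", "Lys", "Ser", "Aib"], 2),
    (["Phe", "Arg", "Gly"], 1),
    (["Lys", "Ser", "Leu", "Val"], 0),
    (["Aib", "Val", "Lys", "Gly", "Trp"], 2),
]

freqs = position_frequencies(observations)
print(freqs["P1"])  # {'Lys': 0.75, 'Arg': 0.25}
```

In a workflow like the one described, position-wise frequencies of this kind would only be a first step; the abstract indicates that the actual models are trained on physicochemical descriptors of the blocks rather than on residue identities alone.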

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acids / chemistry*
  • Drug Discovery*
  • Endopeptidases / chemistry*
  • Humans
  • Peptides / chemistry*
  • Sequence Analysis, Protein / instrumentation
  • Sequence Analysis, Protein / methods*
  • Software*
  • Workflow

Substances

  • Amino Acids
  • Peptides
  • Endopeptidases

Grants and funding

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. This work was supported by the Secretary of Universities and Research of the Department of Economy and Knowledge of the Generalitat de Catalunya through the Industrial Doctorate program of the Government of Catalunya (http://doctoratsindustrials.gencat.cat/es), which also funded all travel expenses related to this research. Luca Morettoni is employed by Molecular Discovery Ltd. Molecular Discovery Ltd provided support in the form of salaries for author LM but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. Lead Molecular Design S.L. provided support in the form of salaries for authors TR, IZ and FF but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the 'author contributions' section.