Enhancing the Discovery of Functional Post-Translational Modification Sites with Machine Learning Models - Development, Validation, and Interpretation

Methods Mol Biol. 2022:2499:221-260. doi: 10.1007/978-1-0716-2317-6_12.

Abstract

Protein posttranslational modifications (PTMs) are a rapidly expanding feature class of significant importance in cell biology. Due to a high burden of experimental proof, the number of functional PTMs in the eukaryotic proteome is currently underestimated. Furthermore, not all PTMs are functionally equivalent. Computational approaches that can confidently recommend PTMs of probable function can improve the heuristics of PTM investigation and alleviate these problems. To address this need, we developed SAPH-ire: a multifeature heuristic neural network model that takes community wisdom into account by recommending experimental PTMs similar to those which have previously been established as having regulatory impact. Here, we describe the principle behind the SAPH-ire model, how it is developed, how we evaluate its performance, and important caveats to consider when building and interpreting such models. Finally, we discuss current limitations of functional PTM prediction models and highlight potential mechanisms for their improvement.

Keywords: Functional prediction; Machine learning; Mass spectrometry; PTM; Posttranslational modification; Proteins.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Machine Learning*
  • Neural Networks, Computer
  • Protein Processing, Post-Translational*
  • Proteome

Substances

  • Proteome