Toward an Integrated Machine Learning Model of a Proteomics Experiment

J Proteome Res. 2023 Mar 3;22(3):681-696. doi: 10.1021/acs.jproteome.2c00711. Epub 2023 Feb 6.

Abstract

In recent years, machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goal of evaluating and exploring machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following a sample-to-data roadmap helped identify knowledge gaps and define needs. The ability to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and of future opportunities and challenges. In the following perspective, we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and inspiring future research.

Keywords: artificial intelligence; deep learning; enzymatic digestion; ion mobility; liquid chromatography; machine learning; research integrity; synthetic data; tandem mass spectrometry.

Publication types

  • Review
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Machine Learning*
  • Mass Spectrometry
  • Proteomics* / methods