LineageFilter: Improved Proteotyping of Complex Samples Using Metaproteomics and Machine Learning

J Proteome Res. 2024 Nov 1;23(11):5203-5208. doi: 10.1021/acs.jproteome.4c00184. Epub 2024 Oct 19.

Abstract

Metaproteomics is a powerful tool for characterizing how microbiota function by analyzing their protein content with tandem mass spectrometry. Given the complexity of these samples, accurately assessing their taxonomic composition from peptide sequences alone, without prior information, remains a challenge. Here, we present LineageFilter, a new Python-based AI software for refined proteotyping of complex samples using interpreted metaproteomics data and machine learning. Given a tentative list of taxa, their abundances, and the scores associated with their identified peptides, LineageFilter computes a comprehensive set of features for each identified taxon at all taxonomic ranks. Its machine-learning model then assesses the likelihood of each taxon's presence based on these features, enabling improved proteotyping and sample-specific database construction.
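
The workflow described above (per-taxon features computed from peptide scores, then a model that scores each taxon's presence) can be illustrated with a minimal Python sketch. This is not LineageFilter's code or API: the input layout, the feature names, the hand-picked logistic weights, and the function names taxon_features, presence_probability, and filter_taxa are all assumptions made for illustration. A trained model would replace the fixed weights.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical input: (rank, taxon, peptide_score) triples.
# This layout is an assumption, not the tool's real input format.
PEPTIDE_HITS = [
    ("species", "Escherichia coli", 42.0),
    ("species", "Escherichia coli", 55.0),
    ("species", "Escherichia coli", 61.0),
    ("species", "Bacillus subtilis", 18.0),
]


def taxon_features(hits):
    """Aggregate per-peptide scores into a small feature dict per (rank, taxon)."""
    scores = defaultdict(list)
    for rank, taxon, score in hits:
        scores[(rank, taxon)].append(score)
    total = sum(len(s) for s in scores.values())
    return {
        key: {
            "n_peptides": len(s),
            "mean_score": mean(s),
            "peptide_share": len(s) / total,  # crude stand-in for abundance
        }
        for key, s in scores.items()
    }


# Illustrative weights chosen by hand; a real model would learn them from data.
WEIGHTS = {"n_peptides": 0.8, "mean_score": 0.05, "peptide_share": 2.0}
BIAS = -4.0


def presence_probability(features):
    """Logistic score in (0, 1) computed from the weighted features."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def filter_taxa(hits, threshold=0.5):
    """Keep only taxa whose presence probability reaches the threshold."""
    probabilities = {
        key: presence_probability(feats)
        for key, feats in taxon_features(hits).items()
    }
    return {key: p for key, p in probabilities.items() if p >= threshold}


if __name__ == "__main__":
    for (rank, taxon), p in filter_taxa(PEPTIDE_HITS).items():
        print(f"{rank}\t{taxon}\t{p:.3f}")
```

With these toy inputs, the taxon supported by three well-scoring peptides passes the threshold and the single low-scoring hit is filtered out. The retained taxa could then be used to assemble a sample-specific database.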

Keywords: machine learning; metaproteomics; microbiomes; proteotyping; taxonomy.

MeSH terms

  • Humans
  • Machine Learning*
  • Microbiota / genetics
  • Peptides / analysis
  • Peptides / chemistry
  • Proteomics* / methods
  • Software*
  • Tandem Mass Spectrometry*

Substances

  • Peptides