Metagenomic Taxonomy-Guided Database-Searching Strategy for Improving Metaproteomic Analysis

J Proteome Res. 2018 Apr 6;17(4):1596-1605. doi: 10.1021/acs.jproteome.7b00894. Epub 2018 Feb 26.


Metaproteomics provides a direct measure of functional information by investigating all proteins expressed by a microbiota. However, because microbial communities are complex and heterogeneous, it is difficult to construct a sequence database suitable for a metaproteomic study. With a public database, researchers may fail to identify proteins from poorly characterized microbial species, while a sequencing-based metagenomic database may not provide adequate coverage of all potentially expressed protein sequences. To address this challenge, we propose a metagenomic taxonomy-guided database-search strategy (MT), which employs a merged database consisting of both taxonomy-guided reference protein sequences from public databases and proteins from metagenome assembly. Applied to a mock microbial mixture, the MT strategy detected about twice as many peptides as the metagenomic database alone. In an evaluation of the reliability of taxonomic attribution, the rate of misassignments was comparable to that obtained using an a priori matched database. We also evaluated the MT strategy with a human gut microbial sample and identified 1.7 times as many peptides as with a standard metagenomic database. In conclusion, the MT strategy allows the construction of databases that provide high sensitivity and precision in peptide identification in metaproteomic studies, enabling the detection of proteins from poorly characterized species within the microbiota.
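
The merged-database idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify how taxa are called from the metagenome, at which taxonomic rank reference proteins are selected, or whether duplicate sequences are collapsed. The function names, the `ref_taxon` accession-to-taxon mapping, the `MG|` and `REF|` header prefixes, the exact-sequence deduplication, and the toy sequences below are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str]  # (header, sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def build_merged_database(
    reference: List[Record],
    ref_taxon: Dict[str, str],
    metagenome: List[Record],
    detected_taxa: Set[str],
) -> List[Record]:
    """Combine metagenome-derived proteins with reference proteins
    whose taxon was detected in the metagenome (hypothetical helper).

    Assumption: identical sequences are kept only once, with the
    metagenome copy taking priority.
    """
    merged: List[Record] = []
    seen: Set[str] = set()
    for header, seq in metagenome:
        if seq not in seen:
            seen.add(seq)
            merged.append(("MG|" + header, seq))
    for header, seq in reference:
        accession = header.split()[0]
        # Keep a reference protein only if its taxon was detected.
        if ref_taxon.get(accession) in detected_taxa and seq not in seen:
            seen.add(seq)
            merged.append(("REF|" + header, seq))
    return merged


# Toy inputs (invented sequences, not real proteins).
reference_fasta = """>P1 protein from Bacteroides
MKTAYIAK
>P2 protein from Escherichia
MSLLNV
>P3 protein from Bacteroides
MAAAGG
"""
metagenome_fasta = """>contig1_orf1
MAAAGG
>contig2_orf1
MQQQRT
"""
ref_taxon = {"P1": "Bacteroides", "P2": "Escherichia", "P3": "Bacteroides"}
detected_taxa = {"Bacteroides"}  # taxa assumed to be called from the metagenome

merged = build_merged_database(
    parse_fasta(reference_fasta),
    ref_taxon,
    parse_fasta(metagenome_fasta),
    detected_taxa,
)
for header, seq in merged:
    print(">" + header)
    print(seq)
```

In this toy run, P2 is excluded because its taxon was not detected, and P3 is dropped because its sequence already appears among the metagenome-derived proteins. The resulting FASTA would then serve as the search space for a peptide identification engine.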

Keywords: mass spectrometry; metagenomics; metaproteomics; microbial communities; taxonomy.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Classification / methods
  • Computer Simulation
  • Data Mining / methods*
  • Databases, Protein
  • Metagenomics / standards*
  • Microbiota*
  • Proteins / analysis*
  • Proteomics / standards*


Substances

  • Proteins