Background: Peptide spectrum matching (PSM) is the standard method in shotgun proteomics data analysis. It relies on the availability of an accurate and complete sample proteome that is used to make interpretation of the spectra feasible. Although this procedure has proven to be effective in many proteomics studies, the approach has limitations when applied on complex samples of microbial communities, such as those found in the human intestinal tract. Metagenome studies have indicated that the human intestinal microbiome contains over 100 times more genes than the human genome and it has been estimated that this ecosystem contains over 5000 bacterial species. The genomes of the vast majority of these species have not yet been sequenced and hence their proteomes remain unknown. To enable data analysis of shotgun proteomics data using PSM, and circumvent the lack of a defined matched metaproteome, an iterative workflow was developed that is based on a synthetic metaproteome and the developing metagenomic databases that are both representative for but not necessarily originating from the sample of interest.
Results: Two human fecal samples for which metagenomic data had been collected, were analyzed for their metaproteome using liquid chromatography-mass spectrometry and used to benchmark the developed iterative workflow to other methods. The results show that the developed method is able to detect over 3,000 peptides per fecal sample from the spectral data by circumventing the lack of a defined proteome without naive translation of matched metagenomes and cross-species peptide identification.
Conclusions: The developed iterative workflow achieved an approximate two-fold increase in the amount of identified spectra at a false discovery rate of 1% and can be applied in metaproteomic studies of the human intestinal tract or other complex ecosystems.