A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies

Pratik Jagtap; Jill Goslinga; Joel A Kooren; Thomas McGowan; Matthew S Wroblewski; Sean L Seymour; Timothy J Griffin

doi:10.1002/pmic.201200352

Proteomics. 2013 Apr;13(8):1352-7. doi: 10.1002/pmic.201200352. Epub 2013 Mar 15.

Authors

Pratik Jagtap¹, Jill Goslinga, Joel A Kooren, Thomas McGowan, Matthew S Wroblewski, Sean L Seymour, Timothy J Griffin

Affiliation

¹ Minnesota Supercomputing Institute, Minneapolis, MN, USA. pratik@msi.umn.edu

Abstract

Large databases (>10⁶ sequences) used in metaproteomic and proteogenomic studies present challenges in matching peptide sequences to MS/MS data using database-search programs. Most notably, strict filtering to avoid false-positive matches leads to more false negatives, thus constraining the number of peptide matches. To address this challenge, we developed a two-step method wherein matches derived from a primary search against a large database were used to create a smaller subset database. The second search was performed against a target-decoy version of this subset database merged with a host database. High-confidence peptide sequence matches were then used to infer protein identities. Applying our two-step method for both metaproteomic and proteogenomic analysis resulted in twice the number of high-confidence peptide sequence matches in each case, as compared to the conventional one-step method. The two-step method captured almost all of the same peptides matched by the one-step method, with a majority of the additional matches being false negatives from the one-step method. Furthermore, the two-step method improved results regardless of the database search program used. Our results show that our two-step method maximizes the peptide matching sensitivity for applications requiring large databases, especially valuable for proteogenomics and metaproteomics studies.
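The database construction step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each database is held as a dictionary mapping a protein accession to its sequence, that the primary search has already produced a set of matched accessions, and that decoys are made by reversing each target sequence (a common convention that the abstract does not specify). The function name `build_second_step_db` and the `REV_` prefix are hypothetical.

```python
from typing import Dict, Set


def build_second_step_db(
    large_db: Dict[str, str],
    primary_hits: Set[str],
    host_db: Dict[str, str],
    decoy_prefix: str = "REV_",  # hypothetical label for decoy entries
) -> Dict[str, str]:
    """Assemble a target-decoy database for the second search.

    large_db     : accession -> sequence for the large (>10^6 entry) database
    primary_hits : accessions matched in the primary (first-step) search
    host_db      : accession -> sequence for the host database
    """
    # Keep only the large-database entries that were matched in the first step.
    subset = {acc: seq for acc, seq in large_db.items() if acc in primary_hits}

    # Merge the subset with the host database to form the target set.
    targets = {**host_db, **subset}

    # Assumption: decoys are reversed target sequences; the source only says
    # that a target-decoy version of the merged database was searched.
    decoys = {decoy_prefix + acc: seq[::-1] for acc, seq in targets.items()}

    return {**targets, **decoys}


# Toy example with made-up accessions and sequences.
large_db = {"M1": "PEPTIDEK", "M2": "AAAAK", "M3": "LLLLR"}
host_db = {"H1": "MKVR"}
second_db = build_second_step_db(large_db, {"M1", "M3"}, host_db)
```

In this toy case the unmatched entry `M2` is dropped, so the second search runs against a much smaller search space than the original large database, which is the property the two-step method relies on.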

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Amino Acid Sequence*
Databases, Protein*
Expressed Sequence Tags
Genomics / methods
Humans
Metagenome
Mouth Mucosa / metabolism
Peptides / chemistry*
Proteomics / methods*
Saliva / metabolism
Search Engine*
Sensitivity and Specificity
Software
Tandem Mass Spectrometry / methods

Substances

Peptides
