J Proteome Res. 2016 Apr 1;15(4):1222-9.
doi: 10.1021/acs.jproteome.5b01105. Epub 2016 Mar 22.

Cleaning Out the Litterbox of Proteomic Scientists' Favorite Pet: Optimized Data Analysis Avoiding Trypsin Artifacts

Free PMC article

Matthias Schittmayer et al. J Proteome Res.

Abstract

Chemically modified trypsin is a standard reagent in proteomics experiments but is usually not considered in database searches. Modification of trypsin is intended to protect the protease against autolysis and the resulting loss of activity. Here, we show that modified trypsin is still subject to self-digestion and that, as a result, peptides derived from modified trypsin are present in standard digests. We demonstrate that these peptides commonly lead to false-positive assignments even when native trypsin is included in the database. Moreover, we present an easily implementable method to include modified trypsin in the database search with a minimal increase in search time and search space while efficiently avoiding these false-positive hits.
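The Figure 2 caption describes the proposed approach (search strategy E) as treating the methylated lysines of trypsin as artificial amino acids. The sketch below illustrates that idea under simplifying assumptions that are not taken from the paper. It assumes every lysine is dimethylated, uses the hypothetical one-letter code J for dimethyllysine, and assumes trypsin does not cleave after it. The toy sequence is made up and is not the porcine trypsin sequence. In a real workflow, the letter and its residue mass would have to be configured in the search engine.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056     # H2O added once per peptide
DIMETHYL = 28.03130  # C2H4 added by reductive dimethylation

# Hypothetical one-letter code for dimethyllysine, handled as an
# "artificial amino acid" with its own fixed residue mass.
DIMETHYL_K = "J"
RESIDUE_MASS[DIMETHYL_K] = RESIDUE_MASS["K"] + DIMETHYL


def dimethylate(sequence):
    """Assume complete dimethylation: every K becomes the artificial residue."""
    return sequence.replace("K", DIMETHYL_K)


def tryptic_digest(sequence):
    """Cleave after K or R, but not before P; J (dimethyl-K) is never cleaved."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic peptide mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


toy = "AGKVDRSPKLLR"  # hypothetical toy sequence
for pep in tryptic_digest(dimethylate(toy)):
    print(pep, round(peptide_mass(pep), 4))
```

Encoding the modification in the sequence keeps the search space small: each modified peptide appears once with a fixed mass instead of being generated as one of many variable-modification permutations.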

Keywords: autolysis protected trypsin; database search; false positives; misassigned spectra; proteomics; search space restriction.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 2
Log2 score distribution for separate decoy and target database searches of data set 3 using different search strategies. Left panel: decoy database searches; right panel: target database searches. A: search strategy A, Swiss-Prot_Human; B: search strategy B, Swiss-Prot_Human plus a list of common contaminants; C: search strategy C, Swiss-Prot_Human plus common contaminants and the additional variable modification dimethyl (K); D: search strategy D, Swiss-Prot_Human plus common contaminants and the additional variable modifications methyl (K) and dimethyl (K); E: search strategy E, Swiss-Prot_Human plus common contaminants, with the methylated lysines of trypsin considered as artificial amino acids. Marked in black are peptides that pass the 1% FDR cutoff (Mascot ion score of 20.50 for searches A, B, and E, 20.94 for search C, and 22.21 for search D). *** indicates p < 10⁻⁹ (Kruskal–Wallis test); # psm indicates the number of peptide spectrum matches.
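The FDR cutoffs in this caption come from separate target and decoy searches. A common way to estimate the FDR at a given score cutoff is to divide the number of decoy matches at or above that cutoff by the number of target matches at or above it. The cutoff is then the lowest score at which this ratio stays at or below 1%. The sketch below uses hypothetical score lists. Mascot's own calculation may differ in details such as tie handling, so treat this as an illustration of the principle only.

```python
def estimated_fdr(target_scores, decoy_scores, threshold):
    """Estimated FDR at a cutoff: decoy hits / target hits at or above it."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    return n_decoy / n_target if n_target else 0.0


def fdr_threshold(target_scores, decoy_scores, max_fdr=0.01):
    """Lowest observed target score whose estimated FDR is <= max_fdr."""
    passing = [t for t in set(target_scores)
               if estimated_fdr(target_scores, decoy_scores, t) <= max_fdr]
    return min(passing) if passing else None


# Hypothetical scores: 200 target PSMs scored 1..200, three decoy PSMs.
print(fdr_threshold(list(range(1, 201)), [10, 5, 3]))
```

Because each search strategy changes the score distributions, the cutoff differs between strategies, as the caption reports for searches C and D.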
Figure 1
Methylated lysine residues identified on autolysis peptides of reductively methylated porcine trypsin. Blue: sequence coverage of peptides identified in data set 1 by search C (peptides are listed in Supporting Table S1). Gray: residues not covered by identified peptides. Other colors: methylated lysines. The figure was rendered with PyMOL 0.99rc6 based on PDB entry 1EP.
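A coverage map like the one in this figure can be computed by locating each identified peptide in the protein sequence and marking the residues it spans. The minimal sketch below also records which lysine positions were observed as methylated. It assumes a hypothetical notation in which the letter J marks a methylated lysine in the peptide list, and it uses a made-up toy protein rather than the real trypsin sequence.

```python
def coverage_and_methylation(protein, peptides, methyl_code="J"):
    """Return (fraction of residues covered, 0-based positions of methylated K).

    Peptides are one-letter sequences in which the hypothetical methyl_code
    marks a methylated lysine; every occurrence in the protein is counted.
    """
    covered = [False] * len(protein)
    methylated = set()
    for pep in peptides:
        plain = pep.replace(methyl_code, "K")  # map back to the unmodified sequence
        start = protein.find(plain)
        while start != -1:
            for offset, residue in enumerate(pep):
                covered[start + offset] = True
                if residue == methyl_code:
                    methylated.add(start + offset)
            start = protein.find(plain, start + 1)
    return sum(covered) / len(protein), methylated


toy_protein = "MAGKVDRSPKLLR"  # hypothetical toy sequence
print(coverage_and_methylation(toy_protein, ["AGJVDR", "SPKLLR"]))
```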
Figure 3
Fractions of false positives explained by different search strategies (data set 3). Search strategy B reveals that 23% of all false positives from search strategy A are caused by peptides derived from common contaminants (including unmodified trypsin). Search strategy E explains an additional 7% in this data set (range 0–66%, mean 6.6% across all human PRIDE data sets; Supporting Table S4), all of which are caused by methylated trypsin peptides.
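The word "additional" here means that strategy E only gets credit for false positives that strategy B had not already explained. The sketch below shows one way to do this bookkeeping. It assumes each spectrum has an ID and that "explained" means the spectrum is reassigned to a contaminant entry (B) or to a methylated-trypsin entry (E). The function name and the toy IDs are hypothetical.

```python
def explained_fractions(false_pos_a, contaminant_hits_b, methyl_trypsin_hits_e):
    """Fractions of strategy-A false positives explained by B, then additionally by E."""
    fp = set(false_pos_a)
    by_contaminants = fp & set(contaminant_hits_b)
    # Only count spectra that strategy B did not already explain.
    by_methyl_trypsin = (fp & set(methyl_trypsin_hits_e)) - by_contaminants
    n = len(fp)
    return len(by_contaminants) / n, len(by_methyl_trypsin) / n


# Toy spectrum IDs: 10 false positives in A; spectrum 2 is explained by both B and E.
print(explained_fractions(range(10), [0, 1, 2, 99], [2, 3]))
```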
Figure 4
Individual spectrum misassigned by two separate search engines despite the inclusion of contaminants in the database (data set 3). * indicates that the accession number for dimethylated trypsin can be chosen freely, as long as it does not duplicate an accession number already in use.
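Since the accession for the dimethylated trypsin entry is arbitrary, it only has to avoid accessions that already occur in the database. The minimal sketch below collects existing accessions and picks a free one. It treats the first whitespace-delimited token of each FASTA header as the accession, which is an assumption: real search engines apply their own configurable parsing rules. The prefix DIMETHYL_TRYPSIN, the header, and the toy sequences are hypothetical.

```python
def used_accessions(fasta_text):
    """Collect the first whitespace-delimited token of every FASTA header."""
    accessions = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                accessions.add(tokens[0])
    return accessions


def free_accession(used, prefix="DIMETHYL_TRYPSIN"):
    """Return prefix, or prefix_2, prefix_3, ... if it is already taken."""
    if prefix not in used:
        return prefix
    n = 2
    while f"{prefix}_{n}" in used:
        n += 1
    return f"{prefix}_{n}"


def fasta_entry(accession, description, sequence, width=60):
    """Format one FASTA record with the sequence wrapped at `width` residues."""
    lines = [f">{accession} {description}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


db = ">sp|P00000|EXAMPLE_HUMAN Hypothetical protein\nMAGKVDR\n"
acc = free_accession(used_accessions(db))
# Hypothetical convention: K written as J for dimethyllysine.
print(fasta_entry(acc, "dimethylated trypsin (toy)", "AGKVDRSPKLLR".replace("K", "J")))
```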
Figure 5
Peptide originating from dimethylated trypsin but assigned to human trypsin isoform 2 (data set 5). Even though the Mascot score for dimethylated trypsin is lower, the mass difference and the biological origin of the sample indicate that dimethylated trypsin is the correct assignment. Analysis with MS Amanda gave identical scores and probabilities for dimethylated porcine trypsin and human trypsin isoform 2.
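When two candidate sequences explain the same spectrum about equally well, the precursor mass error is one way to discriminate between them. The sketch below computes the theoretical m/z of each candidate and its error in ppm relative to an observed m/z, then picks the closest candidate. The candidate sequences are hypothetical stand-ins, not the peptide shown in this figure, and the letter J is a hypothetical code for dimethyllysine.

```python
PROTON = 1.007276   # proton mass (Da)
WATER = 18.010565   # monoisotopic H2O (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
RESIDUE_MASS["J"] = RESIDUE_MASS["K"] + 28.03130  # hypothetical code: dimethyl-K


def precursor_mz(peptide, charge):
    """Theoretical m/z of the [M+zH]z+ ion for an unmodified-termini peptide."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def closest_candidate(observed_mz, charge, candidates):
    """Candidate sequence with the smallest absolute precursor ppm error."""
    return min(candidates,
               key=lambda p: abs(ppm_error(observed_mz, precursor_mz(p, charge))))


observed = precursor_mz("GJ", 2)  # pretend this m/z was measured
for pep in ["GK", "GJ"]:
    print(pep, round(ppm_error(observed, precursor_mz(pep, 2)), 1))
```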
