A bioinformatics workflow for variant peptide detection in shotgun proteomics

Mol Cell Proteomics. 2011 May;10(5):M110.006536. doi: 10.1074/mcp.M110.006536. Epub 2011 Mar 9.


Shotgun proteomics data analysis usually relies on database search. However, commonly used protein sequence databases do not contain information on protein variants and thus prevent variant peptides and proteins from being identified. Incorporating known coding variations into protein sequence databases could help alleviate this problem. Based on our recently published human Cancer Proteome Variation Database, we have created a protein sequence database that comprehensively annotates thousands of cancer-related coding variants collected in the Cancer Proteome Variation Database as well as noncancer-specific ones from the Single Nucleotide Polymorphism Database (dbSNP). Using this database, we then developed a data analysis workflow for variant peptide identification in shotgun proteomics. The high risk of false positive variant identifications was addressed by a modified false discovery rate estimation method. Analysis of colorectal cancer cell lines SW480, RKO, and HCT-116 revealed a total of 81 peptides that contain either noncancer-specific or cancer-related variations. Twenty-three out of 26 variants randomly selected from the 81 were confirmed by genomic sequencing. We further applied the workflow to data sets from three individual colorectal tumor specimens. A total of 204 distinct variant peptides were detected, and five carried known cancer-related mutations. Each individual showed a specific pattern of cancer-related mutations, suggesting potential use of this type of information for personalized medicine. Compatibility of the workflow has been tested with four popular database search engines: Sequest, Mascot, X!Tandem, and MyriMatch. In summary, we have developed a workflow that effectively uses existing genomic data to enable variant peptide detection in proteomics.
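To illustrate the idea of adding known coding variations to a search database, the following is a minimal Python sketch, not the authors' implementation. It assumes single amino acid substitutions, a simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), and a toy sequence. The function names `tryptic_peptides`, `apply_substitution`, and `variant_peptides` are hypothetical and chosen only for this example.

```python
def tryptic_peptides(sequence):
    """In-silico digest: cleave after K or R unless the next residue is P.

    Simplifying assumption: no missed cleavages are generated.
    """
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        at_end = i == len(sequence) - 1
        if at_end or (aa in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def apply_substitution(sequence, position, ref, alt):
    """Return the sequence with residue `position` (1-based) changed from ref to alt."""
    if sequence[position - 1] != ref:
        raise ValueError(
            f"expected {ref} at position {position}, found {sequence[position - 1]}"
        )
    return sequence[:position - 1] + alt + sequence[position:]


def variant_peptides(sequence, position, ref, alt):
    """Peptides produced by the variant sequence that the reference digest lacks."""
    reference = set(tryptic_peptides(sequence))
    mutated = tryptic_peptides(apply_substitution(sequence, position, ref, alt))
    return [p for p in mutated if p not in reference]


# Toy sequence used only for illustration.
toy_protein = "MTEYKLVVVGAGGVGKSA"
print(variant_peptides(toy_protein, 12, "G", "V"))
```

Only the peptides that differ from the reference digest would need to be appended to the search database under this scheme. A substitution that creates or removes a K or R changes the cleavage pattern, so more than one new peptide can result.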

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Base Sequence
  • Carcinoma / genetics
  • Carcinoma / metabolism
  • Cell Line, Tumor
  • Colorectal Neoplasms / genetics
  • Colorectal Neoplasms / metabolism*
  • Computational Biology*
  • Databases, Protein
  • Genes, ras
  • Humans
  • Mutant Proteins / analysis*
  • Mutation
  • Neoplasm Proteins / genetics
  • Neoplasm Proteins / metabolism
  • Proteome / analysis
  • Proteome / genetics
  • Proteomics / methods*
  • Sigmoid Neoplasms / genetics
  • Sigmoid Neoplasms / metabolism
  • Workflow

Substances

  • Mutant Proteins
  • Neoplasm Proteins
  • Proteome