Virtual polymorphism: finding divergent peptide matches in mass spectrometry data

Anal Chem. 2007 Jul 1;79(13):5030-9. doi: 10.1021/ac0703496. Epub 2007 May 24.


The prevailing method of analyzing tandem-MS data for protein identification compares peptide molecular weight and fragmentation data with theoretically predicted values derived from known protein sequences in databases. This is generally effective because proteins from most species under study are either in the database or have sufficient homology to database entries to allow significant matching. We encountered difficulties identifying proteins from the fungal species Alternaria alternata because of significant interspecies protein sequence differences (divergence) and the absence of this species from the database. This common household mold causes asthma and allergy problems, but its genome has not been sequenced. De novo sequencing and error-tolerant methods can facilitate protein identification in divergent, unsequenced species, but these standard methods have drawbacks: de novo sequencing can be laborious, and error-tolerant searching allows only a single amino acid substitution. We have developed an alternative approach based on database engineering, which predicts biologically rational polymorphism using the statistically weighted amino acid substitution information held in BLOSUM62. Like other second-pass methods, it builds on the initially identified protein. However, this approach allows more control over the sequences to be considered, including multiple changes per peptide. The results show considerable improvement for routine protein identification and the potential to rescue otherwise unconvincing identifications in unusually divergent species.
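The idea of expanding a database with BLOSUM62-weighted variants can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the score threshold, and the limit on changes per peptide are hypothetical, and the table below is only a small excerpt of positive off-diagonal BLOSUM62 scores rather than the full matrix. Under those assumptions, the sketch enumerates variant peptides that carry one or more conservative substitutions.

```python
from itertools import combinations, product

# Excerpt of positive off-diagonal BLOSUM62 scores (the matrix is symmetric).
# Assumption: a real workflow would load the full 20x20 matrix instead.
BLOSUM62_EXCERPT = {
    ("I", "V"): 3, ("F", "Y"): 3,
    ("I", "L"): 2, ("L", "M"): 2, ("D", "E"): 2, ("K", "R"): 2, ("E", "Q"): 2,
    ("L", "V"): 1, ("S", "T"): 1, ("D", "N"): 1, ("A", "S"): 1,
}


def substitutes(residue, min_score):
    """Return residues that may replace `residue` at or above `min_score`."""
    allowed = []
    for (a, b), score in BLOSUM62_EXCERPT.items():
        if score < min_score:
            continue
        if a == residue:
            allowed.append(b)
        elif b == residue:
            allowed.append(a)
    return sorted(allowed)


def variant_peptides(peptide, min_score=2, max_changes=2):
    """Enumerate variants of `peptide` with 1..max_changes substitutions.

    Hypothetical helper: the threshold and change limit are illustrative
    parameters, not values taken from the paper.
    """
    variants = set()
    for n_changes in range(1, max_changes + 1):
        # Choose which positions to mutate, then every allowed residue for each.
        for positions in combinations(range(len(peptide)), n_changes):
            options = [substitutes(peptide[p], min_score) for p in positions]
            for choice in product(*options):
                residues = list(peptide)
                for pos, new_residue in zip(positions, choice):
                    residues[pos] = new_residue
                variants.add("".join(residues))
    return sorted(variants)


if __name__ == "__main__":
    for variant in variant_peptides("LDK"):
        print(variant)
```

Raising `min_score` restricts the variants to the most conservative substitutions, while `max_changes` bounds how quickly the search space grows. The resulting sequences would then be appended to the search database for a second-pass search.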

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Databases, Protein*
  • Electrophoresis, Polyacrylamide Gel / methods
  • Genome
  • Humans
  • Isoelectric Focusing / methods
  • Mass Spectrometry / methods*
  • Molecular Sequence Data
  • Molecular Weight
  • Peptides / analysis*
  • Peptides / chemistry
  • Polymorphism, Genetic*
  • Proteins / analysis*
  • Proteins / chemistry
  • Sequence Homology, Amino Acid

Substances

  • Peptides
  • Proteins