SPIDER: software for protein identification from sequence tags with de novo sequencing error

J Bioinform Comput Biol. 2005 Jun;3(3):697-716. doi: 10.1142/s0219720005001247.

Abstract

For the identification of novel proteins using MS/MS, de novo sequencing software computes one or several possible amino acid sequences (called sequence tags) for each MS/MS spectrum. Those tags are then matched, accounting for amino acid mutations, against the sequences in a protein database. If de novo sequencing gives correct tags, homologs of the proteins can be identified by this approach, and software such as MS-BLAST is available for the matching. However, de novo sequencing very often gives only partially correct tags. The most common error is that a segment of amino acids is replaced by another segment with approximately the same mass. We developed a new efficient algorithm to match sequence tags with errors to database sequences for the purpose of protein and peptide identification. A software package, SPIDER, was developed and made available on the Internet for free public use. This paper describes the algorithms and features of the SPIDER software.
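
The error type described above, one segment replaced by another of approximately the same mass, can be illustrated with a minimal sketch. This is not the SPIDER algorithm itself. The function names `segment_mass` and `same_mass` and the 0.02 Da tolerance are assumptions chosen for illustration; the residue values are standard monoisotopic masses.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def segment_mass(segment: str) -> float:
    """Sum the residue masses of a segment of amino acids."""
    return sum(RESIDUE_MASS[aa] for aa in segment)


def same_mass(seg_a: str, seg_b: str, tol: float = 0.02) -> bool:
    """Return True if two segments differ in mass by at most tol daltons.

    The tolerance is a hypothetical value, not one taken from the paper.
    """
    return abs(segment_mass(seg_a) - segment_mass(seg_b)) <= tol


if __name__ == "__main__":
    # A de novo tag may report "GG" where the true sequence has "N",
    # because the two segments have nearly identical mass.
    print(same_mass("GG", "N"))
    print(same_mass("GA", "Q"))
    print(same_mass("A", "G"))
```

A matcher that tolerates such errors would treat `GG` in a tag and `N` in a database sequence as a plausible sequencing error rather than a mismatch, while still distinguishing it from a true mutation such as `A` to `G`.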

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Expressed Sequence Tags*
  • Mass Spectrometry / methods
  • Peptide Mapping / methods*
  • Proteins / analysis
  • Proteins / chemistry*
  • Proteins / classification
  • Proteins / genetics
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Software*

Substances

  • Proteins