Removing redundancy in SWISS-PROT and TrEMBL

Bioinformatics. 1999 Mar;15(3):258-9. doi: 10.1093/bioinformatics/15.3.258.


Summary: One of the distinguishing criteria of the SWISS-PROT protein sequence data bank is minimal redundancy. The introduction of TrEMBL as a supplementary database made the combination of SWISS-PROT and TrEMBL comprehensive but introduced some degree of redundancy. We developed a strategy to identify and subsequently remove the redundancy present within and between SWISS-PROT and TrEMBL.

Availability: The tools mentioned in this paper are available on request.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Databases, Factual*
  • Genetic Variation
  • Mutation
  • Polymorphism, Genetic
  • Proteins / genetics*
  • Sequence Analysis / methods
  • Sequence Analysis / statistics & numerical data
  • Software


Substances

  • Proteins