EDITtoTrEMBL: a distributed approach to high-quality automated protein sequence annotation

Bioinformatics. 1999 Mar;15(3):219-27. doi: 10.1093/bioinformatics/15.3.219.

Abstract

Summary: Many databases in molecular biology face the problem that the ever-increasing rate of data production can no longer be handled by traditional methods, especially human curation. A number of projects are therefore investigating methods for automated sequence annotation. This paper describes the EBI's approach to this problem for protein sequences: arbitrary analysis programs are integrated into a distributed and highly flexible environment. Our software framework allows each sequence to be treated individually according to its particular properties, which is achieved through a high-level description of the preconditions and capabilities of the analysing modules. This not only improves the overall performance of the annotation process, because unnecessary steps are avoided, but also enhances its quality, because dependencies between different modules are taken into account. We have implemented a prototype and use it in the production of TrEMBL releases.
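The central idea in the abstract is that each analysing module declares its preconditions and capabilities, so a sequence passes only through the modules that apply to it, and a module runs only after the modules it depends on have produced their results. The following is a minimal sketch of that scheduling idea in Python. The `Module` class, its `requires`/`provides`/`applies` fields, the `annotate` function and all annotation keys are hypothetical illustrations, not the EDITtoTrEMBL interface, and the distributed execution described in the paper is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Entry = Dict[str, object]  # hypothetical: an entry is a dict of annotation keys


@dataclass
class Module:
    """Hypothetical analysis module described by preconditions and capabilities."""
    name: str
    requires: Set[str]  # annotation keys that must already exist (dependencies)
    provides: Set[str]  # annotation keys this module adds (capabilities)
    # Sequence-specific precondition, e.g. a length or taxonomy test.
    applies: Callable[[Entry], bool] = lambda entry: True
    # The analysis itself; returns new annotations to merge into the entry.
    run: Callable[[Entry], Entry] = lambda entry: {}


def annotate(entry: Entry, modules: List[Module]) -> List[str]:
    """Run every applicable module once its dependencies are satisfied.

    Mutates `entry` in place and returns the names of the modules that ran,
    in execution order. Modules whose dependencies never appear are not run.
    """
    executed: List[str] = []
    pending = list(modules)
    progress = True
    while progress:
        progress = False
        for mod in list(pending):  # iterate over a copy so we can remove items
            if not mod.requires.issubset(entry):
                continue  # a dependency has not been produced yet; retry later
            if mod.provides.issubset(entry) or not mod.applies(entry):
                pending.remove(mod)  # unnecessary or inapplicable: skip it
                continue
            entry.update(mod.run(entry))
            executed.append(mod.name)
            pending.remove(mod)
            progress = True
    return executed


if __name__ == "__main__":
    # Placeholder annotations only; no real analysis is performed.
    entry: Entry = {"sequence": "MKTAYIAKQR"}
    modules = [
        Module("keywords", requires={"family"}, provides={"keywords"},
               run=lambda e: {"keywords": ["kw1"]}),
        Module("family", requires={"sequence"}, provides={"family"},
               run=lambda e: {"family": "family_A"}),
        Module("long_only", requires={"sequence"}, provides={"domains"},
               applies=lambda e: len(e["sequence"]) > 100,
               run=lambda e: {"domains": []}),
    ]
    print(annotate(entry, modules))  # ['family', 'keywords']
```

Although `keywords` is listed first, it waits until `family` has supplied the annotation it depends on, and `long_only` is skipped because its precondition fails for this short sequence. A fixed-point loop is used here only for brevity; a real scheduler could instead sort modules topologically by their declared dependencies.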

Availability: Upon request.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Databases, Factual*
  • Humans
  • Molecular Sequence Data
  • Proteins / genetics
  • Sequence Analysis / methods*
  • Sequence Analysis / statistics & numerical data
  • Software*

Substances

  • Proteins