The Diatom EST Database

Nucleic Acids Res. 2005 Jan 1;33(Database issue):D344-7. doi: 10.1093/nar/gki121.


The Diatom EST Database provides integrated access to expressed sequence tag (EST) data from two eukaryotic microalgae of the class Bacillariophyceae, Phaeodactylum tricornutum and Thalassiosira pseudonana. The database currently contains close to 30,000 EST sequences, organized into PtDB, the P. tricornutum EST database, and TpDB, the T. pseudonana EST database. The ESTs were clustered and assembled into a non-redundant set for each organism, and these non-redundant sequences were then annotated automatically by similarity searches against protein and domain databases. The Diatom EST Database gives access to the EST sequences, the clusters of contiguous sequences, their annotation and analysis with reference to publicly available databases, and a codon usage table derived from a subset of sequences from PtDB and TpDB. The underlying relational database management system (RDBMS) supports queries over the raw and annotated EST data, and a user-friendly web interface offers keyword and BLAST searches. The EST data can also be retrieved by the Pfam domains, Clusters of Orthologous Groups (COG) categories and Gene Ontology (GO) terms assigned to them through similarity searches. The Database is available at
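
The abstract mentions a codon usage table but does not say how it was derived. As a rough illustration only, the sketch below counts codons across a set of coding sequences and reports each codon's frequency per thousand. It assumes the input sequences are already in frame; the function name `codon_usage` and the toy sequences are hypothetical and are not taken from PtDB or TpDB.

```python
from collections import Counter


def codon_usage(cds_sequences):
    """Return codon frequencies per thousand for in-frame coding sequences.

    Assumption: every sequence starts at codon position 1. Codons that
    contain ambiguous bases (anything other than A, C, G, T) are skipped.
    """
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper().replace("U", "T")
        # Walk the sequence in non-overlapping triplets,
        # ignoring any trailing partial codon.
        usable_length = len(seq) - len(seq) % 3
        for i in range(0, usable_length, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


# Toy input (invented for illustration, not real diatom sequences).
toy_sequences = ["ATGGCTGCTTAA", "ATGGCNGCTTGA"]
table = codon_usage(toy_sequences)
for codon in sorted(table):
    print(codon, round(table[codon], 1))
```

Reporting frequencies per thousand codons is one common convention for such tables; a real pipeline would also need to identify the reading frame of each non-redundant sequence before counting, which this sketch leaves out.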

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA, Algal / chemistry
  • Databases, Nucleic Acid*
  • Diatoms / genetics*
  • Expressed Sequence Tags / chemistry*
  • Sequence Analysis, DNA
  • Systems Integration


Substances

  • DNA, Algal