Phylogenomic analysis of EST datasets

Methods Mol Biol. 2009;533:257-76. doi: 10.1007/978-1-60327-136-3_12.


To date, the genomes of over 600 organisms have been sequenced, of which 100 are from eukaryotes. Together with partial genome data for an additional 700 eukaryotic organisms, these exceptional sequence resources offer new opportunities to explore phylogenetic relationships and species diversity. Identifying highly diverse sequences that are specific to an EST-based sequence dataset offers insight into the extent of genetic novelty within that dataset. Sequences that are shared only with related species from the same taxon might represent genes associated with taxon-specific innovations. Sequences that are highly conserved across many other species, on the other hand, offer valuable resources for more in-depth phylogenetic analyses. In this chapter, we guide readers through the process of examining their sequence datasets in the context of phylogenetic relationships. Performed across large-scale datasets, such analyses are termed phylogenomics. Two complementary approaches are described, both based on BLAST similarity metrics. The first uses an established Java tool, SimiTri, to visualize sequence similarity relationships between the EST dataset and three user-defined datasets. The second uses phylogenetic profiles to identify groups of taxonomically related sequences.
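The phylogenetic profile approach can be illustrated with a minimal Python sketch. It is not the chapter's implementation: it assumes BLAST hits have already been reduced to (query, taxon, bit score) tuples, and the function names, the taxon labels, the category names, and the bit-score cutoff of 50 are all hypothetical choices made for illustration. Each EST receives a presence/absence vector across the comparison taxa, and that vector is then used to sort ESTs into the three groups the abstract distinguishes: sequences with no detectable similarity, sequences shared only with related species, and sequences conserved across all taxa.

```python
from collections import defaultdict


def build_profiles(queries, hits, taxa, min_score=50.0):
    """Return {query: tuple of 0/1 per taxon} from BLAST-derived hits.

    queries   -- all EST identifiers, including those with no hits
    hits      -- iterable of (query_id, taxon, bit_score) tuples
    taxa      -- ordered list of comparison taxa (defines profile columns)
    min_score -- assumed bit-score cutoff for counting a hit as present
    """
    best = {q: {} for q in queries}
    for query, taxon, score in hits:
        # Keep only the best-scoring hit per query and taxon.
        if query in best and score > best[query].get(taxon, 0.0):
            best[query][taxon] = score
    return {
        q: tuple(int(scores.get(t, 0.0) >= min_score) for t in taxa)
        for q, scores in best.items()
    }


def classify(profile, taxa, related):
    """Label a profile using hypothetical category names.

    related -- set of taxa considered close relatives of the EST source
    """
    present = {t for t, bit in zip(taxa, profile) if bit}
    if not present:
        return "novel"
    if present <= related:
        return "taxon-restricted"
    if present == set(taxa):
        return "conserved"
    return "intermediate"


def group_by_profile(profiles):
    """Group query identifiers that share an identical profile."""
    groups = defaultdict(list)
    for query, profile in profiles.items():
        groups[profile].append(query)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical taxa and hits, for illustration only.
    taxa = ["nematode_A", "nematode_B", "fly", "human"]
    related = {"nematode_A", "nematode_B"}
    queries = ["EST1", "EST2", "EST3"]
    hits = [
        ("EST1", "nematode_A", 120.0),
        ("EST1", "nematode_B", 95.0),
        ("EST2", "nematode_A", 210.0),
        ("EST2", "nematode_B", 180.0),
        ("EST2", "fly", 150.0),
        ("EST2", "human", 140.0),
        ("EST3", "fly", 30.0),  # below the cutoff, so treated as absent
    ]
    profiles = build_profiles(queries, hits, taxa)
    for q in queries:
        print(q, profiles[q], classify(profiles[q], taxa, related))
```

A binary profile is the simplest choice; keeping the best score per taxon instead of thresholding it would preserve more information at the cost of making identical-profile grouping less direct.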

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Cluster Analysis
  • Computational Biology / methods*
  • Computers
  • Databases, Genetic
  • Expressed Sequence Tags*
  • Genomics*
  • Humans
  • Phylogeny
  • Programming Languages
  • Software
  • User-Computer Interface