Whole genome sequencing data and de novo draft assemblies for 66 teleost species

Sci Data. 2017 Jan 17;4:160132. doi: 10.1038/sdata.2016.132.


Teleost fishes comprise more than half of all vertebrate species, yet genomic data are available for only 0.2% of their diversity. Here, we present whole genome sequencing data for 66 new teleost species, vastly expanding the availability of genomic data for this important vertebrate group. We report de novo assemblies based on low-coverage (9-39×) sequencing and present detailed methodology for all analyses. To facilitate further use of this data set, we present statistical analyses of gene space completeness and verify the expected phylogenetic position of the sequenced genomes in a large mitogenomic context. We further present a nuclear marker set used for phylogenetic inference and evaluate each gene tree against the species tree to test for homogeneity in the phylogenetic signal. Collectively, these analyses illustrate the robustness of this highly diverse data set and enable extensive reuse of the selected phylogenetic markers and of the genomic data in general. The data set covers all major teleost lineages and provides unprecedented opportunities for comparative studies of teleosts.

Publication types

  • Dataset
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Fishes*
  • Genome
  • Genomics
  • Phylogeny
  • Whole Genome Sequencing*