Transcriptome de novo assembly sequencing and analysis of the toxic dinoflagellate Alexandrium catenella using the Illumina platform

Gene. 2014 Mar 10;537(2):285-93. doi: 10.1016/j.gene.2013.12.041. Epub 2014 Jan 15.


In this study, high-throughput de novo transcriptome sequencing of Alexandrium catenella was performed, providing the first view of the gene repertoire of this dinoflagellate based on next-generation sequencing (NGS) technologies. A total of 118,304 unigenes were identified, with an average length of 673 bp (base pairs). Of these unigenes, 77,936 (65.9%) were annotated with known proteins based on sequence similarity, among which 24,149 and 22,956 unigenes were assigned to gene ontology (GO) categories and clusters of orthologous groups (COGs), respectively. Furthermore, 16,467 unigenes were mapped onto 322 pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway database. We also detected 1,143 simple sequence repeats (SSRs), among which tri-nucleotide repeat motifs (69.3%) were the most abundant. The genetic features revealed by the transcriptome dataset, and their significance, are discussed. All four core nucleosomal histones and linker histones were detected, in addition to unigenes involved in histone modifications. A total of 190 unigenes were identified as being involved in the endocytosis pathway, and clathrin-dependent endocytosis was suggested to play a role in the heterotrophy of A. catenella. A conserved 22-nt spliced leader (SL) was identified in 21 unigenes, which suggests the existence of mRNA trans-splicing in A. catenella.
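The abstract does not say how the SSRs were detected or what repeat thresholds were used. As a rough illustration only, the Python sketch below shows one common way to scan a nucleotide sequence for perfect tandem repeats and to tally the share of each motif length, which is the kind of tally behind a figure such as "tri-nucleotide, 69.3%". The function names (`find_ssrs`, `motif_length_shares`), the `MIN_REPEATS` thresholds, and the example sequence are all hypothetical and are not taken from the paper. The last lines recompute the annotated fraction from the two counts given in the abstract.

```python
import re
from collections import Counter

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
# These are illustrative values, NOT the criteria used in the paper.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """Return True if the motif is not itself a repeat of a shorter unit."""
    n = len(motif)
    for d in range(1, n):
        if n % d == 0 and motif == motif[:d] * (n // d):
            return False
    return True


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Find perfect tandem repeats; return sorted (start, motif, copies) tuples."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif):
                continue  # e.g. "ATAT" is a di-nucleotide repeat, not tetra
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def motif_length_shares(hits):
    """Return the fraction of SSRs for each motif length."""
    counts = Counter(len(motif) for _, motif, _ in hits)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}


# Made-up example sequence containing (CAG)x5 and (AT)x6.
example = "GGC" + "CAG" * 5 + "GG" + "AT" * 6 + "C"
ssrs = find_ssrs(example)
shares = motif_length_shares(ssrs)

# Annotated fraction recomputed from the counts reported in the abstract.
annotated_pct = round(100 * 77936 / 118304, 1)
```

A regular expression with a backreference keeps the sketch short. Dedicated SSR tools typically also handle compound and imperfect repeats, which this sketch ignores.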

Keywords: Alexandrium catenella; BLAST-Like Alignment Tool; BLAT; COGs; EDTA; EST; Endocytosis; Ethylene Diamine Tetraacetic Acid; Expressed Sequence Tag; GO; HCCs; HSP; High-throughput sequencing; KEGG; Kyoto Encyclopedia of Genes and Genomes; ML; Mn; NCBI; NGS; National Center for Biotechnology Information; Nr; Nt; Nucleosomal histones; P; PSP; Paralytic Shellfish Poisoning; RPKM; Red tide; SBS; SDS; SL; SSR; UTR; base pair; bp; clusters of orthologous groups; gene ontology; heat shock protein; histone-like protein sequences; manganese; maximum likelihood; next-generation sequencing; non-redundant nucleotide sequences; non-redundant protein sequences; phosphorus; reads per kilobase per million sequenced reads; sequencing by synthesis; simple sequence repeat; sodium dodecyl sulfate; spliced leader; untranslated regions.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Dinoflagellida / genetics*
  • Endocytosis / genetics*
  • High-Throughput Nucleotide Sequencing / methods*
  • Histones / genetics
  • Microsatellite Repeats
  • Molecular Sequence Annotation
  • Molecular Sequence Data
  • Nucleotide Motifs
  • Phylogeny
  • Sequence Analysis, DNA
  • Trans-Splicing
  • Transcriptome*


Substances

  • Histones