De novo assembly of expressed transcripts and global analysis of the Phalaenopsis aphrodite transcriptome

Plant Cell Physiol. 2011 Sep;52(9):1501-14. doi: 10.1093/pcp/pcr097. Epub 2011 Jul 19.


Being one of the largest families in the angiosperms, Orchidaceae display a great biodiversity resulting from adaptation to diverse habitats. Genomic information on orchids is rather limited, despite their unique and interesting biological features, thus impeding advanced molecular research. Here we report a strategy to integrate sequence outputs of the moth orchid, Phalaenopsis aphrodite, from two high-throughput sequencing platform technologies, Roche 454 and Illumina/Solexa, in order to maximize assembly efficiency. Tissues collected for cDNA library preparation included a wide range of vegetative and reproductive tissues. We also designed an effective workflow for annotation and functional analysis. After assembly and trimming processes, 233,823 unique sequences were obtained. Among them, 42,590 contigs averaging 875 bp in length were annotated to protein-coding genes, of which 7,263 coding genes were found to be nearly full length. The sequence accuracy of the assembled contigs was validated to be as high as 99.9%. Genes with tissue-specific expression were also categorized by profiling analysis with RNA-Seq. Gene products targeted to specific subcellular localizations were identified by their annotations. We concluded that, with proper assembly to combine outputs of next-generation sequencing platforms, transcriptome information can be enriched in gene discovery, functional annotation and expression profiling of a non-model organism.
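The counts reported above can be put in proportion with a few lines of arithmetic. The following is a minimal sketch that uses only the figures quoted in the abstract; the variable names are illustrative, and the total base count is an approximation because the 875 bp mean is itself a rounded value.

```python
# Figures quoted in the abstract (no other data are assumed).
UNIQUE_SEQUENCES = 233_823       # unique sequences after assembly and trimming
ANNOTATED_CONTIGS = 42_590       # contigs annotated to protein-coding genes
MEAN_CONTIG_LENGTH_BP = 875      # reported average length of annotated contigs
NEAR_FULL_LENGTH = 7_263         # coding genes found to be nearly full length

# Share of all unique sequences that received a protein-coding annotation.
annotated_fraction = ANNOTATED_CONTIGS / UNIQUE_SEQUENCES

# Share of annotated contigs that are nearly full length.
full_length_fraction = NEAR_FULL_LENGTH / ANNOTATED_CONTIGS

# Approximate total bases in annotated contigs (count x rounded mean length).
approx_annotated_bases = ANNOTATED_CONTIGS * MEAN_CONTIG_LENGTH_BP

print(f"Annotated share of unique sequences: {annotated_fraction:.1%}")
print(f"Nearly full-length share of annotated contigs: {full_length_fraction:.1%}")
print(f"Approximate annotated bases: {approx_annotated_bases:,} bp")
```

On these figures, roughly 18% of the unique sequences were annotated to protein-coding genes, and roughly 17% of those annotated contigs were nearly full length.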

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Contig Mapping
  • DNA, Plant / genetics
  • Databases, Genetic
  • Gene Expression Profiling / methods*
  • Gene Library
  • Molecular Sequence Annotation
  • Orchidaceae / genetics*
  • Sequence Analysis, DNA / methods
  • Transcriptome*


Substances

  • DNA, Plant