PLoS One, 9 (7), e102155

De Novo Transcriptome Assembly From Inflorescence of Orchis Italica: Analysis of Coding and Non-Coding Transcripts

Sofia De Paolo et al. PLoS One.


The floral transcriptome of Orchis italica, a wild orchid species, was obtained using Illumina RNA-seq technology and specific de novo assembly and analysis tools. More than 100 million raw reads were processed, resulting in 132,565 assembled transcripts and 86,079 unigenes with an average length of 606 bp and an N50 of 956 bp. Functional annotation assigned 38,984 of the unigenes to records present in the NCBI non-redundant protein database, 32,161 of them to Gene Ontology terms, 15,775 of them to Eukaryotic Orthologous Groups (KOG) and 7,143 of them to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The in silico expression analysis based on Fragments Per Kilobase of transcript per Million mapped reads (FPKM) was confirmed by real-time RT-PCR experiments on 10 selected unigenes, which showed a high and statistically significant positive correlation with the RNA-seq based expression data. Putative long non-coding RNAs were predicted using two different software packages, CPC and Portrait, resulting in 7,779 unannotated unigenes that met the threshold values of both analyses. Among the predicted long non-coding RNAs, one is the homologue of TAS3, a long non-coding RNA precursor of trans-acting small interfering RNAs (ta-siRNAs). The differential expression pattern observed for the selected putative long non-coding RNAs suggests their possible functional role in different floral tissues.
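The two summary statistics named in the abstract, N50 and FPKM, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names and the toy input values are assumptions made for illustration, and the definitions used are the conventional ones (N50 as the length of the shortest sequence in the set of longest sequences that together cover at least half of the assembled bases; FPKM as fragments per kilobase of transcript per million mapped fragments).

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches at least
    half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must not be empty")


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    Equivalent to fragments / (length in kb) / (mapped fragments in millions).
    """
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Toy values, not data from the study.
    toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
    print("N50:", n50(toy_lengths))
    print("FPKM:", fpkm(100, 2000, 10_000_000))
```

Under these definitions, a transcript with 100 mapped fragments, a length of 2,000 bp and a library of 10 million mapped fragments has an FPKM of 5.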

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.


Figure 1. Inflorescence of O. italica after (A) and before (B) anthesis. (C) Schematic diagram of a single floret of O. italica.
Figure 2. Size distributions of the assembled transcripts (A) and unigenes (B) of the inflorescence of O. italica.
The length ranges are indicated in base pairs.
Figure 3. Size distribution of the annotated transcripts.
(A) Relationship between the sequence length of the assembled unigenes and the percentage of annotations in the NCBI nr protein database. (B) Number of annotated unigenes for each size class. The lengths are indicated in base pairs.
Figure 4. Functional annotations of the unigenes of O. italica.
(A) Level 2 GO term distribution for the biological process, cellular component and molecular function categories. (B) KOG annotation.
Figure 5. Transcription factor annotations of the unigenes of O. italica obtained from the plant TFDB.
Figure 6. Relative expression levels of selected protein coding unigenes of O. italica assessed by real-time PCR analysis of inflorescence tissue (A) and by normalized FPKM counts (B).
Both measures were normalized relative to the actin levels. The bars indicate the standard deviation.
Figure 7. Comparison of the level 2 GO annotations between the reference transcriptome of O. italica (blue) and the 1,144 unigenes with FPKM counts greater than 100 (red).
Asterisks indicate the significantly enriched GO terms among the most expressed unigenes (Fisher exact test p<0.05).
Figure 8. Selected putative long non-coding RNAs expressed in the inflorescence of O. italica.
(A) Agarose gel electrophoresis of the RT-PCR-amplified products of the selected transcripts. Lane 1, comp0_c0_seq1; lane 2, comp3328_c0_seq1; lane 3, comp1231_c0_seq1; lane 4, comp3311_c0_seq1; lane 5, comp48038_c0_seq1; lane 6, comp6669_c0_seq1; lane 7, comp4129_c0_seq1; lane 8, comp1308_c0_seq1; lane 9, comp15481_c0_seq1; lane 10, comp134696_c0_seq1; lane 11, empty; lane 12, 100 bp ladder. (B–G) Relative expression level (Rn) in the outer tepals (Te_out), inner tepals (Te_inn), labellum (Lip), column (Co), ovary (Ov) and leaf (Le) of the transcripts comp0_c0_seq1 (B), comp3328_c0_seq1 (C), comp1231_c0_seq1 (D), comp48038_c0_seq1 (E), comp6669_c0_seq1 (F), and comp134696_c0_seq1 (G). The bars indicate the standard deviation.
Figure 9. Nucleotide sequence alignment of comp134696_c0_seq1 of O. italica and the TAS3 sequences of Hordeum vulgare (accession number BF264964), Zea mays (BE519095), Saccharum hybrid cultivar (CA145655), Sorghum bicolor (CD464142), Oryza sativa (AU100890), and Triticum aestivum (CN010916).
The 5′ and 3′ conserved sequences that are targets of miR-390 are underlined.



Grant support

This work was supported by 2009 Regione Campania Grant L.R. N5/2002. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.