Comparative Study

The Draft Genome and Transcriptome of Cannabis Sativa

Harm van Bakel et al. Genome Biol. 12 (10), R102.


Background: Cannabis sativa has been cultivated throughout human history as a source of fiber, oil and food, and for its medicinal and intoxicating properties. Selective breeding has produced cannabis plants for specific uses, including high-potency marijuana strains and hemp cultivars for fiber and seed production. The molecular biology underlying cannabinoid biosynthesis and other traits of interest is largely unexplored.

Results: We sequenced genomic DNA and RNA from the marijuana strain Purple Kush using short-read approaches. We report a draft haploid genome sequence of 534 Mb and a transcriptome of 30,000 genes. Comparison of the transcriptome of Purple Kush with that of the hemp cultivar 'Finola' revealed that many genes encoding proteins involved in cannabinoid and precursor pathways are more highly expressed in Purple Kush than in 'Finola'. The exclusive occurrence of Δ9-tetrahydrocannabinolic acid synthase in the Purple Kush transcriptome, and its replacement by cannabidiolic acid synthase in 'Finola', may explain why the psychoactive cannabinoid Δ9-tetrahydrocannabinol (THC) is produced in marijuana but not in hemp. Resequencing the hemp cultivars 'Finola' and 'USO-31' showed little difference in gene copy numbers of cannabinoid pathway enzymes. However, single nucleotide variant analysis uncovered a relatively high level of variation among four cannabis types, and supported a separation of marijuana and hemp.

Conclusions: The availability of the Cannabis sativa genome enables the study of a multifunctional plant that occupies a unique role in human culture. Its availability will aid the development of therapeutic marijuana strains with tailored cannabinoid profiles and provide a basis for the breeding of hemp with improved agronomic characteristics.


Figure 1
Transcript classes in Cannabis sativa and Arabidopsis thaliana. Panther [28] was used to determine the distribution of transcripts in (a) C. sativa (PK) (30,074 representative transcripts) and (b) A. thaliana (31,684 transcripts). The high degree of similarity between the two species indicates that all major functional classes are proportionally represented in the PK transcriptome assembly.
Figure 2
Proportion of transcriptome mapping to genome assembly. (a) A histogram showing the number of bases in the transcript assembly that could be mapped to the genome at 98% sequence identity, as a function of transcript length in 300 nt bins. (b) The proportion of transcriptome bases that could be mapped to the genome for the same bins as in (a). The black dashed line indicates the proportion of the transcriptome that is accounted for in the genome assembly.
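The per-bin proportion described in panel (b) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the input layout (one length and one mapped-base count per transcript), and the convention that a 300 nt transcript falls in the 300-599 nt bin are all assumptions made for the example.

```python
from collections import defaultdict

BIN_WIDTH = 300  # nt, the bin size stated in the caption


def mapped_proportion_by_bin(transcripts, bin_width=BIN_WIDTH):
    """Summarise mapped bases per transcript-length bin.

    transcripts: iterable of (length_nt, mapped_bases) pairs (hypothetical
    input layout). Returns {bin_start: (total_bases, mapped_bases, proportion)}
    ordered by bin start.
    """
    totals = defaultdict(lambda: [0, 0])  # bin_start -> [total, mapped]
    for length, mapped in transcripts:
        if length <= 0 or not 0 <= mapped <= length:
            raise ValueError(f"invalid transcript record: {(length, mapped)}")
        # Assumed convention: bins are [0, 300), [300, 600), ...
        bin_start = (length // bin_width) * bin_width
        totals[bin_start][0] += length
        totals[bin_start][1] += mapped
    return {
        start: (total, mapped, mapped / total)
        for start, (total, mapped) in sorted(totals.items())
    }
```

Summing the per-bin totals and mapped counts across all bins would give a single overall proportion, which is the kind of value the dashed line in panel (b) represents.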
Figure 3
Analysis of gene expression in PK tissues. (a) RNA-Seq read counts for 30,074 representative transcripts (rows), expressed as log2 RPKM, were subjected to hierarchical agglomerative clustering based on their expression pattern across tissues (columns). (b) Schematic illustration of THCA and CBDA cannabinoid biosynthesis, including the production of fatty acid and isoprenoid precursors via the hexanoate, MEP and GPP pathways. Hexanoate could arise through fatty acid degradation, involving desaturase, lipoxygenase (LOX) and hydroperoxide lyase (HPL) steps. Activation of hexanoate by an acyl-activating enzyme (AAE) yields hexanoyl-CoA, which is the substrate for the polyketide synthase enzyme (OLS) that forms olivetolic acid. The prenyl side-chain originates in the MEP pathway, which provides substrates for GPP synthesis, and is added by an aromatic prenyltransferase (PT) [36]. The final steps are catalyzed by the oxidocyclases THCAS and CBDAS. Pathway enzymes and metabolic intermediates are indicated in black and blue, respectively. (c) Same data as (a), showing the expression levels for genes in the cannabinoid pathway and precursor pathways (rows) across the six assayed tissues (columns). The majority of the genes encoding cannabinoid and precursor pathway enzymes are most highly expressed in the flowering stages. Gene and pathway names correspond to those used in panel (b).
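For readers unfamiliar with the log2 RPKM unit used in panels (a) and (c), the standard definition (reads per kilobase of transcript per million mapped reads) can be sketched as below. The function names and the pseudocount, which avoids taking the log of zero for unexpressed transcripts, are assumptions for illustration; the caption does not say how zero counts were handled.

```python
import math


def rpkm(read_count, transcript_length_nt, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_nt * total_mapped_reads)


def log2_rpkm(read_count, transcript_length_nt, total_mapped_reads,
              pseudocount=1.0):
    """log2-transformed RPKM; the pseudocount is an assumed convention."""
    value = rpkm(read_count, transcript_length_nt, total_mapped_reads)
    return math.log2(value + pseudocount)
```

For example, 500 reads on a 2,000 nt transcript in a library of 10 million mapped reads gives an RPKM of 25.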
Figure 4
Comparison of gene expression in female cannabis flowers, and gene copy number, between marijuana (PK) and hemp ('Finola'). (a) A scatter plot of RNA-Seq read counts for all representative transcripts in marijuana and hemp, expressed as log2 RPKM. Specific subsets of transcripts are shown in color, as indicated in the key. The dashed line represents the relative enrichment of trichomes in the marijuana strain, inferred from the ratio in expression of trichome-specific genes, as defined in the text. Gene symbols/abbreviations: CAN - known and putative cannabinoid pathway genes; HEX - putative hexanoate pathway genes; GPP - GPP pathway genes; MEP - MEP pathway genes; TF - putative transcription factors according to PFAM, with at least a 4-fold change in expression in PK relative to 'Finola'; MYB - Myb-domain transcription factors previously suggested as trichome regulators. (b) A scatter plot of the log2 median read depth (MRD) of genomic DNA-Seq reads that aligned uniquely to the PK transcriptome. Genomic reads were trimmed to a length of 32 bases prior to alignment with Bowtie, to allow for mapping close to exon junctions. The lack of outliers in the scatter plot indicates that there have been relatively few changes in gene copy number between marijuana and hemp. (c) The relative RNA-Seq expression of individual genes in the cannabinoid pathway and precursor pathways is shown on the left, adjusted for enrichment of trichome-specific genes (i.e. relative to the dashed line in panel (a)). The median genomic DNA read depth for the same genes is shown on the right. The box plots reflect the variation in the depth of coverage of uniquely aligned genomic DNA reads across each transcript, with the median coverage distribution across all transcripts shown as reference (All). Reads that are likely derived from pseudogenes are marked by the symbol [P]. While there is increased expression of most cannabinoid genes in the HEX and CAN pathways (left) in PK, this does not appear to be due to an increased representation of these genes in the PK genome relative to the 'Finola' genome (right).
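One way to read the trichome adjustment in panel (c) is as a constant offset subtracted from each gene's log2 expression ratio. The sketch below assumes that the offset is the median log2 PK/'Finola' ratio over a set of trichome-specific genes; the caption only says the enrichment is "inferred from the ratio in expression of trichome-specific genes", so the use of the median, the function names, and the dictionary inputs are all hypothetical.

```python
import math
from statistics import median


def log2_ratio(pk_value, finola_value):
    """log2 of PK expression relative to 'Finola' (both must be positive)."""
    return math.log2(pk_value / finola_value)


def trichome_offset(pk_rpkm, finola_rpkm, trichome_genes):
    """Assumed estimator: median log2 ratio over trichome-specific genes."""
    return median(
        log2_ratio(pk_rpkm[g], finola_rpkm[g]) for g in trichome_genes
    )


def adjusted_log2_ratio(gene, pk_rpkm, finola_rpkm, offset):
    """Expression ratio of one gene relative to the trichome baseline."""
    return log2_ratio(pk_rpkm[gene], finola_rpkm[gene]) - offset
```

Under this reading, a gene with an adjusted ratio near zero is elevated in PK only as much as trichome content alone would predict, while a clearly positive value suggests higher expression beyond trichome enrichment.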
Figure 5
Neighbour-joining tree for two hemp cultivars and two marijuana strains. The tree was plotted in MEGA5 [71] using the maximum composite likelihood of SNV nucleotide substitution rates, calculated based on the concatenated SNV sequences in each variety, as a distance metric. The topology of the tree reveals a distinct separation between the hemp and marijuana strains.


Similar articles

  • Differentiation of Cannabis Subspecies by THCA Synthase Gene Analysis Using RFLP
    N Cirovic et al. J Forensic Leg Med 51, 81-84. PMID 28772109.
    Cannabis sativa subspecies, known as industrial hemp (C. sativa sativa) and marijuana (C. sativa indica) show no evident morphological distinctions, but they contain diff …
  • Gene Duplication and Divergence Affecting Drug Content in Cannabis Sativa
    GD Weiblen et al. New Phytol 208 (4), 1241-50. PMID 26189495.
    Cannabis sativa is an economically important source of durable fibers, nutritious seeds, and psychoactive drugs but few economic plants are so poorly understood genetical …
  • The Genetic Structure of Marijuana and Hemp
    J Sawler et al. PLoS One 10 (8), e0133292. PMID 26308334.
    Despite its cultivation as a source of food, fibre and medicine, and its global status as the most used illicit drug, the genus Cannabis has an inconclusive taxonomic org …
  • The Complex Interactions Between Flowering Behavior and Fiber Quality in Hemp
    EMJ Salentijn et al. Front Plant Sci 10, 614. PMID 31156677. - Review
    Hemp, Cannabis sativa L., is a sustainable multipurpose fiber crop with high nutrient and water use efficiency and with biomass of excellent quality for textile fi …
  • Marijuana Poisoning
    KT Fitzgerald et al. Top Companion Anim Med 28 (1), 8-12. PMID 23796481. - Review
    The plant Cannabis sativa has been used for centuries for the effects of its psychoactive resins. The term "marijuana" typically refers to tobacco-like preparations of th …

Cited by 93 PubMed Central articles



    1. Schultes RE, Klein WM, Plowman T, Lockwood TE. Cannabis: an example of taxonomic neglect. Bot Mus Leafl Harvard Univ. 1974;23:337–367.
    2. Li HL. An archaeological and historical account of cannabis in China. Econ Bot. 1973;28:437–444. doi: 10.1007/BF02862859.
    3. Russo EB, Jiang H-E, Li X, Sutton A, Carboni A, Bianco F del, Mandolino G, Potter DJ, Zhao Y-X, Bera S, Zhang Y-B, Lü E-G, Ferguson DK, Hueber F, Zhao L-C, Liu C-J, Wang Y-F, Li C-S. Phytochemical and genetic analyses of ancient cannabis from Central Asia. J Exp Bot. 2008;59:4171–4182. doi: 10.1093/jxb/ern260.
    4. Zias J, Stark H, Sellgman J, Levy R, Werker E, Breuer A, Mechoulam R. Early medical use of cannabis. Nature. 1993;363:215.
    5. UNODC. World Drug Report 2011. United Nations Publication, Sales No. E.11.XI.10.
