Assembly of the 373k gene space of the polyploid sugarcane genome reveals reservoirs of functional diversity in the world's leading biomass crop

Gigascience. 2019 Dec 1;8(12):giz129. doi: 10.1093/gigascience/giz129.


Background: Sugarcane cultivars are polyploid interspecific hybrids with giant genomes, typically carrying 10-13 sets of chromosomes from 2 Saccharum species. The ploidy, hybridity, and size of the genome, estimated at >10 Gb, pose a challenge for sequencing.

Results: Here we present a gene space assembly of the cultivar SP80-3280, including 373,869 putative genes and their potential regulatory regions. Alignment of single-copy genes from diploid grasses to the putative genes indicates that we could resolve 2-6 (up to 15) putative homo(eo)logs that are 99.1% identical within their coding sequences. Dissimilarities increase in their regulatory regions, and gene promoter analysis shows differences in regulatory elements within gene families that are expressed in a species-specific manner. We exemplify these differences for sucrose synthase (SuSy) and phenylalanine ammonia-lyase (PAL), 2 gene families central to carbon partitioning. SP80-3280 has particular regulatory elements involved in sucrose synthesis that are not found in the ancestor Saccharum spontaneum. PAL regulatory elements are found in co-expressed genes related to fiber synthesis within gene networks defined during plant growth and maturation. Comparison with sorghum reveals predominantly bi-allelic variations in sugarcane, consistent with the formation of 2 "subgenomes" after their divergence ∼3.8-4.6 million years ago, and identifies single-nucleotide variants that may underlie their differences.
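
As an illustration of what a figure such as "99.1% identical within their coding sequences" measures, the following minimal Python sketch computes percent identity between two pre-aligned coding sequences. It is not the authors' pipeline: the function name `percent_identity`, the choice to skip gapped columns, and the toy sequences are assumptions made only for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes the two sequences are already aligned (equal length, gaps as '-').
    Skipping gapped columns is one common convention, chosen here for simplicity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a base in both sequences
    matches = 0   # compared columns where the bases agree
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment of two homo(eo)logous copies (not real sugarcane data).
copy_1 = "ATGGCTAAGCTT-GA"
copy_2 = "ATGGCAAAGCTTTGA"
print(f"{percent_identity(copy_1, copy_2):.1f}% identical")
```

In practice the alignment itself would come from a dedicated aligner, and whether gapped columns count as mismatches changes the reported value, so the convention should be stated alongside any identity figure.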

Conclusions: This assembly represents a large step towards a whole-genome assembly of a commercial sugarcane cultivar. It includes a rich diversity of genes and homo(eo)logous resolution for a representative fraction of the gene space, relevant to improving biomass and food production.

Keywords: allele; bioenergy; biomass; genome; polyploid.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Biomass
  • Contig Mapping / methods*
  • Crops, Agricultural / genetics
  • Crops, Agricultural / growth & development
  • Genetic Variation
  • Genome Size
  • Genome, Plant
  • Glucosyltransferases / genetics*
  • Multigene Family
  • Phenylalanine Ammonia-Lyase / genetics*
  • Plant Proteins / genetics
  • Polyploidy
  • Promoter Regions, Genetic
  • Saccharum / genetics
  • Saccharum / growth & development*


Substances

  • Plant Proteins
  • Glucosyltransferases
  • sucrose synthase
  • Phenylalanine Ammonia-Lyase