Multi-Omics Driven Assembly and Annotation of the Sandalwood (Santalum album) Genome

Plant Physiol. 2018 Apr;176(4):2772-2788. doi: 10.1104/pp.17.01764. Epub 2018 Feb 12.

Abstract

Indian sandalwood (Santalum album) is an important tropical evergreen tree known for its fragrant heartwood-derived essential oil and its valuable carving wood. Here, we applied an integrated genomic, transcriptomic, and proteomic approach to assemble and annotate the Indian sandalwood genome. Our genome sequencing resulted in the establishment of a draft map of the smallest genome for any woody tree species to date (221 Mb). The genome annotation predicted 38,119 protein-coding genes and 27.42% repetitive DNA elements. In-depth proteome analysis revealed the identities of 72,325 unique peptides, which confirmed 10,076 of the predicted genes. The addition of transcriptomic and proteogenomic approaches resulted in the identification of 53 novel proteins and 34 gene-correction events that were missed by genomic approaches. Proteogenomic analysis also helped in reassigning 1,348 potential noncoding RNAs as bona fide protein-coding messenger RNAs. Gene expression patterns at the RNA and protein levels indicated that peptide sequencing was useful in capturing proteins encoded by nuclear and organellar genomes alike. Mass spectrometry-based proteomic evidence provided an unbiased approach toward the identification of proteins encoded by organellar genomes. Such proteins are often missed in transcriptome data sets due to the enrichment of only messenger RNAs that contain poly(A) tails. Overall, the use of integrated omic approaches enhanced the quality of the assembly and annotation of this nonmodel plant genome. The availability of genomic, transcriptomic, and proteomic data will enhance genomics-assisted breeding, germplasm characterization, and conservation of sandalwood trees.
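The figures quoted in the abstract can be put in proportion with a little arithmetic. The sketch below is not part of the published analysis. It derives two ratios from the reported numbers: the share of predicted genes with peptide-level support, and the approximate amount of repetitive sequence. It assumes the 27.42% repeat content refers to the 221 Mb draft assembly. The constant and function names are illustrative.

```python
# Figures taken directly from the abstract.
GENOME_SIZE_MB = 221        # draft assembly size
PREDICTED_GENES = 38_119    # predicted protein-coding genes
CONFIRMED_GENES = 10_076    # predicted genes confirmed by peptide evidence
REPEAT_FRACTION = 0.2742    # repetitive DNA, assumed to be a fraction of the assembly


def summarize():
    """Return (percent of predicted genes confirmed, repetitive sequence in Mb)."""
    confirmed_pct = 100 * CONFIRMED_GENES / PREDICTED_GENES
    repeat_mb = GENOME_SIZE_MB * REPEAT_FRACTION
    return round(confirmed_pct, 1), round(repeat_mb, 1)


if __name__ == "__main__":
    confirmed_pct, repeat_mb = summarize()
    print(f"Genes with peptide support: {confirmed_pct}%")
    print(f"Repetitive sequence: about {repeat_mb} Mb")
```

On these numbers, roughly a quarter of the predicted genes have direct proteomic confirmation, and about 60 Mb of the assembly would be repetitive under the stated assumption.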

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Expression Profiling / methods*
  • Gene Expression Regulation, Plant
  • Genome, Plant / genetics*
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing / methods
  • Molecular Sequence Annotation / methods*
  • Phylogeny
  • Plant Proteins / classification
  • Plant Proteins / genetics
  • Proteome / genetics
  • Proteome / metabolism
  • Proteomics / methods*
  • Santalum / genetics*

Substances

  • Plant Proteins
  • Proteome