De novo transcriptome assembly and annotation for gene discovery in avocado, macadamia and mango

Sci Data. 2020 Jan 8;7(1):9. doi: 10.1038/s41597-019-0350-9.

Abstract

Avocado (Persea americana Mill.), macadamia (Macadamia integrifolia L.) and mango (Mangifera indica L.) are important subtropical tree species grown for their edible fruits and nuts. Despite their commercial and nutritional importance, genomic information for these species is largely lacking. Here we report the generation of avocado, macadamia and mango transcriptome assemblies from pooled leaf, stem, bud, root, floral and fruit/nut tissue. Using normalized cDNA libraries, we generated comprehensive RNA-Seq datasets from which we assembled 63420, 78871 and 82198 unigenes of avocado, macadamia and mango, respectively, using a combination of de novo transcriptome assembly and redundancy reduction. These unigenes were functionally annotated using the Basic Local Alignment Search Tool (BLAST) to query the Universal Protein Resource Knowledgebase (UniProtKB). A workflow encompassing RNA extraction, library preparation, transcriptome assembly, redundancy reduction, assembly validation and annotation is provided. This study provides avocado, macadamia and mango transcriptome and annotation data, which are valuable for gene discovery and gene expression profiling experiments as well as for ongoing and future genome annotation and marker development applications.
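
The annotation step described above (BLAST searches of unigenes against UniProtKB) is commonly reduced to one best hit per unigene. The following is a minimal sketch of that reduction, not the authors' pipeline. It assumes the BLAST results were written in the standard 12-column tabular format (`-outfmt 6`). The function name `best_hits`, the E-value cutoff of 1e-5, and every identifier in the example data are hypothetical and chosen only for illustration.

```python
import csv
import io

# Column order of the standard 12-column BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5):
    """Return {unigene_id: row} keeping the highest bit score hit per unigene.

    Hits with an E-value above max_evalue are ignored. The cutoff is an
    assumption for this sketch, not a value taken from the abstract.
    """
    best = {}
    for row in csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t"):
        if float(row["evalue"]) > max_evalue:
            continue
        current = best.get(row["qseqid"])
        if current is None or float(row["bitscore"]) > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


# Made-up example rows; the unigene and protein identifiers are hypothetical.
example = (
    "unigene_1\tprotein_A\t82.5\t200\t35\t0\t1\t200\t10\t209\t1e-50\t250.0\n"
    "unigene_1\tprotein_B\t90.0\t210\t21\t0\t1\t210\t5\t214\t1e-70\t310.0\n"
    "unigene_2\tprotein_C\t65.0\t150\t52\t1\t3\t152\t20\t169\t1e-20\t120.0\n"
    "unigene_3\tprotein_D\t30.0\t40\t28\t0\t1\t40\t1\t40\t0.5\t25.0\n"
)

hits = best_hits(io.StringIO(example))
for unigene, row in sorted(hits.items()):
    print(unigene, row["sseqid"], row["evalue"], row["bitscore"])
```

In this sketch, a unigene whose only hits exceed the E-value cutoff is left out of the result, so it would remain unannotated. Ranking by bit score rather than by E-value is a design choice made here for simplicity.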

Publication types

  • Dataset
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Library
  • Genes, Plant
  • Macadamia / genetics*
  • Mangifera / genetics*
  • Molecular Sequence Annotation
  • Persea / genetics*
  • RNA-Seq
  • Transcriptome*