Comparisons of de novo transcriptome assemblers in diploid and polyploid species using peanut (Arachis spp.) RNA-Seq data

PLoS One. 2014 Dec 31;9(12):e115055. doi: 10.1371/journal.pone.0115055. eCollection 2014.


The narrow genetic base and limited genetic information on Arachis species have hindered marker-assisted selection of peanut cultivars. However, recent developments in sequencing technologies have expanded opportunities to exploit genetic resources at lower cost. To use the genetic information available for Arachis species at the transcriptome level, it is important to have a good-quality reference transcriptome. The available Tifrunner 454 FLEX transcriptome sequences have an assembly with 37,000 contigs and low N50 values of 500-751 bp. Therefore, we generated de novo transcriptome assemblies from about 38 million reads for the tetraploid cultivar OLin and 16 million reads for each of the diploids, A. duranensis K38901 and A. ipaënsis KGBSPSc30076, using three different de novo assemblers: Trinity, SOAPdenovo-Trans and TransAByss. All three assemblers support single k-mer analysis, and the latter two also permit multiple k-mer analysis. Assemblies generated for the three samples had N50 values ranging from 1,278 to 1,641 bp in Arachis hypogaea (AABB), 1,401 to 1,492 bp in Arachis duranensis (AA), and 1,107 to 1,342 bp in Arachis ipaënsis (BB). Comparison with legume ESTs and protein databases suggests that more than 40% of the transcripts in the assemblies were full length, with good continuity. Also, when the raw reads were mapped back to each assembly, Trinity had a higher success rate in assembling sequences than both TransAByss and SOAPdenovo-Trans. The de novo assembly of OLin had a greater number of contigs (67,098) and a higher N50 (1,641 bp) than the Tifrunner TSA. Despite being built from shorter reads (2 × 50) than the Tifrunner 454 FLEX TSA, the de novo assembly of OLin proved superior. Assemblies generated to represent different genome combinations may serve as a valuable resource for the peanut research community.
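
The N50 values quoted above summarize assembly contiguity: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled bases. The following is a minimal sketch of that standard definition, not the authors' pipeline; the function name `n50` and the example lengths are illustrative assumptions.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Contigs are sorted from longest to shortest and their lengths are
    accumulated; N50 is the length of the contig at which the running
    total first reaches at least half of the total assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")

    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, for illustration only (not data from the study).
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
example_n50 = n50(example_lengths)
```

With the illustrative lengths above, the total is 54 bp, and the three longest contigs (10 + 9 + 8 = 27 bp) reach half of it, so the N50 is 8. Because N50 depends only on the length distribution, it should be read alongside the other measures in the abstract, such as the proportion of full-length transcripts and the read mapping rate.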

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Arachis / genetics*
  • Diploidy*
  • Gene Expression Profiling / methods*
  • Polyploidy*
  • RNA, Messenger / genetics
  • RNA, Messenger / metabolism
  • Sequence Analysis, RNA*


Substances

  • RNA, Messenger

Associated data

  • SRA/PRJNA248910
  • figshare/10.6084/M9.FIGSHARE.1236527

Grant support

This work was supported by awards from the Texas Peanut Producers Board (award CY2008-Burow-TTU-Development to MDB and CES, and 2009-TTU-Burow-Genotyping to MDB), National Peanut Board (grant #332/TX-99/1139 to MDB, and #332/TX-99/1213 to MDB and CES), Peanut Foundation (grant 04-810-08 to MDB), Ogallala Aquifer Initiative (award IPM12.06 to MDB), and United States Department of Agriculture/National Institute of Food and Agriculture Hatch Act (award TEX08835 to MDB) funding. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.