De novo transcriptome assembly and novel microsatellite marker information in Capsicum annuum varieties Saengryeg 211 and Saengryeg 213

Bot Stud. 2013 Dec;54(1):58. doi: 10.1186/1999-3110-54-58. Epub 2013 Nov 21.

Abstract

Background: Pepper (Capsicum annuum L., Solanaceae) is a major, economically important vegetable crop worldwide. Its limited functional genomics resources and whole-genome association studies could be substantially improved by applying molecular approaches to characterize gene content and identify molecular markers. Here we describe massively parallel pyrosequencing of two pepper varieties, the highly pungent Saengryeg 211 and the non-pungent Saengryeg 213, including de novo transcriptome assembly, functional annotation, and in silico discovery of potential molecular markers. We performed 454 GS-FLX Titanium sequencing of polyA-selected, normalized cDNA libraries generated from a single pool of transcripts obtained from mature fruits of the two pepper varieties.

Results: A single 454 pyrosequencing run generated 361,671 and 274,269 reads, totaling 164.49 and 124.60 Mb of sequence data (average read length of 454 nucleotides), which assembled into 23,821 and 17,813 isotigs and 18,147 and 15,129 singletons in the two varieties, respectively. These reads were organized into 20,352 and 15,781 'isogroups', respectively. Assembled sequences were functionally annotated based on homology to genes in multiple public databases and were assigned Gene Ontology (GO) terms. Sequence analysis identified a total of 3,766 and 2,431 potential simple sequence repeat (SSR) motifs for microsatellite analysis in the two varieties, respectively. Trinucleotide repeats were the most common (84%), followed by di- (9.9%), hexa- (4.1%), and pentanucleotide (2.1%) repeats. GAA (8.6%) was the most frequent repeat motif, followed by TGG (7.2%), TTC (6.5%), and CAG (6.2%).
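
The abstract does not name the SSR search tool or its thresholds. As an illustration only, the in silico SSR discovery step can be sketched as a scan of assembled sequences for perfect tandem repeats. The function names, the 2 to 6 nt unit range, and the minimum repeat counts below are assumptions made for this example and are not taken from the study.

```python
import re
from collections import Counter

# Hypothetical thresholds (NOT from the paper): minimum number of tandem
# copies required for each repeat-unit length, in nucleotides.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit):
    """True if the unit is not itself a repeat of a shorter unit.

    This rejects e.g. 'AA' (a mononucleotide run) or 'ATAT' (really 'AT').
    """
    k = len(unit)
    for d in range(1, k):
        if k % d == 0 and unit == unit[:d] * (k // d):
            return False
    return True


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, unit, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-mer followed by at least n-1 further exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):
                hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


def unit_length_counts(hits):
    """Count SSRs by repeat-unit length (2 = di, 3 = tri, ...)."""
    return Counter(len(unit) for _, unit, _ in hits)


if __name__ == "__main__":
    # Toy sequence, not real pepper data.
    toy = "TTGC" + "GAA" * 6 + "CCTA" + "AT" * 7 + "GGCA"
    for start, unit, copies in find_ssrs(toy):
        print(start, unit, copies)
```

Applied to every isotig and singleton, per-class counts like those returned by `unit_length_counts` would give the kind of di-, tri-, penta-, and hexanucleotide proportions reported above. Published SSR tools additionally handle compound and imperfect repeats, which this sketch ignores.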

Conclusions: High-throughput transcriptome assembly, annotation, and large-scale SSR marker discovery have been achieved using next-generation sequencing (NGS) of two pepper varieties. This valuable functional genomics resource should help to further improve pepper breeding efforts with respect to genetic linkage maps, QTL mapping, and marker-assisted trait selection.

Keywords: Capsicum annuum; Molecular markers; Next generation sequencing; Simple sequence repeats; Transcriptome profiling.