Comparative Study
PLoS One, 8 (10), e76925
eCollection

Application of Genotyping-By-Sequencing on Semiconductor Sequencing Platforms: A Comparison of Genetic and Reference-Based Marker Ordering in Barley

Martin Mascher et al. PLoS One.

Abstract

The rapid development of next-generation sequencing platforms has enabled the use of sequencing for routine genotyping across a range of genetics studies and breeding applications. Genotyping-by-sequencing (GBS), a low-cost, reduced representation sequencing method, is becoming a common approach for whole-genome marker profiling in many species. With quickly developing sequencing technologies, adapting current GBS methodologies to new platforms will leverage these advancements for future studies. To test new semiconductor sequencing platforms for GBS, we genotyped a barley recombinant inbred line (RIL) population. Based on a previous GBS approach, we designed bar code and adapter sets for the Ion Torrent platforms. Four 24-plex libraries, together comprising 94 RILs and the two parents, were constructed and sequenced on two Ion platforms. In parallel, a 96-plex library of the same RILs was sequenced on the Illumina HiSeq 2000. We applied two different computational pipelines to analyze the sequencing data: the reference-independent TASSEL pipeline and a reference-based pipeline using SAMtools. Sequence contigs positioned on the integrated physical and genetic map were used for read mapping and variant calling. We found high agreement in genotype calls between the different platforms and high concordance between genetic and reference-based marker order. There was, however, a paucity of SNPs jointly discovered by the different pipelines, indicating a strong effect of alignment and filtering parameters on SNP discovery. We show the utility of the current barley genome assembly as a framework for developing very low-cost genetic maps, facilitating high resolution genetic mapping and negating the need for developing de novo genetic maps for future studies in barley. Through demonstration of GBS on semiconductor sequencing platforms, we conclude that the GBS approach is amenable to a range of platforms and can easily be modified as new sequencing technologies, analysis tools and genomic resources develop.
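
The abstract reports high agreement in genotype calls between platforms but does not define the statistic here. One plausible way to quantify such agreement is the fraction of identical calls among sites called on both platforms. The sketch below assumes that definition; the function name, the `"A"`/`"B"` parental allele coding, and the example data are all hypothetical and not taken from the paper.

```python
from typing import Optional, Sequence


def genotype_concordance(
    calls_a: Sequence[Optional[str]],
    calls_b: Sequence[Optional[str]],
) -> Optional[float]:
    """Fraction of matching calls among sites called in both datasets.

    Assumption: each sequence holds one call per site for the same
    sample, in the same site order, with None marking a missing call.
    """
    if len(calls_a) != len(calls_b):
        raise ValueError("call lists must cover the same sites")

    compared = 0
    matching = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue  # sites missing on either platform are not compared
        compared += 1
        if a == b:
            matching += 1

    # None signals that no site was called on both platforms.
    return matching / compared if compared else None


# Hypothetical calls for one RIL on two platforms.
hiseq_calls = ["A", "B", "A", None, "B"]
proton_calls = ["A", "B", "B", "A", "B"]
print(genotype_concordance(hiseq_calls, proton_calls))
```

Excluding sites that are missing on either platform keeps the measure from being dominated by coverage differences, which GBS data typically show.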

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Design of genotyping-by-sequencing adapters for use with Ion Torrent sequencing chemistry.
1) Genomic DNA (black) is digested with a combination of PstI and MspI producing fragments with corresponding 3’ TGCA (PstI) and 5’ CG (MspI) overhangs. The barcoded forward Ion Adapter 1 (blue) is ligated to the PstI generated overhang and the common Ion Adapter 2 (green) is ligated to the MspI generated overhang. The variable bar code is in bold and the unpaired tail of the Y-adapter is underlined.
2) During the first PCR cycle, the forward primer (orange) binds to the corresponding Adapter 1 site and proceeds to synthesize the complementary strand (grey) to the genomic sequence tag and then the unpaired tail of the Y-adapter. The common MspI-MspI fragments (not shown) have a Y-adapter on both ends and lack a complementary binding site to initialize PCR amplification.
3) During the second and subsequent rounds of PCR, the reverse primer (purple) can bind to the newly synthesized complement and initialize synthesis on the reverse strand. These PCR reactions continue until completion of the fully synthesized fragments.
4) The final fragment is ready to sequence and consists of the Ion Torrent forward priming site (orange) with a bar code (blue) followed by the genomic sequence fragment (black) and the Ion Torrent reverse priming site (purple).
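
The double digest in step 1 can be illustrated with a small in silico sketch. It relies on the standard recognition sites of the two enzymes (PstI cuts CTGCA^G, MspI cuts C^CGG), which are not spelled out in the caption, and it keeps only fragments with one PstI end and one MspI end, since the caption notes that MspI-MspI fragments do not amplify. The function names and the toy sequence are hypothetical, and the sketch ignores methylation, adapter ligation, and fragment size selection.

```python
import re

# Recognition site and top-strand cut offset for each enzyme.
# PstI: CTGCA^G (leaves a 3' TGCA overhang); MspI: C^CGG (leaves a 5' CG overhang).
SITES = {"PstI": ("CTGCAG", 5), "MspI": ("CCGG", 1)}


def double_digest(seq: str) -> list[tuple[str, str, str]]:
    """Return (left_enzyme, right_enzyme, top_strand_fragment) for each
    internal fragment. Terminal pieces bounded by a sequence end are skipped."""
    seq = seq.upper()
    cuts = []
    for enzyme, (site, offset) in SITES.items():
        # A lookahead match reports every occurrence of the site.
        for match in re.finditer(f"(?={site})", seq):
            cuts.append((match.start() + offset, enzyme))
    cuts.sort()

    fragments = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        fragments.append((left, right, seq[start:end]))
    return fragments


def amplifiable(fragments: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Keep fragments with exactly one PstI end and one MspI end."""
    return [f for f in fragments if {f[0], f[1]} == {"PstI", "MspI"}]


# Toy sequence: one PstI site followed by two MspI sites.
toy = "AAACTGCAGTTTTCCGGAAACCGGTT"
for left, right, fragment in amplifiable(double_digest(toy)):
    print(left, right, fragment)
```

On the toy sequence the digest yields one PstI-MspI fragment and one MspI-MspI fragment, and only the former passes the filter.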
Figure 2. Relationship between number of sequence reads, missing data, sequencing depth and the number of SNP calls.
(a) The average percentage of missing data per SNP in each sequenced sample is plotted as a function of the number of sequence reads in that sample. (b) Histogram of missing data per SNP. (c) The number of SNP calls plotted against the minimum depth at a variant position in a given sample to make a successful genotype call. All SNP calls were made with the SAMtools pipeline. The minor allele frequency was set to 30% and the maximum rate of missing data was set to 50%. The sequencing platforms used for this study include Illumina HiSeq2000 (black), Ion Torrent PGM (green) and Ion Torrent Proton (red). The color code for all panels is given in the legend to (a).
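
The interplay of the three thresholds in this caption (minimum depth per call, minor allele frequency, missing data rate) can be sketched as a simple filter. This is an illustration, not the SAMtools pipeline used in the study: it assumes the 30% figure is a minimum minor allele frequency, that every call is a homozygous parental allele coded `"A"` or `"B"`, and that each SNP carries a `(genotype, depth)` pair per sample. The function name and the data are hypothetical.

```python
def filter_snps(snps, min_depth=1, min_maf=0.30, max_missing=0.50):
    """Return names of SNPs passing the depth, missing-data and MAF filters.

    snps maps a SNP name to a list of (genotype, depth) pairs, one per
    sample. Assumption: genotype is "A", "B", or None for no call.
    """
    kept = []
    for name, samples in snps.items():
        if not samples:
            continue
        # A call counts only if it reaches the minimum read depth.
        calls = [g for g, depth in samples if g is not None and depth >= min_depth]
        missing_rate = 1 - len(calls) / len(samples)
        if not calls or missing_rate > max_missing:
            continue
        count_a = calls.count("A")
        maf = min(count_a, len(calls) - count_a) / len(calls)
        if maf >= min_maf:
            kept.append(name)
    return kept


# Hypothetical data for four samples.
example = {
    "snp1": [("A", 3), ("B", 2), ("A", 1), ("B", 4)],
    "snp2": [("A", 3), ("A", 2), ("A", 5), ("B", 2)],
    "snp3": [("A", 1), ("B", 1), ("A", 1), ("B", 5)],
}
print(filter_snps(example, min_depth=1))
print(filter_snps(example, min_depth=2))
```

Raising `min_depth` turns low-coverage calls into missing data, so fewer SNPs survive the missing-data filter. This is the trade-off that panel (c) plots.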
Figure 3. Venn diagrams of the number of SNPs identified in each dataset and with the respective bioinformatics pipeline.
(a) SNPs identified with the SAMtools pipeline across all three platforms. (b) SNPs identified with the TASSEL pipeline across all three platforms. (c) SNPs identified in the HiSeq2000 data with both pipelines. (d) SNPs identified in the PGM data with both pipelines. (e) SNPs identified in the Proton data with both pipelines.
Figure 4. Genotyping-by-sequencing marker order based on the International Barley Sequencing Consortium reference framework (y-axis) compared to a de novo genetic order (x-axis) from 94 Morex x Barke recombinant inbred lines genotyped on either the Ion Torrent PGM or Proton platform.
Each dot corresponds to one of 1,584 markers from the de novo map that was positioned to the physical and genetic framework of barley.
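
A comparison of two marker orders like the one plotted here is often summarized with a rank correlation. The sketch below computes Spearman's coefficient between genetic and reference-based positions for one chromosome. The source does not say which statistic, if any, the authors used, so this choice is an assumption, and the function names and positions are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 markers")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("all markers share one position in at least one map")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical positions (cM) of four markers; two neighbours are swapped.
de_novo_cm = [1.0, 2.0, 3.0, 4.0]
reference_cm = [1.0, 3.0, 2.0, 4.0]
print(spearman(de_novo_cm, reference_cm))
```

Average ranks for ties matter in this setting because many markers in a RIL population co-segregate and share a single genetic position.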

Grant support

This research was supported in part by the United States Department of Agriculture – Agricultural Research Service (Appropriation No. 5430-21000-006-00D) and Kansas State University as well as funds from the German Ministry of Education and Research (BMBF fund TRITEX-0315954A) and EU FP7 project TriticeaeGenome to NS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
