BMC Genomics. 2014 Mar 19;15:207.
doi: 10.1186/1471-2164-15-207.

Improved linkage analysis of Quantitative Trait Loci using bulk segregants unveils a novel determinant of high ethanol tolerance in yeast

Jorge Duitama et al. BMC Genomics. 2014.

Abstract

Background: Bulk segregant analysis (BSA) coupled to high-throughput sequencing is a powerful method for mapping genomic regions associated with phenotypes of interest. It relies on crossing two parents, one inferior and one superior for the trait of interest. Segregants displaying the trait of the superior parent are pooled, and their DNA is extracted and sequenced. Genomic regions linked to the trait are identified by searching the pool for overrepresented alleles, which normally originate from the superior parent. BSA data analysis is non-trivial because of sequencing, alignment and screening errors.

Results: To increase the power of the BSA technology and better distinguish spuriously linked regions from truly linked ones, we developed EXPLoRA (EXtraction of over-rePresented aLleles in BSA), an algorithm for BSA data analysis that explicitly models the dependency between neighboring marker sites by exploiting the properties of linkage disequilibrium through a Hidden Markov Model (HMM). Reanalyzing a BSA dataset for high ethanol tolerance in yeast allowed us to reliably identify QTLs linked to this phenotype that could not be identified with statistical significance in the original study. Experimental validation of one of the least pronounced linked regions, by identifying its causative gene VPS70, confirmed the potential of our method.

Conclusions: EXPLoRA performs at least as well as the state of the art and is robust even at low signal-to-noise ratios, i.e. when the true linkage signal is diluted by sampling or screening errors, or when few segregants are available.

Figures

Figure 1
Bulk segregant analysis for mapping genomic regions linked to a phenotype of interest in yeast. A: A parent displaying the phenotypic trait of interest (superior parent) is crossed with a reference strain lacking the trait (inferior parent). B: The resulting heterozygous diploid strain is then sporulated to generate haploid segregants. C: Segregating offspring carry a mosaic of genetic material derived from both parents (red and blue segments) due to the recombination events in meiosis. After phenotyping, the subset of segregants displaying the trait of the superior parent is selected. D: Genomic DNA extracted from the pooled selected segregants is submitted to whole-genome sequence analysis. Polymorphic genomic regions (marker sites) are identified that allow distinguishing between the parental variants. Counting for each marker site how many variants originate from the superior versus the inferior parent allows determining the variant frequency in the pool for each marker site. Regions linked to the phenotype of interest are expected to originate predominantly from the superior parent (black boxed region). The principle of BSA with diploid organisms is similar, but usually inbred (homozygous) lines are used as parents and two generations are needed to observe segregation of the phenotype.
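The counting step described in panel D can be sketched as follows. This is only a minimal illustration: the function name, the input layout (one pair of read counts per marker site) and the example numbers are assumptions made for this sketch, not part of the published pipeline.

```python
def variant_frequencies(counts):
    """Relative frequency of the superior-parent variant at each marker site.

    counts: list of (superior_reads, inferior_reads) tuples, one per marker site.
    Sites without any coverage yield None instead of a frequency.
    """
    freqs = []
    for superior_reads, inferior_reads in counts:
        total = superior_reads + inferior_reads
        freqs.append(superior_reads / total if total else None)
    return freqs


# Hypothetical read counts for three marker sites.
example_counts = [(12, 11), (30, 2), (0, 0)]
example_freqs = variant_frequencies(example_counts)
```

A site whose frequency lies well above 0.5 in the selected pool, like the second site here, is a candidate for linkage; a single site is weak evidence on its own, which is the motivation for modeling neighboring sites jointly.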
Figure 2
Hidden Markov Model used to predict genomic regions linked to the phenotype of interest. A: Each marker site is modeled as being either in a neutral state (N-state, blue circles) or in a state linked to the phenotype of interest (P-state, orange circles), based on its observed relative variant frequency in the pool of segregants. B: Emission probabilities for the neutral state (blue curve) and the phenotype-linked state (orange line) as a function of the relative variant frequency, each modeled by a beta-binomial distribution with its own parameters α and β. C: Transition probability as a function of the physical distance between neighboring marker sites.
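To make the structure of such a model concrete, the sketch below computes, for each marker site, the posterior probability of the phenotype-linked state with a standard forward-backward pass over a two-state HMM with beta-binomial emissions. It is not the EXPLoRA implementation. The function names, the symmetric Haldane-style switch probability, the per-base recombination rate, and the α and β values are all assumptions chosen for illustration; the caption only states that the transition probability depends on the physical distance between sites.

```python
import math


def log_beta_binom(k, n, a, b):
    """Log pmf of a beta-binomial: k superior-parent reads out of n."""
    def log_beta(x, y):
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def switch_prob(distance_bp, rate_per_bp):
    """Assumed form: chance of changing state grows with distance, capped at 0.5."""
    return 0.5 * (1.0 - math.exp(-2.0 * rate_per_bp * distance_bp))


def posterior_linked(positions, counts, neutral=(10.0, 10.0), linked=(10.0, 1.0),
                     rate_per_bp=1e-5, prior_linked=0.5):
    """Posterior probability of the linked state at every marker site.

    positions: sorted marker coordinates in bp.
    counts: (superior_reads, total_reads) per marker site.
    neutral, linked: illustrative (alpha, beta) pairs, not the paper's values.
    """
    emis = [[math.exp(log_beta_binom(k, n, a, b)) for a, b in (neutral, linked)]
            for k, n in counts]
    sw = [switch_prob(positions[i + 1] - positions[i], rate_per_bp)
          for i in range(len(positions) - 1)]

    # Forward pass, rescaled at each site to avoid numerical underflow.
    cur = [(1.0 - prior_linked) * emis[0][0], prior_linked * emis[0][1]]
    total = sum(cur)
    fwd = [[c / total for c in cur]]
    for i in range(1, len(positions)):
        p, s = fwd[-1], sw[i - 1]
        cur = [(p[0] * (1 - s) + p[1] * s) * emis[i][0],
               (p[0] * s + p[1] * (1 - s)) * emis[i][1]]
        total = sum(cur)
        fwd.append([c / total for c in cur])

    # Backward pass with the same rescaling.
    bwd = [[1.0, 1.0] for _ in positions]
    for i in range(len(positions) - 2, -1, -1):
        nxt, e, s = bwd[i + 1], emis[i + 1], sw[i]
        cur = [(1 - s) * e[0] * nxt[0] + s * e[1] * nxt[1],
               s * e[0] * nxt[0] + (1 - s) * e[1] * nxt[1]]
        total = sum(cur)
        bwd[i] = [c / total for c in cur]

    # Combine both passes into per-site posteriors for the linked state.
    return [f[1] * b[1] / (f[0] * b[0] + f[1] * b[1]) for f, b in zip(fwd, bwd)]


# Hypothetical data: three balanced sites, then three skewed sites far away.
example_positions = [1000, 2000, 3000, 200000, 201000, 202000]
example_counts = [(20, 40), (19, 40), (21, 40), (39, 40), (40, 40), (38, 40)]
example_posteriors = posterior_linked(example_positions, example_counts)
```

Because closely spaced sites rarely switch state under this assumed transition function, a run of consistently skewed frequencies reinforces itself, while an isolated outlier between balanced neighbors is pulled back towards the neutral state.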
Figure 3
Effect of the recombination rate (r) on the performance of EXPLoRA. The recovery rate (panel A), average size of the linked region (panel B) and number of falsely predicted regions (panel C) as a function of the noise level (left-hand plots) and the number of marker sites (right-hand plots). The noise level is represented by the ratio of segregants in the pool that carry the causal allele to those that do not (PSC). Results obtained with a number of markers that occurs in real experimental settings are indicated with a dotted line.
Figure 4
Effect of αPP on the performance of EXPLoRA. The recovery rate (panel A), average size of the linked region (panel B) and number of falsely predicted regions (panel C) as a function of the noise level (left-hand plots) and the number of marker sites (right-hand plots). The noise level is represented by the ratio of segregants in the pool that carry the causal allele to those that do not (PSC). Results obtained with a number of markers that occurs in real experimental settings are indicated with a dotted line.
Figure 5
Comparison with the state of the art. The recovery rate (panel A), average size of the linked region (panel B) and number of falsely predicted regions (panel C) under high (left-hand plots) and low (right-hand plots) noise levels were assessed for EXPLoRA, the method of Magwene et al., and MULTIPOOL. In the plots of panel B (average size of the linked region), the y-axis was split into two scales so that the results of MULTIPOOL could be shown without compressing the curves obtained with EXPLoRA and the method of Magwene et al.
Figure 6
Linkage scores obtained by EXPLoRA for the five QTLs identified in the 16% pool (left) and in the 17% pool (right). The original relative variant frequencies as determined by genome sequencing are displayed for each plot (light gray dots). Solid lines show the posterior probabilities for αPP = 10 whereas dashed lines show the posterior probabilities for αPP = 30.
Figure 7
Experimental validation of QTL2 on chromosome X. A: The upper plot shows the region corresponding to QTL2, whose linkage to the phenotype of interest was confirmed by scoring selected marker sites in individual segregants. Scored marker sites are indicated (S4-S7). For each marker site, the p-value indicates the probability of being linked to the phenotype by chance according to a binomial distribution (see materials and methods). Lower plot: zoom in on the genes in the experimentally confirmed region corresponding to QTL2 (29 kb). Black bars: genes with non-synonymous mutations in the coding region; grey bars: genes with mutations in the promoter or terminator; white bars: genes without mutations. B: Reciprocal hemizygosity analysis for the genes with non-synonymous mutations in the coding regions located in the fine-mapped region. Two different diploid strains were constructed by crossing the original superior parent VR1-5B with the inferior parent BY4741 carrying a deletion in its allele of the candidate causative gene, or the other way around. This resulted in two diploid strains, each with only one functional allele of the candidate causative gene, originating from either the ‘superior’ or the ‘inferior’ parent. The ethanol tolerance of the two diploid strains was compared with dilution spot growth assays on a YPD plate with 16% ethanol and on a YPD plate without ethanol as control.
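The per-marker p-value in panel A can be illustrated with a one-sided binomial tail. The exact test is given in the paper's materials and methods, which are not reproduced here, so this sketch makes two assumptions: that under the null hypothesis of no linkage each segregant carries the superior-parent allele with probability 0.5, and that the test is one-sided. The function name and the example counts are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p).

    Here: the chance that at least k of n scored segregants carry the
    superior-parent allele if the marker is not linked to the phenotype.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical scoring result: 18 of 22 selected segregants carry the superior allele.
example_p_value = binomial_tail(18, 22)
```

Under these assumptions a small tail probability means that such an excess of the superior-parent allele would rarely arise without linkage.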
