G3 (Bethesda), 9 (9), 2963-2975

Multivariate Genome-Wide Association Analyses Reveal the Genetic Basis of Seed Fatty Acid Composition in Oat ( Avena sativa L.)

Maryn O Carlson et al. G3 (Bethesda).

Abstract

Oat (Avena sativa L.) has a high concentration of oils, composed primarily of healthful unsaturated oleic and linoleic fatty acids. To accelerate oat plant breeding efforts, we sought to identify loci associated with variation in fatty acid composition, defined as the types and quantities of fatty acids. We genotyped a panel of 500 oat cultivars with genotyping-by-sequencing and measured the concentrations of ten fatty acids in these oat cultivars grown in two environments. Measurements of individual fatty acids were highly correlated across samples, consistent with fatty acids participating in shared biosynthetic pathways. We leveraged these phenotypic correlations in two multivariate genome-wide association study (GWAS) approaches. In the first analysis, we fitted a multivariate linear mixed model for all ten fatty acids simultaneously while accounting for population structure and relatedness among cultivars. In the second, we performed a univariate association test for each principal component (PC) derived from a singular value decomposition of the phenotypic data matrix. To aid interpretation of results from the multivariate analyses, we also conducted univariate association tests for each trait. The multivariate mixed model approach yielded 148 genome-wide significant single-nucleotide polymorphisms (SNPs) at a 10% false-discovery rate, compared to 129 and 73 significant SNPs in the PC and univariate analyses, respectively. Thus, explicit modeling of the correlation structure between fatty acids in a multivariate framework enabled identification of loci associated with variation in seed fatty acid concentration that were not detected in the univariate analyses. Ultimately, a detailed characterization of the loci underlying fatty acid variation can be used to enhance the nutritional profile of oats through breeding.
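As a rough illustration of the second approach, the sketch below derives principal components from a singular value decomposition of a phenotype matrix. It is not the authors' code: the function name `phenotype_pcs`, the choice to center and scale each trait before the decomposition, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def phenotype_pcs(Y):
    """PC scores, loadings, and variance fractions for an n x t phenotype matrix.

    Assumption: each trait (column) is centered and scaled to unit variance
    before the SVD; the source does not state the preprocessing used.
    """
    Yc = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    scores = U * s                      # one column of scores per PC
    loadings = Vt.T                     # trait weights defining each PC
    var_frac = s**2 / np.sum(s**2)      # share of total variance per PC
    return Yc, scores, loadings, var_frac

# Simulated stand-in for ten correlated fatty acid traits in 50 lines.
rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 10)) @ rng.normal(size=(10, 10))
Yc, scores, loadings, var_frac = phenotype_pcs(Y)
```

Each column of `scores` could then serve as the response in a separate univariate association test, which is the role the PCs play in the analysis described above.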

Keywords: GWAS; fatty acid methyl ester; multivariate GWAS; oat.

Figures

Figure 1
Inferred pathways of fatty acid synthesis and modification in oat seeds. Fatty acid abbreviations adhere to standard conventions, with chain length and degree of unsaturation (position[s] of double bond[s]) separated by a colon (for example, palmitic acid is denoted by 16:0). We detected two isomers of 18:1: 18:1(9) and a second isomer, 18:1*, with unknown double-bond position (likely 18:1(11)). The 18:1(9) isomer was more abundant. Fatty acids up to 18 carbons in length are synthesized by a fatty acid synthase (FAS) complex. The nascent acyl chain is attached to the acyl carrier protein (ACP) subunit of FAS and grows by two carbons per cycle through the action of distinct ketoacyl-ACP synthase (KAS) subunits of this complex (KASI and KASII are shown). Elongation is terminated either when a thioesterase releases the fatty acid from ACP or when an acyl-ACP desaturase (AAD) introduces a double bond; the AAD typically acts with specificity for the ∆9 position and a preference for C18 substrates. Thus, the initial fatty acid produced by FAS results from competition between one or more thioesterase and AAD isoforms. Subsequent elongation to ≥ 20C is catalyzed by the fatty acid elongase complex, in which the ketoacyl-CoA synthase (KCS) subunit determines chain length. Further desaturation is catalyzed by additional fatty acid desaturases (FAD2 and FAD3).
Figure 2
Variation in seed fatty acid concentration in a diverse oat panel. A. Box plots of the distributions of best linear unbiased predictors (BLUPs) for the ten fatty acid methyl esters (FAMEs) measured in an oat diversity panel (n = 492). Compounds are divided into three groups based on mean concentration, each plotted with a distinct y-axis. The color of the box denotes the number of double bonds in the corresponding FAME. B. Overlaid histograms of unsaturated, saturated, and total FAME BLUPs derived from the independent FAME measurements. C. The cumulative contribution of each FAME to total concentration plotted on a log-transformed scale.
Figure 3
Fatty acid methyl ester (FAME) correlation networks. A. A network constructed from Pearson’s correlations (r) between FAMEs. B. An analogous network constructed from pairwise partial correlations (pr), with edges corresponding to biosynthetic steps annotated (see Figure 1 for abbreviations). In both A and B, an edge was drawn between compounds if the pairwise r or pr value was significant at a threshold of α = 0.05 after applying a Bonferroni correction for multiple testing. Edge width is proportional to the magnitude of the correlation, and edge color indicates a positive or negative correlation. Node size corresponds to the mean concentration (mg g⁻¹), with compounds grouped into three categories: less than 1, 10, and 25 mg g⁻¹. Node color differentiates saturated from unsaturated FAMEs.
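To make the distinction between the two networks concrete, the following sketch computes pairwise partial correlations from the inverse of the Pearson correlation matrix. This is one standard estimator and is an assumption here; the caption does not say how the partial correlations were computed. The function name and the simulated three-compound chain are hypothetical.

```python
import numpy as np

def partial_correlations(X):
    """Pairwise partial correlations, each conditioned on all other columns.

    Uses the precision-matrix identity pr_ij = -P_ij / sqrt(P_ii * P_jj),
    where P is the inverse of the Pearson correlation matrix.
    """
    R = np.corrcoef(X, rowvar=False)
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    pr = -P / np.outer(d, d)
    np.fill_diagonal(pr, 1.0)
    return pr

# Hypothetical chain a -> b -> c, loosely mimicking successive pathway steps.
rng = np.random.default_rng(1)
a = rng.normal(size=5000)
b = a + rng.normal(size=5000)
c = b + rng.normal(size=5000)
X = np.column_stack([a, b, c])

r = np.corrcoef(X, rowvar=False)
pr = partial_correlations(X)
```

In this toy chain, `a` and `c` are clearly correlated in `r`, but their partial correlation in `pr` is close to zero once `b` is accounted for, which is why a partial-correlation network tends to retain only edges between directly linked compounds.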
Figure 4
Multivariate genome-wide association study (multi-GWAS). A. Negative log10 p-values from a multi-GWAS of ten FAME BLUPs plotted against genetic position in the consensus linkage map (Chaffin et al. 2016; Bekele et al. 2018). Dotted lines denote the three significance thresholds considered, with a Bonferroni-corrected threshold of 5% in red, and 5 and 10% false-discovery rate (FDR) thresholds in light and dark gray, respectively. Markers with a p-value passing the Bonferroni threshold are shown in red. B. A quantile-quantile plot of the multi-GWAS p-values. C. Venn diagram comparing significantly associated SNPs (at a 10% FDR) from combined univariate analyses of ten individual and total FAME untransformed BLUPs (Univariate), combined univariate analyses of ten principal components (PCs) of the ten FAME BLUP data matrix, and multi-GWAS of the ten FAME quantile-transformed BLUPs (Multivariate).
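The relationship between the Bonferroni and FDR thresholds in panel A can be sketched as follows. The Benjamini-Hochberg step-up procedure is assumed for FDR control; the caption does not name the procedure the authors used, and the p-values below are made up for illustration.

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Flag p-values that pass a Bonferroni-corrected threshold alpha / m."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / p.size

def benjamini_hochberg(pvals, fdr=0.10):
    """Flag p-values rejected by the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    below = ranked <= fdr * np.arange(1, m + 1) / m
    significant = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank meeting its threshold
        significant[order[:k + 1]] = True  # reject every test up to that rank
    return significant

# Made-up p-values standing in for marker-level association tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.059, 0.074, 0.205, 0.212, 0.216]
n_bonf = int(bonferroni(pvals, alpha=0.05).sum())
n_fdr05 = int(benjamini_hochberg(pvals, fdr=0.05).sum())
n_fdr10 = int(benjamini_hochberg(pvals, fdr=0.10).sum())
```

On these values the three thresholds are nested: every marker passing Bonferroni also passes the 5% FDR threshold, and every marker passing 5% FDR also passes 10% FDR.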
Figure 5
Linkage disequilibrium between markers associated with fatty acid methyl ester (FAME) variation identified in multivariate, principal component (PC), and univariate analyses. A. Linkage disequilibrium (LD), defined as the squared allele frequency correlation coefficient (r²), between markers identified as significantly associated with phenotypic variation at a 10% false-discovery rate (FDR) threshold in any of the multivariate, PC, or marginal analyses. Here, we show only those significant markers that were in LD with at least two other significantly associated markers (see Materials and Methods). Single-nucleotide polymorphisms (SNPs) are ordered by hierarchical clustering, with abbreviated SNP name on the y-axis colored by linkage group assignment (see B) in the consensus genetic map (Chaffin et al. 2016; Bekele et al. 2018). Gray boxes denote missing values (see Materials and Methods). B. An incidence matrix of results from the multivariate, PC, and univariate association analyses. The x-axis mirrors that of A, with each vertical tract corresponding to the same SNP pictured in A. A colored rectangle indicates that the SNP was identified as significant at an FDR threshold of 10%, with the PC and Univariate tracks representing the union of significant SNPs across all ten and 11 traits, respectively. The color of the rectangle corresponds to the linkage group. A black dot along the gray track indicates that the SNP has a minor allele frequency < 5%.
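A minimal sketch of a pairwise LD estimate of this kind is given below, computed as the squared Pearson correlation between two markers' genotype codes. The 0/1/2 allele-dosage coding, the handling of missing calls, and the function name `ld_r2` are assumptions for illustration; the authors' exact procedure is described in their Materials and Methods.

```python
import numpy as np

def ld_r2(g1, g2):
    """Squared correlation between two markers' genotype codes.

    Assumptions: genotypes are coded as allele dosage (0, 1, 2) with NaN for
    missing calls; lines missing either marker are dropped pairwise. Returns
    NaN when too few lines remain or a marker is monomorphic, since the
    estimate is then undefined.
    """
    a = np.asarray(g1, dtype=float)
    b = np.asarray(g2, dtype=float)
    keep = ~np.isnan(a) & ~np.isnan(b)
    if keep.sum() < 2:
        return np.nan
    a, b = a[keep], b[keep]
    if a.std() == 0 or b.std() == 0:
        return np.nan
    return float(np.corrcoef(a, b)[0, 1] ** 2)

# Hypothetical genotypes for mostly homozygous (inbred) lines.
snp_a = [0, 2, 0, 2, 2, np.nan]
snp_b = [0, 2, 0, 2, 2, 0]       # matches snp_a wherever both are called
snp_c = [0, 2, 2, 0, np.nan, 2]

r2_ab = ld_r2(snp_a, snp_b)
r2_ac = ld_r2(snp_a, snp_c)
```

Returning NaN for undefined pairs is one way such a matrix could end up with missing cells like the gray boxes in panel A, though the reason for the missing values in the figure is given only in the original Materials and Methods.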
Figure 6
Comparison of multivariate, principal component (PC), and univariate genome-wide association study (GWAS) results. A. An incidence matrix of results from the multivariate, PC, and univariate GWAS. SNPs are ordered by hierarchical clustering of pairwise linkage disequilibrium (LD) estimates, as in Figure 5. A colored rectangle indicates that a SNP was significant in the given test at a 10% false-discovery rate (FDR) threshold, with color corresponding to a linkage group in the consensus genetic map (Chaffin et al. 2016; Bekele et al. 2018). Presence of a black dot indicates that the SNP has a minor allele frequency < 5%. Labeled (a-f) line segments span the SNPs in six LD clusters, defined by visualization of the pairwise LD matrix (see Figure 5). B. Mean phenotypic differences between distinct homozygote classes at a significantly associated SNP within each of the six LD clusters (a-f) defined in A. Genotype is plotted on the x-axis, with the mean number of phenotypic standard deviations away from the mean on the y-axis. Each centered and unit-variance scaled FAME BLUP (including total) is plotted, with warm colors for saturated and cool colors for unsaturated FAMEs. Total is indicated by a black line. The bold lines indicate that the pairwise t-test between genotype class means was significant at α = 0.05 after correcting for multiple testing with a Bonferroni correction. To simplify the plots, only traits with significantly different means between genotypes are labeled. As lines with missing genotypes are excluded from these calculations, we present the genotype counts in the bottom right or left-hand corner of each plot.

References

    1. Aitchison J., 1983. Principal component analysis of compositional data. Biometrika 70: 57–65. doi:10.1093/biomet/70.1.57
    2. Aschard H., Vilhjálmsson B. J., Greliche N., Morange P.-E., Trégouët D.-A. et al., 2014. Maximizing the Power of Principal-Component Analysis of Correlated Phenotypes in Genome-wide Association Studies. Am. J. Hum. Genet. 94: 662–676. doi:10.1016/j.ajhg.2014.03.016
    3. Asoro F. G., Newell M. A., Beavis W. D., Scott M. P., Tinker N. A. et al., 2013. Genomic, Marker-Assisted, and Pedigree-BLUP Selection Methods for β-Glucan Concentration in Elite Oat. Crop Sci. 53: 1894. doi:10.2135/cropsci2012.09.0526
    4. Banaś A., Debski H., Banaś W., Heneen W. K., Dahlqvist A. et al., 2007. Lipids in grain tissues of oat (Avena sativa): differences in content, time of deposition, and fatty acid composition. J. Exp. Bot. 58: 2463–2470. doi:10.1093/jxb/erm125
    5. Bekele W. A., Wight C. P., Chao S., Howarth C. J., and Tinker N. A., 2018. Haplotype-based genotyping-by-sequencing in oat genome research. Plant Biotech. J. 16: 1452–1463.
