Theor Appl Genet. 2019 Jun;132(6):1639-1659.
doi: 10.1007/s00122-019-03304-5. Epub 2019 Feb 26.

Genome-wide association study of seed protein, oil and amino acid contents in soybean from maturity groups I to IV

Sungwoo Lee et al. Theor Appl Genet. 2019 Jun.

Abstract

Genomic regions associated with seed protein, oil and amino acid contents were identified by genome-wide association analyses. Geographic distributions of haplotypes indicate scope of improvement of these traits. Soybean [Glycine max (L.) Merr.] protein and oil are used worldwide in feed, food and industrial materials. Increasing seed protein and oil contents is important; however, protein content is generally negatively correlated with oil content. We conducted a genome-wide association study using phenotypic data collected from five environments for 621 accessions in maturity groups I-IV and 34,014 markers to identify quantitative trait loci (QTL) for seed content of protein, oil and several essential amino acids. Three and five genomic regions were associated with seed protein and oil contents, respectively. One, three, one and four genomic regions were associated with cysteine, methionine, lysine and threonine content (g kg⁻¹ crude protein), respectively. As previously shown, QTL on chromosomes 15 and 20 were associated with seed protein and oil contents, with both exhibiting opposite effects on the two traits, and the chromosome 20 QTL having the most significant effect. A multi-trait mixed model identified trait-specific QTL. A QTL on chromosome 5 increased oil with no effect on protein content, and a QTL on chromosome 10 increased protein content with little effect on oil content. The chromosome 10 QTL co-localized with maturity gene E2/GmGIa. Identification of trait-specific QTL indicates feasibility to reduce the negative correlation between protein and oil contents. Haplotype blocks were defined at the QTL identified on chromosomes 5, 10, 15 and 20. Frequencies of positive effect haplotypes varied across maturity groups and geographic regions, providing guidance on which alleles have potential to contribute to soybean improvement for specific regions.
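The last point of the abstract, that haplotype frequencies differ across maturity groups, comes down to a per-group tally. The following is a minimal sketch of that tally, not the authors' pipeline: the records, the haplotype labels (`Hap1`, `Hap2`) and the function name `haplotype_frequencies` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (maturity group, haplotype label at one QTL).
# These values are illustrative only and are NOT data from the study.
accessions = [
    ("I", "Hap1"), ("I", "Hap1"), ("I", "Hap2"),
    ("II", "Hap1"), ("II", "Hap2"), ("II", "Hap2"), ("II", "Hap2"),
    ("IV", "Hap2"), ("IV", "Hap2"),
]


def haplotype_frequencies(records):
    """Return {group: {haplotype: relative frequency}} from (group, haplotype) pairs."""
    counts = defaultdict(Counter)
    for group, hap in records:
        counts[group][hap] += 1
    return {
        group: {hap: n / sum(c.values()) for hap, n in c.items()}
        for group, c in counts.items()
    }


freqs = haplotype_frequencies(accessions)
for group in sorted(freqs):
    print(group, freqs[group])
```

The same function would work with geographic region as the grouping key instead of maturity group.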

Conflict of interest statement

The authors declared that they have no conflict of interest.

Figures

Fig. 1
Phenotypic distribution of seed protein and oil contents (a) by scaled best linear unbiased predictor (BLUP) values across all environments (ALL) among the 621 plant introductions and their correlation (P < 0.0001) (b). Phenotypic distribution of amino acids by scaled BLUP values across all environments is also shown (c)
Fig. 2
Population structure of the 621 soybean accessions. (a) Plot of STRUCTURE analysis (K = 2). Accessions were sorted by the geographic location from which each accession was collected, and colored bars correspond to the STRUCTURE assignments (Q1 and Q2). (b) Principal component analysis (PCA) of the 621 soybean accessions, with the country of origin indicated by marker color
Fig. 3
Manhattan plots (left) and QQ-plots (right) for GWAS of the 621 soybean accessions for protein (a) and oil (b) contents using multi-locus mixed model and opposite effect (c) by multi-trait mixed model. The trait associations for 34,014 SNPs were plotted by all environments combined (ALL) (a and b) or the Wooster, Ohio 2015 environment (OHW15) (c). Red and blue horizontal lines in the Manhattan plots and markers in the QQ-plots represent the genome-wide significance threshold (5%) and the suggestive significance threshold (25%), respectively, and the SNPs significantly associated at those levels. Shaded regions of the QQ-plots represent a 95% confidence interval (color figure online)
Fig. 4
The 40–42.5 Mb region on Chr 5 covering significantly associated trait-specific SNPs identified by multi-trait mixed model. Negative log10 P-values for the Illinois 2015 environment (IL15) are plotted against physical genomic position (Glyma.Wm82.a2.v1). Horizontal lines are as described in Fig. 3. Previously identified QTL are indicated with horizontal arrows and were obtained from SoyBase (http://soybase.org)
Fig. 5
The 43–48 Mb region on Chr 10 covering significantly associated trait-specific SNPs identified by multi-trait mixed model. Negative log10 P-values for the Wooster, Ohio 2014 environment (OHW14) are plotted against physical position (Glyma.Wm82.a2.v1). Horizontal lines and arrows are as described in Fig. 4. The maturity gene (E2) indicated by the vertical line is coincident with these significant markers
Fig. 6
Manhattan plots (left) and QQ-plots (right) for genome-wide association study of the 621 soybean accessions using multi-locus mixed model for methionine (a), cysteine (b), lysine (c) and threonine (d) on a g kg cp⁻¹ (g per kg crude protein) basis across all environments (ALL). Horizontal lines, markers and shading are as described in Fig. 3
Fig. 7
Distribution of haplotypes of trait-specific QTL for protein and oil on Chr 5 (a) and Chr 10 (b) and QTL for protein and oil on Chr 15 (c) and Chr 20 (d). The frequency of each haplotype, illustrated in pie charts, was placed according to the geographic locations of major populations from Russia, Asia and North America. The size of each pie chart scales with the number of accessions in the region. Haplotypes are as described in Table 5. The map was created using the R packages ‘maps’ and ‘mapdata’
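The Manhattan and regional plots above show negative log10 P-values against 5% and 25% threshold lines. The sketch below shows how a P-value maps onto that scale and how a SNP could be classified against two such lines. It assumes a Bonferroni-style correction over the 34,014 markers purely for illustration; the captions do not say how the study's thresholds were derived, and the SNP names and P-values here are hypothetical.

```python
import math

N_MARKERS = 34_014  # number of markers reported in the abstract


def threshold(alpha, n_tests=N_MARKERS):
    """-log10 of a Bonferroni-adjusted cutoff (assumed here, not taken from the study)."""
    return -math.log10(alpha / n_tests)


GENOME_WIDE = threshold(0.05)  # analogue of the 5% line
SUGGESTIVE = threshold(0.25)   # analogue of the 25% line


def classify(p_value):
    """Label a P-value relative to the two assumed threshold lines."""
    score = -math.log10(p_value)
    if score >= GENOME_WIDE:
        return "genome-wide"
    if score >= SUGGESTIVE:
        return "suggestive"
    return "not significant"


# Hypothetical P-values for three made-up SNPs.
p_values = {"snp_a": 1e-8, "snp_b": 3e-6, "snp_c": 0.02}
for name, p in p_values.items():
    print(name, round(-math.log10(p), 2), classify(p))
```

Under this assumption the 5% line sits near 5.83 and the 25% line near 5.13 on the negative log10 scale; a different multiple-testing procedure would move both lines.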
