Plant Cell. 2014 Jan;26(1):121-35.
doi: 10.1105/tpc.113.119982. Epub 2014 Jan 31.

Insights Into the Maize Pan-Genome and Pan-Transcriptome

Free PMC article

Candice N Hirsch et al. Plant Cell.

Abstract

Genomes at the species level are dynamic, with genes present in every individual (core) and genes in a subset of individuals (dispensable) that collectively constitute the pan-genome. Using transcriptome sequencing of seedling RNA from 503 maize (Zea mays) inbred lines to characterize the maize pan-genome, we identified 8681 representative transcript assemblies (RTAs) with 16.4% expressed in all lines and 82.7% expressed in subsets of the lines. Interestingly, with linkage disequilibrium mapping, 76.7% of the RTAs with at least one single nucleotide polymorphism (SNP) could be mapped to a single genetic position, distributed primarily throughout the nonpericentromeric portion of the genome. Stepwise iterative clustering of RTAs suggests, within the context of the genotypes used in this study, that the maize genome is restricted and further sampling of seedling RNA within this germplasm base will result in minimal discovery. Genome-wide association studies based on SNPs and transcript abundance in the pan-genome revealed loci associated with the timing of the juvenile-to-adult vegetative and vegetative-to-reproductive developmental transitions, two traits important for fitness and adaptation. This study revealed the dynamic nature of the maize pan-genome and demonstrated that a substantial portion of variation may lie outside the single reference genome for a species.

Figures

Figure 1.
Flowchart, Support Statistics, and Gene Expression Distribution for the Joint Assembly across 503 Diverse Maize Inbred Lines. (A) Flowchart describing annotation of the RTAs from the joint assembly as putative contaminant sequences, putative alleles/homologs/paralogs/orthologs, or novel sequences. (B) Support of the filtered RTAs. RTAs were searched against three sets of sequences: the UniRef100 database, with an E-value cutoff of 1e-5 and a minimum of 50% coverage and 50% identity, using WU BLASTX; the O. sativa v7 proteins and the S. bicolor v1 proteins, with an E-value cutoff of 1e-10 and a minimum of 70% coverage and 70% identity, using WU BLASTX; and the maize PlantGDB-assembled unique transcripts (PUTs) version 171a, with an E-value cutoff of 1e-10 and a minimum of 85% coverage and 85% identity, using WU BLAST. (C) Distribution of gene expression in the maize seedling pan-transcriptome using a quantitative presence/absence classification. Genes were considered expressed if the lower boundary of the 95% confidence interval for fragments per kilobase of exon model per million fragments mapped (FPKM), as defined by Cufflinks, was greater than zero.
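The presence/absence call in panel (C) can be illustrated with a minimal sketch. It assumes the lower 95% confidence bounds have already been extracted per transcript and per inbred line; the function name, the input layout, and the class labels are hypothetical and are not taken from the study's pipeline.

```python
def classify_transcripts(fpkm_ci_low):
    """Classify transcripts from lower 95% CI bounds on FPKM.

    fpkm_ci_low: dict mapping a transcript id to a list of lower CI
    bounds, one value per inbred line (hypothetical input layout).
    A transcript counts as expressed in a line when its bound is > 0.
    """
    classes = {}
    for transcript_id, bounds in fpkm_ci_low.items():
        if not bounds:
            raise ValueError(f"no lines recorded for {transcript_id}")
        n_expressed = sum(1 for bound in bounds if bound > 0)
        if n_expressed == len(bounds):
            classes[transcript_id] = "core"          # expressed in every line
        elif n_expressed > 0:
            classes[transcript_id] = "dispensable"   # expressed in a subset
        else:
            classes[transcript_id] = "not_expressed"
    return classes


if __name__ == "__main__":
    toy = {"RTA_a": [1.2, 0.4, 3.0], "RTA_b": [0.0, 2.1, 0.0], "RTA_c": [0.0, 0.0, 0.0]}
    print(classify_transcripts(toy))
```

Using the lower bound rather than the point estimate makes the call conservative: a transcript with a small FPKM estimate whose interval still reaches zero is treated as absent.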
Figure 2.
LD Mapping of RTAs. (A) Pairwise r² values between the 458,259 SNPs anchored to the B73 maize v2 reference sequence and the 13 SNPs on RTA_10140. (B) Pictorial representation of the maize chromosomes with the position of each of the 3396 LD-mapped RTAs. Colors of lines signify RTAs that are expressed in every inbred line (black), are not expressed in every inbred line but are expressed in B73 (blue), or are not expressed in every inbred line and are not expressed in B73 (red). (C) Synteny analysis for RTA_10140 relative to the rice chromosome 12 sequence. Solid boxes show orthologous genes identified using EXONERATE with a minimum threshold of 70% identity over 70% of the length of the rice CDS sequence, and dashed boxes indicate orthologous genes identified using OrthoMCL with CDS sequences from rice v7, maize v2, and sorghum v1, and the 8681 RTAs.
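As a rough illustration of the pairwise LD statistic in panel (A), the sketch below computes r² as the squared Pearson correlation between two SNPs scored 0/1 across inbred lines, then picks the anchored SNP in strongest LD with an unplaced SNP. The 0/1 coding, the function names, and the "take the maximum" rule are assumptions made for illustration; the caption does not state the criteria the study used to assign a position.

```python
def r_squared(snp_a, snp_b):
    """Squared Pearson correlation between two 0/1-coded SNP vectors."""
    n = len(snp_a)
    if n == 0 or n != len(snp_b):
        raise ValueError("SNP vectors must be non-empty and of equal length")
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b)) / n
    var_a = sum((a - mean_a) ** 2 for a in snp_a) / n
    var_b = sum((b - mean_b) ** 2 for b in snp_b) / n
    if var_a == 0 or var_b == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return cov * cov / (var_a * var_b)


def best_anchor(unplaced_snp, anchored_snps):
    """Return (position, r2) of the anchored SNP in strongest LD.

    anchored_snps: dict mapping a hypothetical position label to a
    0/1 genotype vector over the same inbred lines.
    """
    scored = ((pos, r_squared(unplaced_snp, geno)) for pos, geno in anchored_snps.items())
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    unplaced = [0, 0, 1, 1, 0, 1]
    anchored = {
        "chr1:1000": [0, 1, 0, 1, 0, 1],
        "chr3:5000": [0, 0, 1, 1, 0, 1],
    }
    print(best_anchor(unplaced, anchored))
```

In practice missing genotypes and heterozygous calls would need handling, which this sketch leaves out.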
Figure 3.
Evaluation of the Restricted/Unrestricted Nature of the Maize Pan-Genome. Sequence-based clustering shows that the number of RTAs plateaus at ∼350 inbred lines. Clusters are defined as OrthoMCL groups with at least two RTAs.
Figure 4.
GWAS for Vegetative Phase Change Measured as Last Leaf with Epicuticular Wax. (A) Manhattan plot of GWAS results using SNP markers. Significance threshold (horizontal dashed line) was set using the simpleM method (2.7 × 10⁻⁷). Significant SNPs were located in glossy15. (B) Manhattan plot of GWAS results using transcript abundance as the independent variable. GRMZM2G096016 encodes nuclear transcription factor Y subunit A-10. Significance threshold was set using Bonferroni correction (1.04 × 10⁻⁶).
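The Bonferroni threshold quoted for panel (B) follows from dividing a family-wise error rate by the number of tests. The sketch below shows that arithmetic and back-calculates the number of tests implied by the stated threshold. The error rate of 0.05 is an assumption, since the caption gives neither the error rate nor the number of transcripts tested; the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def implied_number_of_tests(alpha, threshold):
    """Number of tests that would yield the given Bonferroni threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return alpha / threshold


if __name__ == "__main__":
    ALPHA = 0.05            # assumed; not stated in the caption
    STATED = 1.04e-6        # threshold reported in the caption
    n = round(implied_number_of_tests(ALPHA, STATED))
    print(n, bonferroni_threshold(ALPHA, n))
```

Under the assumed error rate, the stated threshold corresponds to roughly 48,000 tests. The simpleM threshold in panel (A) is derived differently, from an estimate of the effective number of independent SNPs, and is not reproduced here.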
Figure 5.
GWAS for GDDs to Pollen Shed. (A) Manhattan plot of GWAS results using SNP markers. Genome-wide significance threshold (horizontal dashed line) was set using the simpleM method (2.7 × 10⁻⁷). GRMZM2G171622 encodes a CBS domain–containing protein. (B) Manhattan plot of GWAS results using gene expression level as the independent variable for GDDs to pollen shed. Significance threshold was set using Bonferroni correction (1.04 × 10⁻⁶). (C) LD heat map between the most significant gene on chromosome 3 based on SNP markers for GDDs to pollen shed, GRMZM2G171622, and a likely candidate gene, GRMZM2G171650. Asterisks indicate significant SNPs identified through GWAS.
