Nucleic Acids Res. 2019 Jul 9;47(12):6098-6113.
doi: 10.1093/nar/gkz463.

Thermodynamically stable and genetically unstable G-quadruplexes are depleted in genomes across species

Emilia Puig Lombardi et al. Nucleic Acids Res.

Abstract

G-quadruplexes play various roles in multiple biological processes, which can be positive when a G4 is involved in the regulation of gene expression or detrimental when the folding of a stable G4 impairs DNA replication, promoting genome instability. This duality raises the question of the significance of their presence within genomes. To address the potential biased evolution of G4 motifs, we analyzed their occurrence, features and polymorphisms in a large spectrum of species. We found an extreme bias of the short-looped G4 motifs, which are the most thermodynamically stable in vitro and thus carry the highest folding potential in vivo. In the human genome, there is an over-representation of single-nucleotide-loop G4 motifs (G4-L1), which are highly conserved among humans and show a striking excess of the thermodynamically least stable G4-L1A (G3AG3AG3AG3) sequences. Functional assays in yeast showed that G4-L1A caused the lowest levels of both spontaneous and G4-ligand-induced instability. Analyses across 600 species revealed the depletion of the most stable G4-L1C/T quadruplexes in most genomes in favor of G4-L1A in vertebrates or G4-L1G in other eukaryotes. We discuss how these trends might be the result of species-specific mutagenic processes associated with a negative selection against the most stable motifs, thus neutralizing their detrimental effects on genome stability while preserving positive G4-associated biological roles.

Figures

Figure 1.
Mapping the G4-L1 motifs across the human genome. (A) From the outermost to the innermost circle: chromosome cytobands for the hg38 reference genome; rainfall plots (showing the motif locations on the x-axis versus the distance between consecutive motifs on the y-axis); and density plots (distribution shape over chromosomes) of G4-L1 motif (G3N1G3N1G3N1G3) distribution across the human genome. Red, N = A; green, N = T; blue, N = G; orange, N = C and purple, N = {A,T,G,C}. Inset: linear representation for chromosome X, all loop combinations (gray). (B) Distribution of the different loop compositions across the human genome. Red, N = A; green, N = T; blue, N = G; orange, N = C and purple, N = {A,T,G,C}. (C) Distribution of the different loop compositions across the background (random expectation). Red, N = A; green, N = T; blue, N = G; and purple, N = {A,T,G,C}.
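The motif definition and the rainfall-plot coordinates described in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `find_g4_l1` and `rainfall_points` are hypothetical, only the G-rich strand given as input is scanned, and matches are reported without overlap, which is an assumption about how motifs inside long G-runs are counted.

```python
import re

# G3 N1 G3 N1 G3 N1 G3, one captured group per single-nucleotide loop.
# Assumption: non-overlapping matches on the given strand only.
G4_L1 = re.compile(r"GGG([ACGT])GGG([ACGT])GGG([ACGT])GGG")


def find_g4_l1(seq):
    """Return (start, motif, loop_class) for each G4-L1 match in seq.

    loop_class is 'A', 'C', 'G' or 'T' when all three loops carry the
    same base, and 'mixed' otherwise. Starts are 0-based.
    """
    hits = []
    for m in G4_L1.finditer(seq.upper()):
        loops = set(m.groups())
        loop_class = next(iter(loops)) if len(loops) == 1 else "mixed"
        hits.append((m.start(), m.group(0), loop_class))
    return hits


def rainfall_points(starts):
    """Pair each motif position (x) with the distance to the previous motif (y)."""
    starts = sorted(starts)
    return [(b, b - a) for a, b in zip(starts, starts[1:])]


example = "TTGGGAGGGAGGGAGGGTTTTGGGAGGGTGGGCGGGAA"
hits = find_g4_l1(example)
points = rainfall_points([start for start, _, _ in hits])
```

Counting the `loop_class` values over a whole chromosome would give a loop-composition distribution of the kind summarized in panel B, under the same assumptions.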
Figure 2.
G4-L1 motif clusters. (A) Number of G4 motifs, G4-L1-7 in red (primary y-axis) and G4-L1 in blue (secondary y-axis), found within increasing sequence distances (in base pairs, bp). (B) Schematic representation of a G4-L1 cluster, defined as a 500 bp region containing at least three non-overlapping G4-L1 motifs (i.e. 3 motifs/0.5 Kbp or 6 motifs/Kbp). (C) Distribution of the different loop compositions of G4-L1 motifs found within clusters across the human genome. Red, N = A; green, N = T; blue, N = G; orange, N = C and purple, N = {A,T,G,C}. (D) Microsatellite motifs were defined as n ≥ 2 repeats of the [GGGX] subunit. To perform the search, positions −3 to −1 were set to G and position 0 was variable (X). The number of repeats (n) of the subunit is shown on the x-axis, whilst the y-axis represents the nucleotide composition of the X position for each of the n values. The number of occurrences, for each value of n, is shown over the corresponding bar. Red, X = A; green, X = T; blue, X = G; and orange, X = C. *, Chi-squared goodness of fit tests P < 0.05 (comparison of the observed distributions to an expected homogeneous distribution). (E) Variability of the [GGGX]n micro-satellites in the human genome. The heatmap shows log2 fold-change differences between [GGGX]n motif counts in hg38 and in a genome where common SNPs (from the 1000 Genomes Project database) were masked, for different micro-satellite motif sizes. >0: motif less polymorphic than expected; <0: motif more polymorphic than expected. Dendrograms created through by-column and by-row hierarchical clustering were added on top and on the side of the heatmap, respectively.
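The cluster definition in panel B and the [GGGX]n search in panel D can be sketched as follows. This is an illustration only, under stated assumptions: `find_clusters` and `count_gggx_repeats` are hypothetical names, motif starts are assumed to come from a non-overlapping scan, a motif counts toward a window only if it lies entirely inside it, and the greedy left-to-right grouping is one possible reading of "a 500 bp region", not necessarily the procedure used in the paper.

```python
import re
from collections import Counter

MOTIF_LEN = 15   # length of G3N1G3N1G3N1G3
WINDOW = 500     # cluster window in bp, as in panel B
MIN_MOTIFS = 3   # minimum number of motifs per cluster, as in panel B


def find_clusters(starts, window=WINDOW, min_motifs=MIN_MOTIFS, motif_len=MOTIF_LEN):
    """Greedily group motif starts into (first_start, last_end, n_motifs) clusters."""
    starts = sorted(starts)
    clusters = []
    i = 0
    while i < len(starts):
        j = i
        # Extend j while motif j lies entirely inside the window opened at motif i.
        while j < len(starts) and starts[j] + motif_len <= starts[i] + window:
            j += 1
        if j - i >= min_motifs:
            clusters.append((starts[i], starts[j - 1] + motif_len, j - i))
            i = j          # continue after the cluster just reported
        else:
            i += 1         # slide the window to the next motif
    return clusters


# A GGGX unit followed by one or more exact copies of itself, i.e. n >= 2.
GGGX_RUN = re.compile(r"(GGG[ACGT])\1+")


def count_gggx_repeats(seq):
    """Count tandem [GGGX]n runs, keyed by (X, n)."""
    counts = Counter()
    for m in GGGX_RUN.finditer(seq.upper()):
        unit = m.group(1)
        counts[(unit[3], len(m.group(0)) // len(unit))] += 1
    return counts


clusters = find_clusters([100, 200, 450, 2000, 2600, 2700])
repeats = count_gggx_repeats("TTGGGAGGGAGGGATTCCGGGTGGGTCC")
```

In the first call only the three motifs starting at 100, 200 and 450 fit inside one 500 bp window, so a single cluster is reported; the second call finds one (GGGA)3 run and one (GGGT)2 run.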
Figure 3.
Polymorphism of G4-L1 sequences in the human genome. (A) Base composition of the motif loops in polymorphic G4-L1 sequences. Red, N = A; green, N = T; blue, N = G; and orange, N = C. P-values were calculated using Chi-squared goodness of fit tests. Loop 1, P = 0.029; Loop 2, P = 0.0019; Loop 3, P = 0.054. Inset represents the most frequent motif found within polymorphic G4-L1 sequences. (B) The number of common point variants (SNPs) overlapping with each of the positions of G4-L1 motifs was estimated genome-wide. The number of polymorphic positions is shown on the y-axis, and the positions in the G4-L1 sequence are reported on the x-axis. Gray, loop position; black, G-run. (C) Comparison of the number of polymorphic nucleotides found by position in the G4-L1 15-nt motif. ‘G’ refers to positions {1,3,5,7,9,11,13,15}; ‘middle G’ to positions {2,6,10,14} and ‘N’ to positions {4,8,12}. Adjusted P-values were calculated using pairwise t-tests. *, P < 0.01 (one-way ANOVA P-value = 0.00067). (D) Variant composition by position in the G4-L1 motif and by DNA strand. ‘G’ refers to positions {1,3,5,7,9,11,13,15}; ‘middle G’ to positions {2,6,10,14} and ‘N’ to positions {4,8,12}. Red, N = A; green, N = T; blue, N = G; and orange, N = C. (+), non-template strand; (−), template strand.
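The position classes used in panels C and D follow from the G3N1G3N1G3N1G3 layout: within each 4-nt GGGN unit the first and third bases are outer Gs, the second is the middle G and the fourth is the loop. A minimal sketch, assuming 1-based positions and that the outer-G class therefore covers positions 1, 3, 5, 7, 9, 11, 13 and 15; the names `position_class` and `snps_per_class` and the example counts are hypothetical, not data from the paper.

```python
MOTIF_LEN = 15  # G3N1G3N1G3N1G3


def position_class(pos):
    """Classify a 1-based position of the 15-nt G4-L1 motif.

    Returns 'G' (outer guanine of a G-run), 'middle G' or 'N' (loop).
    """
    if not 1 <= pos <= MOTIF_LEN:
        raise ValueError(f"position must be in 1..{MOTIF_LEN}, got {pos}")
    offset = (pos - 1) % 4      # offset inside the repeating GGGN unit
    if offset == 3:
        return "N"
    if offset == 1:
        return "middle G"
    return "G"


def snps_per_class(snp_counts):
    """Sum per-position SNP counts (dict: position -> count) by position class."""
    totals = {"G": 0, "middle G": 0, "N": 0}
    for pos, count in snp_counts.items():
        totals[position_class(pos)] += count
    return totals


# Made-up counts, for illustration only.
totals = snps_per_class({1: 2, 2: 1, 4: 5, 8: 3})
```

Dividing each total by the number of positions in its class (8, 4 and 3) would give per-position averages that are comparable across classes.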
Figure 4.
In vitro and in vivo stability of G4-L1 quadruplexes. (A) Circular dichroism (CD) spectra of G4-L1 sequences. All spectra exhibit the characteristic features of a parallel G4 structure, i.e. a negative peak at 240 nm and a positive peak at ∼260 nm. Black, G4-L1A; red, G4-L1C; blue, G4-L1G and magenta, G4-L1T. (B) Mutation rates per cell per division, corrected by the plating efficiency. Black, culture in DMSO; gray, culture +PhenDC3. Significant fold changes between DMSO and +PhenDC3 conditions are shown. Bars, upper 95% confidence interval bound.
Figure 5.
Genome metrics and G4-L1 motif content of various eukaryotic genomes. (A) Genome size (in mega base pairs, Mbp), GC content and G4-L1 motif density (number of motifs found per Mbp) for different groups of eukaryotes. (B) Relationships between genome size (in Mbp) and G4-L1 motif counts; between GC content and G4-L1 motif counts; and between GC content and G4-L1G (polyG15) motif content. Spearman correlation coefficients (rho) and their statistical significance are provided at the top of each panel. Regression lines are shown in blue (Preg, linear regression significance).
Figure 6.
Two distinct loop composition trends exhibited by G4-L1 motifs in various eukaryotic genomes. (A) Unsupervised learning by PCA was performed using the five principal loop compositions (G4-L1A, G4-L1T, G4-L1G, G4-L1C and mixed loops G4-L1) as variables. Principal components 1 and 2 are plotted on the x- and y-axes, respectively, and show the similarity (distance) between each species based on their quadruplex sequence content only. Each dot represents an organism and each color represents its phylogenetic group. Ellipses were generated using 70% confidence intervals around the barycenters of each phylogenetic group. PC1 accounts for 46% of the variance in the dataset and largely separates G4-L1A rich species (right) from G4-L1G rich species (left). Inset shows the correlation between A loop content (x-axis) and G loop content (y-axis). rho, Spearman's correlation coefficient; Preg, linear regression significance. (B) G4-L1 loop content of the different phylogenetic groups of eukaryotes. P, Kruskal–Wallis rank sum test P-values. Red asterisks indicate mean values significantly different (pairwise comparisons using Wilcoxon rank sum tests, adj P < 0.05) from those of all other groups.
Figure 7.
Relationship between G4-L1 loop composition and divergence times with the human genome. Loop composition versus divergence with the human genome (Mya, 10^6 years ago). Upper panels: left, %G-G-G; right, %A-A-A. Lower panels: left, %T-T-T; center, %C-C-C; right, % Mixed loops. Green, primates only; blue, vertebrates (other than primates); red, non-vertebrate eukaryotes. Black dotted lines indicate 55, 200 and 600 Mya, respectively.
