Genome Res. 2018 Jul;28(7):1067-1078.
doi: 10.1101/gr.231068.117. Epub 2018 May 15.

Mapping and characterizing N6-methyladenine in eukaryotic genomes using single-molecule real-time sequencing

Shijia Zhu et al. Genome Res. 2018 Jul.

Abstract

N6-Methyladenine (m6dA) has been discovered as a novel form of DNA methylation prevalent in eukaryotes; however, methods for high-resolution mapping of m6dA events are still lacking. Single-molecule real-time (SMRT) sequencing has enabled the detection of m6dA events at single-nucleotide resolution in prokaryotic genomes, but its application to detecting m6dA in eukaryotic genomes has not been rigorously examined. Herein, we identified unique characteristics of eukaryotic m6dA methylomes that fundamentally differ from those of prokaryotes. Based on these differences, we describe the first approach for mapping m6dA events using SMRT sequencing specifically designed for the study of eukaryotic genomes and provide appropriate strategies for designing experiments and carrying out sequencing in future studies. We apply the novel approach to study two eukaryotic genomes. For green algae, we construct the first complete genome-wide map of m6dA at single-nucleotide and single-molecule resolution. For human lymphoblastoid cells (hLCLs), it was necessary to integrate SMRT sequencing data with independent sequencing data. The joint analyses suggest putative m6dA events are enriched in the promoters of young full-length LINE-1 elements (L1s), but call for validation by additional methods. These analyses demonstrate a general method for rigorous mapping and characterization of m6dA events in eukaryotic genomes.

Figures

Figure 1.
Differences between bacterial and eukaryotic m6dA methylomes and a novel approach for mapping m6dA events in eukaryotic organisms. (A) Comparison between bacterial and eukaryotic m6dA methylomes over three aspects. (B) A novel approach for mapping and characterizing m6dA events in eukaryotic genomes. The novel approach, including a set of methods as summarized on the left, is comprehensively evaluated using subsampled bacterial m6dA methylome data and applied to Chlamydomonas reinhardtii (green algae) and human lymphoblastoid cells (LCLs).
Figure 2.
Comprehensive evaluation of m6dA detection based on SMRT-seq data. (A,B) Sensitivity-FDR curves at different levels of per-strand SMRT-seq coverage (A) and fraction of methylated A sites in the genome (B). Curves are estimated based on either P-value or IPD ratio; both are shown. FDR estimation is based on the coverage-matched native (Escherichia coli with m6dA at GATC sites; Methods) and WGA samples. (C) FDRs estimated for different combinations of per-strand SMRT-seq coverage and fraction of m6dA sites, f(m6dA/A), in the genome. FDR estimation is based on the coverage-matched native and WGA samples (Methods) at an IPD ratio of four. (D) Motif-specific methylation detection leads to more reliable m6dA calls with lower FDRs. (E) Distribution of P-values (−log10) and IPD ratios of m6dA events (red) and nonmethylated A's (black) from 11 well-characterized bacterial m6dA methylomes. (F) Enrichment score for motifs with different fractions of motif sites methylated across the genome, fm(m6dA/A), estimated based on P-value (−log10; left) and IPD ratio (right). SMRT-seq data from 11 bacterial species/strains with well-characterized m6dA methylomes are used for this simulation analysis. (G) Schematic illustrating single-molecule-level analysis for the estimation of partial methylation. A single molecule (two DNA strands and two adapters) and the subreads that are produced from the top strand of this molecule in SMRT-seq (top). For a given genomic position, when non-single-molecule analysis is performed, IPD ratios for the methylated and nonmethylated subreads follow two exponential distributions (red and black curves in the second panel). In contrast, when single-molecule analysis is performed, IPD ratios across all molecules follow two normal distributions with smaller variance over increasing coverage per molecule strand (third and fourth panels). (H) Estimation of partial methylation fl(m6dA/A) by aggregate analysis (left) and single-molecule-level analysis (right). The x-axis indicates the ground-truth fl based on simulation; the y-axis, the estimated fl; and dots, 4359 A's with known fraction of m6dA methylation based on subsampling from a well-characterized E. coli m6dA methylome. (I,J) Distribution of IPD ratios for partially methylated m6dA sites and nonmethylated A's based on aggregate analysis (I) and single-molecule-level analysis (J). The inset provides an enlarged view. The motif enrichment score for the same known methylation motif, GATC, differs significantly between the two types of analyses (1.3 in aggregate analysis vs. 25 in single-molecule analysis).
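The caption states that FDR is estimated from coverage-matched native and WGA samples at a given IPD ratio cutoff, but it does not give the formula. The following is a minimal sketch of one common empirical estimator, assuming that calls in the methylation-free WGA control at the same cutoff approximate the false positives in the native sample. The function name `estimate_fdr` and the toy IPD ratios are hypothetical and are not taken from the paper.

```python
def estimate_fdr(native_ipd_ratios, wga_ipd_ratios, threshold):
    """Empirical FDR at an IPD-ratio cutoff (illustrative assumption, not the paper's code).

    FDR ~= (WGA sites at or above the cutoff, rescaled to the native site count)
           / (native sites at or above the cutoff)
    """
    native_calls = sum(r >= threshold for r in native_ipd_ratios)
    wga_calls = sum(r >= threshold for r in wga_ipd_ratios)
    if native_calls == 0:
        return None  # no calls in the native sample, so the FDR is undefined
    # Rescale in case the two samples cover different numbers of A sites.
    scale = len(native_ipd_ratios) / len(wga_ipd_ratios)
    return min(1.0, wga_calls * scale / native_calls)


# Toy per-site IPD ratios (made up for illustration only).
native = [1.0, 1.2, 5.0, 6.0, 4.5, 0.9, 7.0, 1.1]
wga = [1.0, 1.1, 0.9, 4.2, 1.3, 0.8, 1.0, 1.2]

fdr_at_4 = estimate_fdr(native, wga, threshold=4.0)
print(fdr_at_4)  # 0.25: one WGA call against four native calls
```

Sweeping `threshold` over a grid would trace out a curve comparable in spirit to the sensitivity-FDR curves in panels A and B, although the paper's exact procedure is described in its Methods and may differ.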
Figure 3.
Characterization of a complete m6dA methylome of C. reinhardtii reveals novel biological insights. (A) FDR estimation by comparing the IPD ratio distribution of C. reinhardtii native (red) with WGA (black) samples. The inset provides an enlarged view. (B) A rigorous motif enrichment analysis reveals that VATB (V = A, C, or G and B = C, G, or T) is the m6dA motif in C. reinhardtii. Each 4 × 4 heatmap corresponds to all 16 4-mer motifs, for which the second and third bases are fixed at the center/title (e.g., AA). The rows and columns in the heatmaps represent the first and last bases of 4-mer motifs. Each cell in the following 4 × 4 heatmaps shows the motif enrichment score based on the native DNA sample. (C) Putative m6dA sites called by SMRT-seq are highly consistent with those detected by independent techniques: m6dA-DIP-seq (DIP), m6dA-CLIP-exo-seq (CLIP), and m6dA-RE-seq (RE). (D) VATB, but not non-VATB (i.e., TATN/NATA), motifs have a periodic pattern of IPD ratio distribution around TSSs. Average IPD ratios (normalized by motif frequency) for each of the nine VATB motifs (top) and each of the seven non-VATB motifs (bottom) are plotted around TSSs. (E) Relationship across four different distributions (top to bottom panels): average IPD ratio of VATB sites, nucleosome positioning, and frequency of VATB and non-VATB motif sites. Peaks and valleys of the periodic patterns are indicated by red and blue dots, aligned across the four panels. (F) Illustrative examples showing m6dA sites near the TSSs of three genes. This figure is adapted from Fu et al. (2015), where we project m6dA sites detected by SMRT-seq (red dots; FDR < 0.05; randomly generated heights to ease visualization) on top of GATC and CATG sites detected by m6dA-RE-seq (blue bars; middle) and nucleosome occupancy (bottom). (G) m6dA events at VATB sites are associated with active gene expression. Average IPD ratios are compared between two groups of genes with high (FPKM > 1) and low (FPKM < 1) expression levels. (H) The correlation between the gene expression level in C. reinhardtii and methylated VATB sites in gene promoters. The x-axis represents the number of methylated VATB sites (IPD ratio > 4.5; FDR = 0.05) within [0, +2000 bp] of TSSs. The y-axis represents the mean log2 FPKM of genes. Error bars, SEs. (I–K) Single-molecule, strand-specific analysis of SMRT-seq data to examine full-, non-, or hemi-methylation status at m6dA sites. Three sets of m6dA sites are analyzed: m6dA in GATC sites (I) and CATG sites (J) based on m6dA-RE-seq (Fu et al. 2015), and VATB sites with high aggregate IPD ratio (IPD ratio > 4.5; Methods) based on SMRT-seq (K). The x- and y-axes denote the single-molecule, strand-specific IPD ratio of each pair of reverse-complementary VATB sites at the two strands of each single molecule.
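The motif counts in this caption can be checked by expanding the degenerate codes. The sketch below uses the standard IUPAC meanings given in the caption (V = A, C, or G; B = C, G, or T) plus N for any base; the helper name `expand` is hypothetical. It confirms that the 16 4-mers with AT at the center split into nine VATB motifs and seven non-VATB motifs, and that the non-VATB set equals TATN/NATA.

```python
from itertools import product

# IUPAC degenerate codes used in the caption, plus N (any base).
IUPAC = {"V": "ACG", "B": "CGT", "N": "ACGT"}


def expand(motif):
    """Return the set of concrete sequences matched by a degenerate motif."""
    pools = [IUPAC.get(base, base) for base in motif]
    return {"".join(combo) for combo in product(*pools)}


vatb = expand("VATB")                # first base is not T, last base is not A
all_at_centered = expand("NATN")     # all 16 4-mers with AT at the center
non_vatb = all_at_centered - vatb    # the remaining motifs

print(len(all_at_centered), len(vatb), len(non_vatb))  # 16 9 7
print(non_vatb == expand("TATN") | expand("NATA"))     # True
print("GATC" in vatb, "CATG" in vatb)                  # True True
```

The last line shows that the GATC and CATG sites probed by m6dA-RE-seq are both instances of VATB.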
Figure 4.
m6dA deposition on full-length L1s in hLCLs. (A) Mean IPD ratio of A sites (adjusted by the frequency of A's) across 1274 young (evolutionary age < 10 Myr), full-length (>6000 bp) L1s for each of three hLCL lines. Consistently across the trio, the IPD ratio is higher in the promoter and proximal region than in the flanking regions. (B) The mean IPD ratio of A sites at full-length L1s is inversely correlated with the L1s’ evolutionary ages in hLCLs. The heatmap shows the mean IPD ratio of A's on each L1, [0, +500] from the 5′ UTR start site, for each of the trio. As indicated in the sidebar, L1s (rows) are ordered by their evolutionary ages. Consistently across the trio, the IPD ratio of A sites is higher in younger full-length L1s than in older L1s. (C) Average m6dA-DIP-seq read count (adjusted for the read count in the input DNA sample and the A/T content) on hLCL young (1274), middle-aged (4164), and old (1670) L1 elements. Consistent with SMRT-seq data, m6dA is enriched at the promoter and proximal region of young full-length L1s. (D) Average m6dA-DIP-seq read count on hLCL young L1 elements, adjusted for the A/T content and the read count in each of two control samples: input DNA as control (black curve in top panel) and m6dA-DIP-seq on WGA as control (blue curve in bottom panel). (E) Motif AG is enriched for putative m6dA events. The barplot represents the motif enrichment score of all dinucleotide motifs in each of the trio. The putative methylated position is underscored. This suggests that motif AG is enriched for high IPD ratios, in clear contrast to all the other dinucleotides. (F) Motif enrichment analysis of human young full-length L1s. Each 4 × 4 heatmap corresponds to all 16 4-mer motifs, for which the second and third bases are fixed at the center/title. The rows and columns in the heatmaps represent the first and last bases of 4-mer motifs. Each cell in the following 4 × 4 heatmaps shows the motif enrichment score based on the native DNA sample. (G) Peaks of putative m6dA events across human young full-length L1s occur at loci with certain sequence features. (Top) Level of sequence conservation across young full-length L1 elements based on multiple alignment by Mauve (Darling et al. 2004); (two middle panels) frequency of AG dinucleotides (relative to A's) and A's on young full-length L1s; and (bottom) frequency of putative m6dA events at each locus across all young full-length L1s (averaged among the trio). The peaks of sequence conservation, AG/A frequency, and m6dA frequency across young full-length L1s are colocalized, as indicated by the red, blue, and green dots.
