Review
Nat Rev Genet. 2023 Jun;24(6):363-381.
doi: 10.1038/s41576-022-00559-5. Epub 2023 Jan 18.

Navigating the pitfalls of mapping DNA and RNA modifications

Yimeng Kong et al. Nat Rev Genet. 2023 Jun.

Abstract

Chemical modifications to nucleic acids occur across the kingdoms of life and carry important regulatory information. Reliable high-resolution mapping of these modifications is the foundation of functional and mechanistic studies, and recent methodological advances based on next-generation sequencing and long-read sequencing platforms are critical to achieving this aim. However, mapping technologies may have limitations that sometimes lead to inconsistent results. Some of these limitations are technical in nature and specific to certain types of technology. Here, however, we focus on common (yet not always widely recognized) pitfalls that are shared among frequently used mapping technologies and discuss strategies to help technology developers and users mitigate their effects. Although the emphasis is primarily on DNA modifications, RNA modifications are also discussed.

Conflict of interest statement

Competing interests

The authors declare no competing interests.

Figures

Figure 1. DNA/RNA modification mapping methods based on next-generation sequencing and long-read sequencing technologies.
a. Next-generation sequencing (NGS)-based methods require pre-treatment or pre-labelling of the nucleic acid with antibodies (left), restriction enzymes (middle) or chemicals (right) before sequencing, so that modified and unmodified bases can be distinguished during sequencing. b. Long-read sequencing (LRS)-based methods can detect modified bases directly. Left, in SMRT sequencing, a DNA polymerase (or reverse transcriptase) is bound within the zero-mode waveguide (ZMW). When a dNTP is incorporated at the polymerase active site, it emits a fluorescent pulse in the corresponding color channel. The order of pulses provides the read sequence, and the inter-pulse duration between base incorporation events indicates the presence of a covalent modification in the template DNA/RNA. Right, nanopore sequencing relies on engineered biological nanopores embedded in a lipid membrane to sequence single-stranded DNA (ssDNA) or RNA. The ionic current measured as DNA or RNA passes through the nanopore depends on the precise set of nucleotides occupying the constriction point. Modified nucleotides in the ssDNA or RNA produce distinct current patterns, making it possible to distinguish modified bases from non-modified nucleotides.
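The kinetic signal described in panel b is often summarized per position as a ratio of inter-pulse durations between a native sample and an unmodified control (the IPD ratio named in the Figure 3 caption). The following is a minimal sketch under that assumption; the function name `ipd_ratios`, the input layout and the toy numbers are hypothetical and are not taken from the review or from any vendor tool.

```python
from statistics import mean


def ipd_ratios(native_ipds, control_ipds):
    """Return {position: mean native IPD / mean control IPD}.

    Both arguments map a reference position to a list of inter-pulse
    durations (in seconds) observed in reads covering that position.
    Positions that lack data in either sample are skipped.
    """
    ratios = {}
    for pos, native in native_ipds.items():
        control = control_ipds.get(pos)
        if not native or not control:
            continue
        control_mean = mean(control)
        if control_mean <= 0:
            continue
        ratios[pos] = mean(native) / control_mean
    return ratios


# Hypothetical toy data: position 10 is slowed in the native sample,
# position 11 is not, and position 12 has no control coverage.
native = {10: [0.8, 1.0, 1.2], 11: [0.2, 0.2, 0.2], 12: [0.5]}
control = {10: [0.2, 0.2, 0.2], 11: [0.2, 0.2, 0.2]}
ratios = ipd_ratios(native, control)
```

A ratio well above 1 at a position is only a candidate signal; as the later figures point out, sequence variants, neighboring modifications and secondary structure can all shift the ratio too.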
Figure 2. Overview of experimental pitfalls that can lead to false positive calls of DNA/RNA modifications.
a. Insufficient bisulfite (BS) treatment in BS-seq can leave a small percentage of non-modified cytosines unconverted, which are then falsely called as 5-methylcytosine (5mC) in downstream BS-seq data analysis. FP, false positive. b. The non-specificity of antibodies in DNA immunoprecipitation sequencing (DIP-seq) or RNA immunoprecipitation sequencing (RIP-seq) can result in systematic false positive calls at unmodified bases, at modified bases that are not the form of interest, or at repetitive sequences with DNA secondary structure. 6mA, N6-methyldeoxyadenosine. Ref., reference genome. c. Contamination with certain mRNAs carried through standard DNA extraction protocols may confound next-generation sequencing (NGS) of DNA and lead to false positive peaks in DIP-seq.
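One way to reason about panel a is to measure the non-conversion rate on an unmethylated spike-in control and then ask how likely the unconverted reads at a site are under that rate alone. The sketch below assumes such a spike-in is available (a common practice, but not something the caption states); the function names and counts are hypothetical.

```python
from math import comb


def non_conversion_rate(unconverted_c, total_c):
    """Fraction of cytosines left unconverted in an unmethylated spike-in."""
    if total_c <= 0:
        raise ValueError("total_c must be positive")
    return unconverted_c / total_c


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical spike-in: 50 of 10,000 cytosine observations stayed as C.
rate = non_conversion_rate(unconverted_c=50, total_c=10_000)

# Probability of seeing at least 1, or at least 10, unconverted reads out
# of 20 at a truly unmethylated site, from incomplete conversion alone.
p_one_read = binomial_tail(1, 20, rate)
p_ten_reads = binomial_tail(10, 20, rate)
```

In this toy setting a single unconverted read is weak evidence of methylation, whereas ten unconverted reads out of twenty are very unlikely to come from incomplete conversion alone.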
Figure 3. Overview of analytical pitfalls that can lead to false positive calls of DNA/RNA modifications.
a. For single-molecule, real-time sequencing (SMRT-seq), false positives (FP) can arise in methylation-free whole genome amplification (WGA) samples, especially at high sequencing depth, because standard tools apply a fixed threshold to the modification quality value (QV, −log10-transformed p value). Ref., reference genome. b. Reference heterogeneity, such as single nucleotide polymorphisms (SNPs), can lead to overestimation of inter-pulse duration (IPD) ratios, resulting in false positives in SMRT-seq. c. In SMRT-seq, modifications other than the one of interest (such as DNA damage) can affect the IPD ratio on neighboring bases (in this case, adenine) and result in false positives. Other sequencing platforms and mapping methods face similar challenges from confounding modifications. d. DNA secondary structure may affect DNA polymerase kinetics and create false positive modification calls in the flanking neighborhood in SMRT-seq. NGS and nanopore sequencing may face similar challenges. In addition, single-stranded RNA is prone to forming complex secondary structures, which can confound both NGS- and LRS-based methods for detecting RNA modifications.
Figure 4. Mitigating false positive mapping calls of DNA and RNA modifications.
a. For DIP-seq and RIP-seq, an IgG immunoprecipitated control can help adjust for the non-specificity of antibodies and reduce false positive calls. b. For SMRT sequencing, a whole genome amplification (WGA) control helps evaluate false positive calls due to abnormal DNA polymerase (or RNA reverse transcriptase) kinetics. For example, a systematic reduction in kinetics can be caused by secondary structures, which can confound the detection of DNA modifications. c. For most sequencing methods, controlling the false discovery rate (FDR) is more reliable than applying an arbitrary cutoff (e.g. on p value or IPD ratio), even though a cutoff might seem to be ‘consistent’ with LC-MS/MS estimation. d. A quantification model can be used to estimate the abundance of a DNA or RNA modification of interest. The machine learning model is trained with features from a number of positive and negative controls containing the modification at a wide range of abundances. For prediction, the model outputs the modification level along with a confidence interval.
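Panels b and c can be combined into a simple empirical estimate: calls made in a modification-free control at a given score threshold approximate the false positives expected in the native sample at the same threshold. The sketch below is one hedged way to express this, not the procedure used in the review; it assumes both samples were scored at the same sites with comparable coverage, and all names and scores are hypothetical.

```python
def empirical_fdr(native_scores, control_scores, threshold):
    """Estimate the FDR at a threshold as control calls / native calls.

    Assumes the control contains no true modifications, so every control
    call at or above the threshold counts as a false positive.
    """
    native_calls = sum(s >= threshold for s in native_scores)
    control_calls = sum(s >= threshold for s in control_scores)
    if native_calls == 0:
        return 0.0
    return min(1.0, control_calls / native_calls)


def threshold_for_fdr(native_scores, control_scores, target_fdr):
    """Return the lowest observed native score whose estimated FDR is at
    or below target_fdr, or None if no threshold qualifies."""
    for t in sorted(set(native_scores)):
        if empirical_fdr(native_scores, control_scores, t) <= target_fdr:
            return t
    return None


# Hypothetical per-site scores (for example QV values) from each sample.
native = [1, 2, 3, 5, 8, 9, 10, 12, 15, 20]
control = [1, 1, 2, 2, 3, 3, 4, 5, 6, 7]

fdr_at_5 = empirical_fdr(native, control, 5)
chosen = threshold_for_fdr(native, control, 0.2)
```

Here a fixed cutoff of 5 looks reasonable in isolation but corresponds to an estimated FDR of about 43% in the toy data, while the data-driven search settles on a stricter threshold.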
Figure 5. Overview of pitfalls that can lead to false negative calls of DNA modifications.
a. An individual technique is often more effective for detecting certain forms of DNA modification than others. For example, single-molecule, real-time (SMRT) sequencing has stronger signal-to-noise ratios for 6mA and 4mC events than for 5mC and 5hmC. The signal-to-noise ratios of 5mC and 5hmC can be enhanced by converting 5mC and 5hmC to 5fC and 5caC using the Ten-Eleven Translocation (TET) enzyme. 5caC, 5-carboxylcytosine; 5fC, 5-formylcytosine; 5mC, 5-methylcytosine; 5hmC, 5-hydroxymethylcytosine; 4mC, N4-methylcytosine; 6mA, N6-methyladenine; N, unmodified bases. b. In nanopore sequencing, the signal-to-noise ratio can vary drastically across sequence contexts (or motifs), even for the same form of DNA modification, as shown with a schematic t-distributed stochastic neighbor embedding (t-SNE) map. c. Prolonged bisulfite treatment can lead to increased conversion of 5mC to uracil (U, which is read as T in sequencing) and increased DNA degradation. Both processes can result in false negatives (FNs). d. False negatives can arise when certain genomic sequence motifs targeted by a restriction enzyme (RE) are not adequately digested, for example, owing to insufficient incubation time. e. False negatives can arise when training datasets do not represent the test datasets. For example, machine learning models trained with a limited set of sequence motifs are not generally applicable for mapping the same form of DNA or RNA modification in other sequence contexts, here shown with a schematic t-SNE map.
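The mismatch described in panel e can be screened for before a model is applied: list the sequence contexts around candidate sites in the test data that never appear in the training data. The sketch below is a simple illustration of that check under assumed inputs; the helper names, the flank size and the toy sequences are hypothetical.

```python
def kmer_contexts(sequences, positions, flank=2):
    """Collect sequence contexts of length 2*flank + 1 centred on sites.

    `sequences` is a list of strings and `positions` a parallel list of
    0-based site indices. Sites too close to a sequence end are skipped.
    """
    contexts = set()
    for seq, pos_list in zip(sequences, positions):
        for pos in pos_list:
            start, end = pos - flank, pos + flank + 1
            if start < 0 or end > len(seq):
                continue
            contexts.add(seq[start:end])
    return contexts


def unseen_contexts(train_contexts, test_contexts):
    """Contexts present in the test set but absent from the training set."""
    return sorted(test_contexts - train_contexts)


# Hypothetical toy data: candidate adenine sites at index 3 (and one at 0).
train = kmer_contexts(["ACGATCGA"], [[3]])
test = kmer_contexts(["ACGATCGA", "TTGAAGCC"], [[0, 3], [3]])
missing = unseen_contexts(train, test)
```

A non-empty `missing` list flags contexts where predictions rest on extrapolation, so the absence of a call there should not be read as the absence of a modification.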
