Modeling kinetic rate variation in third generation DNA sequencing data to detect putative modifications to DNA bases
- PMID: 23093720
- PMCID: PMC3530673
- DOI: 10.1101/gr.136739.111
Abstract
Current generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made identifying small nucleotide variation and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types.
Figures
Model diagram caption: One set of variables represents the hidden modification states for site i, while a second set represents the observed IPD values for site i that inform on the modification status of the site. The model considers interactions between the incorporation site and the two nearest neighboring sites on each side of it. The edges between the variables indicate that there can be interactions between the local sites, with one set of parameters representing the degree of interaction among the nodes. A further set of parameters represents the exponential rates for the two possible rate classes at each position i, while the remaining parameters represent the proportion of molecules in state k at position i.
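The per-site ingredients named in the caption above (two exponential rate classes at a position, plus the proportion of molecules in each class) can be illustrated with a small sketch. This is not the authors' conditional random field: it ignores the interactions with neighboring sites entirely and simply fits a two-component exponential mixture to the IPD values at one site by expectation-maximization. The function names, starting values, and simulated rates are all hypothetical choices made for illustration.

```python
import math
import random


def exp_pdf(x, rate):
    """Density of an exponential distribution with the given rate."""
    return rate * math.exp(-rate * x)


def fit_two_rate_mixture(ipds, n_iter=200):
    """Fit a mixture of two exponential rate classes to IPD values by EM.

    Returns (rates, props): the rate of each class and the estimated
    proportion of molecules in each class. Illustrative only; neighboring
    sites are not modeled.
    """
    mean_ipd = sum(ipds) / len(ipds)
    # Hypothetical starting values: one class faster, one slower than the mean.
    rates = [2.0 / mean_ipd, 0.5 / mean_ipd]
    props = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each IPD came from each class.
        resp = []
        for x in ipds:
            w = [props[k] * exp_pdf(x, rates[k]) for k in range(2)]
            total = w[0] + w[1]
            resp.append([w[0] / total, w[1] / total])
        # M-step: re-estimate proportions and rates from the weighted data.
        for k in range(2):
            weight = sum(r[k] for r in resp)
            props[k] = weight / len(ipds)
            rates[k] = weight / sum(r[k] * x for r, x in zip(resp, ipds))
    return rates, props


random.seed(0)
# Simulated site (assumed values): 70% of molecules in a fast class
# (rate 2.0) and 30% in a slow class (rate 0.2).
ipds = [
    random.expovariate(2.0 if random.random() < 0.7 else 0.2)
    for _ in range(5000)
]
rates, props = fit_two_rate_mixture(ipds)
print("estimated rates:", rates)
print("estimated proportions:", props)
```

A site where the slow class carries a substantial proportion of molecules would be a candidate kinetic variation event under this simplified view; the framework in the abstract additionally borrows information from neighboring sites, which this sketch does not attempt.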
Comment in
- Epigenetics: Reading methylated genomes. Nat Methods. 2013 Jan;10(1):10-1. doi: 10.1038/nmeth.2320. PMID: 23547288.
Similar articles
- Enhanced 5-methylcytosine detection in single-molecule, real-time sequencing via Tet1 oxidation. BMC Biol. 2013 Jan 22;11:4. doi: 10.1186/1741-7007-11-4. PMID: 23339471.
- Detecting DNA modifications from SMRT sequencing data by modeling sequence context dependence of polymerase kinetic. PLoS Comput Biol. 2013;9(3):e1002935. doi: 10.1371/journal.pcbi.1002935. Epub 2013 Mar 14. PMID: 23516341.
- Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat Methods. 2010 Jun;7(6):461-5. doi: 10.1038/nmeth.1459. Epub 2010 May 9. PMID: 20453866.
- Recent Advances in the Genomic Profiling of Bacterial Epigenetic Modifications. Biotechnol J. 2019 Jan;14(1):e1800001. doi: 10.1002/biot.201800001. Epub 2018 Jun 19. PMID: 29878585. Review.
- Going beyond five bases in DNA sequencing. Curr Opin Struct Biol. 2012 Jun;22(3):251-61. doi: 10.1016/j.sbi.2012.04.002. Epub 2012 May 9. PMID: 22575758. Review.
Cited by
- Genome-wide mapping of methylated adenine residues in pathogenic Escherichia coli using single-molecule real-time sequencing. Nat Biotechnol. 2012 Dec;30(12):1232-9. doi: 10.1038/nbt.2432. Epub 2012 Nov 8. PMID: 23138224.
- No evidence for DNA N6-methyladenine in mammals. Sci Adv. 2020 Mar 18;6(12):eaay3335. doi: 10.1126/sciadv.aay3335. eCollection 2020 Mar. PMID: 32206710.
- Methodologies for detecting environmentally induced DNA damage and repair. Environ Mol Mutagen. 2020 Aug;61(7):664-679. doi: 10.1002/em.22365. Epub 2020 Feb 29. PMID: 32083352. Review.
- Enhanced 5-methylcytosine detection in single-molecule, real-time sequencing via Tet1 oxidation. BMC Biol. 2013 Jan 22;11:4. doi: 10.1186/1741-7007-11-4. PMID: 23339471.
- First Comparative Analysis of Clostridium septicum Genomes Provides Insights Into the Taxonomy, Species Genetic Diversity, and Virulence Related to Gas Gangrene. Front Microbiol. 2021 Dec 9;12:771945. doi: 10.3389/fmicb.2021.771945. eCollection 2021. PMID: 34956133.