Long reads capture simultaneous enhancer-promoter methylation status for cell-type deconvolution

Bioinformatics. 2021 Jul 12;37(Suppl_1):i327-i333. doi: 10.1093/bioinformatics/btab306.


Motivation: While promoter methylation is associated with reinforcing fundamental tissue identities, the methylation status of distant enhancers was shown by genome-wide association studies to be a powerful determinant of cell state and cancer. With the recent availability of long reads that report the methylation status of enhancer-promoter pairs on the same molecule, we hypothesized that probing these pairs at the single-molecule level may serve as the basis for detecting rare cancerous transformations in a given cell population. We explore various analysis approaches for deconvolving cell-type mixtures based on their genome-wide enhancer-promoter methylation profiles.

Results: To evaluate our hypothesis, we examined long-read optical methylome data for the GM12878 cell line and for myoblast cell lines from two donors. We identified over 100 000 enhancer-promoter pairs that co-exist on at least 30 individual DNA molecules. We developed a detailed methodology for mixture deconvolution and applied it to estimate the proportional cell compositions of synthetic mixtures. Analysis of promoter methylation, as well as of pairwise enhancer-promoter methylation, yielded very accurate estimates. In addition, we show that pairwise methylation analysis can be generalized from deconvolving different cell types to subtler scenarios in which one wishes to resolve different cell populations of the same cell type.
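The abstract does not spell out the deconvolution procedure. As an illustration only, the following sketch assumes that, for every enhancer-promoter pair, each reference cell type has a known probability distribution over the four joint methylation states a single molecule can report (neither, enhancer only, promoter only, both). It then estimates mixture proportions by maximum likelihood using the standard EM update for mixture weights, treating molecules as independent draws. This estimator, the function names `estimate_proportions` and `simulate`, and the hypothetical cell types `A` and `B` with their state distributions are assumptions for illustration, not the authors' method or code from their repository.

```python
import random
from typing import Dict, List, Tuple

# Joint state of one molecule at one enhancer-promoter (EP) pair:
# (enhancer_methylated, promoter_methylated), each 0 or 1.
State = Tuple[int, int]
# Hypothetical reference: refs[cell_type][pair_id][state] = P(state | cell_type, pair)
Refs = Dict[str, Dict[str, Dict[State, float]]]


def estimate_proportions(reads: List[Tuple[str, State]], refs: Refs,
                         n_iter: int = 200) -> Dict[str, float]:
    """Maximum-likelihood mixture proportions via EM, assuming molecules are
    independent draws and the per-cell-type reference distributions are known."""
    types = sorted(refs)
    w = {t: 1.0 / len(types) for t in types}  # start from a uniform mixture
    for _ in range(n_iter):
        resp_sum = {t: 0.0 for t in types}
        for pair, state in reads:
            # E-step: posterior probability that this molecule came from each type
            lik = {t: w[t] * refs[t][pair].get(state, 0.0) for t in types}
            total = sum(lik.values())
            if total == 0.0:
                continue  # state impossible under every reference; ignore it
            for t in types:
                resp_sum[t] += lik[t] / total
        n = sum(resp_sum.values())  # number of informative molecules
        if n == 0.0:
            break
        # M-step: new weight = average responsibility across molecules
        w = {t: resp_sum[t] / n for t in types}
    return w


def simulate(refs: Refs, props: Dict[str, float], pairs: List[str],
             n_per_pair: int, rng: random.Random) -> List[Tuple[str, State]]:
    """Draw a synthetic mixture: each molecule picks a cell type by `props`,
    then a joint EP state from that type's reference distribution."""
    types = list(props)
    reads = []
    for pair in pairs:
        for _ in range(n_per_pair):
            t = rng.choices(types, weights=[props[x] for x in types])[0]
            dist = refs[t][pair]
            states = list(dist)
            s = rng.choices(states, weights=[dist[x] for x in states])[0]
            reads.append((pair, s))
    return reads


# Usage: two hypothetical cell types with opposite EP methylation patterns,
# 30 molecules per pair, true composition A = 0.3, B = 0.7.
pairs = [f"ep{i}" for i in range(40)]
dist_a = {(0, 0): 0.1, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.7}
dist_b = {(0, 0): 0.7, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.1}
refs = {"A": {p: dist_a for p in pairs}, "B": {p: dist_b for p in pairs}}
reads = simulate(refs, {"A": 0.3, "B": 0.7}, pairs, n_per_pair=30,
                 rng=random.Random(0))
est = estimate_proportions(reads, refs)
print(est)  # should lie near A=0.3, B=0.7; exact values depend on the draw
```

Because each molecule contributes its joint enhancer-promoter state rather than two separate marginals, populations with similar average methylation but different co-occurrence patterns can in principle still be told apart by this likelihood. A promoter-only variant is the same code with a two-state alphabet per promoter.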

Availability and implementation: The code used in this work to analyze single-molecule Bionano Genomics optical maps is available via the GitHub repository https://github.com/ebensteinLab/Single_molecule_methylation_in_EP.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cell Line
  • DNA Methylation*
  • Enhancer Elements, Genetic
  • Genome-Wide Association Study*
  • Genomics
  • Humans
  • Promoter Regions, Genetic*
  • Regulatory Sequences, Nucleic Acid*