Three-dimensional modeling of chromatin structure from interaction frequency data using Markov chain Monte Carlo sampling

BMC Bioinformatics. 2011 Oct 25:12:414. doi: 10.1186/1471-2105-12-414.

Abstract

Background: Long-range interactions between regulatory DNA elements such as enhancers, insulators and promoters play an important role in regulating transcription. As chromatin contacts have been found throughout the human genome and in different cell types, spatial transcriptional control is now viewed as a general mechanism of gene expression regulation. Chromosome Conformation Capture Carbon Copy (5C) and its variant Hi-C are techniques used to measure the interaction frequency (IF) between specific regions of the genome. Our goal is to use the IF data generated by these experiments to computationally model and analyze three-dimensional chromatin organization.

Results: We formulate a probabilistic model linking 5C/Hi-C data to physical distances and describe a Markov chain Monte Carlo (MCMC) approach, called MCMC5C, that generates a representative sample from the posterior distribution over structures given IF data. Structures produced by parallel MCMC runs on the same dataset demonstrate that our MCMC method mixes quickly, samples from the posterior distribution of structures, and finds subclasses of structures. We defined structural properties (base looping, condensation, and local density) and measured their distributions across the generated ensembles of structures. We applied these methods to a biological model of human myelomonocyte cellular differentiation and identified distinct chromatin conformation signatures (CCSs) corresponding to each of the cellular states. We also demonstrate that our method runs on Hi-C data and produces a model of human chromosome 14 at 1 Mb resolution that is consistent with previously observed structural properties as measured by 3D-FISH.
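The idea of sampling structures from a posterior given IF data can be illustrated with a minimal Metropolis sketch. This is not the MCMC5C implementation: the inverse mapping from IF to distance (`expected_distance`), the Gaussian noise on distances with width `sigma`, the flat prior over coordinates, the single-point random-walk proposal, and all function and parameter names are assumptions made for illustration only.

```python
import math
import random


def expected_distance(freq, alpha=1.0):
    """Assumed mapping (not from the abstract): distance = 1 / IF**alpha."""
    return 1.0 / (freq ** alpha)


def log_score(coords, freqs, sigma=0.1, alpha=1.0):
    """Unnormalized Gaussian log-likelihood of a structure.

    coords: list of [x, y, z] points, one per genomic fragment.
    freqs:  dict mapping index pairs (i, j) to a positive interaction frequency.
    """
    total = 0.0
    for (i, j), f in freqs.items():
        resid = math.dist(coords[i], coords[j]) - expected_distance(f, alpha)
        total -= resid * resid / (2.0 * sigma * sigma)
    return total


def metropolis(freqs, n_points, n_steps=5000, step=0.05, sigma=0.1,
               thin=100, seed=0):
    """Random-walk Metropolis over 3D coordinates under a flat prior (assumed)."""
    rng = random.Random(seed)
    coords = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_points)]
    current = log_score(coords, freqs, sigma)
    samples = []
    for t in range(n_steps):
        k = rng.randrange(n_points)          # perturb one point at a time
        old = coords[k]
        coords[k] = [c + rng.gauss(0.0, step) for c in old]
        proposed = log_score(coords, freqs, sigma)
        # Symmetric proposal: accept with probability min(1, exp(proposed - current)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposed - current:
            current = proposed
        else:
            coords[k] = old                  # reject: restore previous position
        if t % thin == 0:
            samples.append([p[:] for p in coords])  # keep a thinned copy
    return samples, current


# Toy input: three fragments with made-up interaction frequencies.
toy_freqs = {(0, 1): 2.0, (1, 2): 2.0, (0, 2): 1.0}
ensemble, final_score = metropolis(toy_freqs, n_points=3)
```

The returned `ensemble` is a thinned list of structures rather than a single best fit, which mirrors the abstract's emphasis on analyzing an ensemble. In practice one would run several chains from different seeds and compare them before trusting the sample.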

Conclusions: We believe that tools like MCMC5C are essential for the reliable analysis of data from 3C-derived techniques such as 5C and Hi-C. By integrating complex, high-dimensional, and noisy datasets into an easy-to-interpret ensemble of three-dimensional conformations, MCMC5C allows researchers to reliably interpret the results of their assays and to contrast conformations under different conditions.

Availability: http://Dostielab.biochem.mcgill.ca.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cell Line, Tumor
  • Chromatin / chemistry*
  • Chromosomes, Human, Pair 14
  • Computer Simulation
  • Genome, Human*
  • Homeodomain Proteins / genetics
  • Homeodomain Proteins / metabolism
  • Humans
  • Markov Chains
  • Models, Biological*
  • Monte Carlo Method
  • Regulatory Sequences, Nucleic Acid*

Substances

  • Chromatin
  • Homeodomain Proteins