Chromatin accessibility is essential for transcriptional activation of genomic regions. It is well established that transcription factors (TFs) and histone modifications (HMs) play critical roles in regulating chromatin accessibility. However, few studies have quantified these relationships. Here we constructed a two-layer model to predict chromatin accessibility by integrating DNA sequence, TF binding, and HM signals. By applying the model to two human cell lines (GM12878 and HepG2), we found that DNA sequence had limited power for accessibility prediction, whereas both TF binding and HM signals predicted chromatin accessibility with high accuracy. According to the HM model, HM features determined chromatin accessibility in a manner shared across cell lines, with the predictive power attributable to five core HM types. Results from the TF model indicated that chromatin accessibility was determined by a subset of informative TFs, including both cell line-specific and generic TFs. The combined model of TF and HM signals did not further improve prediction accuracy, indicating that the two signal types provide redundant information for chromatin accessibility prediction. The TF and HM models can also distinguish the chromatin accessibility of proximal versus distal transcription start sites with high accuracy.
Keywords: Chromatin accessibility; ENCODE; Histone modifications; Machine learning; Transcription factor.
© 2021. The Author(s).