DELTA: A Distal Enhancer Locating Tool Based on AdaBoost Algorithm and Shape Features of Chromatin Modifications

PLoS One. 2015 Jun 19;10(6):e0130622. doi: 10.1371/journal.pone.0130622. eCollection 2015.

Abstract

Accurate identification of DNA regulatory elements has become an urgent need in the post-genomic era. Recent genome-wide chromatin state mapping efforts have revealed that DNA elements are associated with characteristic chromatin modification signatures, and several approaches have been developed to predict transcriptional enhancers from these signatures. However, their practical application is limited by incomplete extraction of chromatin features and by model inconsistency when predicting enhancers across different cell types. To address these issues, we define a set of non-redundant shape features of histone modifications, which show high consistency across cell types and greatly reduce the dimensionality of the feature vectors. Integrating these shape features with the machine-learning algorithm AdaBoost, we developed an enhancer prediction method, DELTA (Distal Enhancer Locating Tool based on AdaBoost). We show that DELTA significantly outperforms current enhancer prediction methods in prediction accuracy on different datasets and can predict enhancers in one cell type using models trained in other cell types without loss of accuracy. Overall, our study presents a novel framework for accurately identifying enhancers from epigenetic data across multiple cell types.
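
The general idea in the abstract, summarising each histone modification profile by a few shape descriptors and feeding them to AdaBoost, can be illustrated with a minimal sketch. This is not the DELTA implementation: the abstract does not list the shape features, so the four descriptors below (mean, standard deviation, skewness, kurtosis) are assumptions chosen for illustration, the profiles are synthetic, and every function name (`shape_features`, `synthetic_profile`, `train_adaboost`, `predict`, `demo`) is hypothetical. The classifier is plain discrete AdaBoost over decision stumps, written with the Python standard library only.

```python
import math
import random

def shape_features(profile):
    """Illustrative shape descriptors (assumed, not the paper's feature set)."""
    n = len(profile)
    mean = sum(profile) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in profile) / n)
    if std == 0:
        return [mean, 0.0, 0.0, 0.0]
    skew = sum(((v - mean) / std) ** 3 for v in profile) / n
    kurt = sum(((v - mean) / std) ** 4 for v in profile) / n
    return [mean, std, skew, kurt]

def synthetic_profile(rng, enhancer, bins=20):
    """Toy signal: uniform noise, plus two flanking peaks for 'enhancers'."""
    profile = []
    for i in range(bins):
        value = rng.uniform(0.0, 1.0)
        if enhancer:
            pos = (i - (bins - 1) / 2) / (bins / 2)  # roughly -1 .. 1
            value += 5.0 * math.exp(-((abs(pos) - 0.4) ** 2) / 0.02)
        profile.append(value)
    return profile

def stump_predict(stump, x):
    """A stump votes `sign` when the feature exceeds the threshold."""
    feat, thresh, sign = stump
    return sign if x[feat] > thresh else -sign

def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feat in range(len(X[0])):
        vals = sorted({x[feat] for x in X})
        # Candidate thresholds: one below the minimum, then midpoints.
        thresholds = [vals[0] - 1.0] + [(a + b) / 2 for a, b in zip(vals, vals[1:])]
        for thresh in thresholds:
            for sign in (1, -1):
                stump = (feat, thresh, sign)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err

def train_adaboost(X, y, rounds=10):
    """Discrete AdaBoost; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

def demo(seed=0, n_train=40, n_test=20):
    """Train on synthetic profiles and return held-out accuracy."""
    rng = random.Random(seed)

    def make(n):
        X, y = [], []
        for k in range(n):
            is_enhancer = k % 2 == 0
            X.append(shape_features(synthetic_profile(rng, is_enhancer)))
            y.append(1 if is_enhancer else -1)
        return X, y

    X_train, y_train = make(n_train)
    X_test, y_test = make(n_test)
    model = train_adaboost(X_train, y_train, rounds=5)
    hits = sum(predict(model, x) == t for x, t in zip(X_test, y_test))
    return hits / n_test
```

Calling `demo()` trains on one batch of synthetic profiles and reports accuracy on a second batch. The toy classes are deliberately easy to separate, so the result says nothing about DELTA's performance on real ChIP-seq data. In practice a library implementation such as scikit-learn's `AdaBoostClassifier` could replace the hand-written booster.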

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • CD4-Positive T-Lymphocytes / cytology
  • CD4-Positive T-Lymphocytes / metabolism
  • Cell Line
  • Chromatin / chemistry
  • Chromatin / metabolism*
  • Chromatin Immunoprecipitation
  • Enhancer Elements, Genetic
  • Histone Code
  • Histones / metabolism
  • Humans
  • Promoter Regions, Genetic
  • Software*

Substances

  • Chromatin
  • Histones

Grant support

This work was supported by the National Basic Research Project (973 program) (2012CB518200) (http://www.973.gov.cn/), the General Program (31401141, 30900830) of the Natural Science Foundation of China (www.nsfc.gov.cn), the State Key Laboratory of Proteomics of China (SKLP-Y201303, SKLP-O201104 and SKLP-K201004) (www.bprc.ac.cn), and the Special Key Programs for Science and Technology of China (2012ZX09102301-016).