BinDNase: a discriminatory approach for transcription factor binding prediction using DNase I hypersensitivity data

Bioinformatics. 2015 Sep 1;31(17):2852-9. doi: 10.1093/bioinformatics/btv294. Epub 2015 May 7.

Abstract

Motivation: Transcription factors (TFs) are a class of DNA-binding proteins that have a central role in regulating gene expression. To reveal mechanisms of transcriptional regulation, a number of computational tools have been proposed for predicting TF-DNA interaction sites. Recent studies have shown that genome-wide sequencing data on open chromatin sites from DNase I hypersensitivity experiments (DNase-seq) have great potential to map putative binding sites of all transcription factors in a single experiment. Thus, computational methods that analyse DNase-seq data to accurately map TF-DNA interaction sites are much needed.

Results: Here, we introduce a novel discriminative algorithm, BinDNase, for predicting TF-DNA interaction sites using DNase-seq data. BinDNase implements an efficient method for selecting and extracting informative features from the DNase I signal for each TF, either at single nucleotide resolution or for larger regions. The method is applied to 57 transcription factors in cell line K562 and 31 transcription factors in cell line HepG2 using data from the ENCODE project. First, we show that BinDNase compares favourably to other supervised and unsupervised methods developed for TF-DNA interaction prediction using DNase-seq data. We demonstrate the importance of modelling each TF with a separate prediction model, reflecting TF-specific DNA accessibility around the TF-DNA interaction site. We also show that highly standardised DNase-seq data (pre)processing is a prerequisite for accurate TF binding predictions and that sequencing depth has on average only a moderate effect on prediction accuracy. Finally, BinDNase's binding predictions generalise to other cell types, thus making BinDNase a versatile tool for accurate TF binding prediction.
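
The general idea of a discriminative, per-TF model on DNase I signal can be illustrated with a minimal Python sketch. This is not the BinDNase algorithm or its R implementation: the logistic regression classifier, the log transform, the bin size, the window width and the simulated cut counts are all assumptions chosen for illustration, and every function name here is hypothetical. The sketch pools per-base cut counts into bins (a bin size of 1 would correspond to single nucleotide resolution) and trains one classifier for a single hypothetical TF.

```python
import math
import random

def bin_features(cuts, bin_size):
    """Pool per-base DNase I cut counts into consecutive bins, then log-transform.
    bin_size=1 keeps single nucleotide resolution; larger values pool regions."""
    return [math.log1p(sum(cuts[i:i + bin_size]))
            for i in range(0, len(cuts), bin_size)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    """Predicted probability that the site described by features x is bound."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def train_logistic(X, y, lr=0.05, epochs=300, l2=0.01):
    """L2-regularised logistic regression fitted by batch gradient descent."""
    n, n_feat = len(X), len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(n_feat):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def simulate_site(rng, bound, width=40):
    """Simulated cut counts (an assumption, not real data): bound sites have
    accessible flanks and a protected centre; unbound sites are uniformly low."""
    cuts = []
    for pos in range(width):
        if bound:
            protected = 15 <= pos < 25
            cuts.append(rng.randint(0, 1) if protected else rng.randint(3, 7))
        else:
            cuts.append(rng.randint(0, 2))
    return cuts

def center(x, means):
    """Subtract per-feature training means."""
    return [v - m for v, m in zip(x, means)]

rng = random.Random(0)
labels = [i % 2 for i in range(200)]          # alternating unbound/bound sites
feats = [bin_features(simulate_site(rng, bool(lab)), bin_size=5) for lab in labels]

train_X, test_X = feats[:100], feats[100:]
train_y, test_y = labels[:100], labels[100:]
means = [sum(col) / len(col) for col in zip(*train_X)]

# One model per TF: this single fit stands in for one hypothetical factor.
w, b = train_logistic([center(x, means) for x in train_X], train_y)

hits = sum((predict_proba(w, b, center(x, means)) >= 0.5) == bool(yi)
           for x, yi in zip(test_X, test_y))
accuracy = hits / len(test_y)
```

In a real analysis the labels would come from an independent assay such as ChIP-seq, and a separate model would be fitted for each TF, in line with the abstract's observation that DNA accessibility around the interaction site is TF-specific.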

Availability and implementation: An R implementation of the algorithm is available at: http://research.ics.aalto.fi/csb/software/bindnase/.

Contact: juhani.kahara@aalto.fi

Supplementary information: Supplemental data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Binding Sites
  • Chromatin / genetics
  • Chromatin Immunoprecipitation
  • DNA / metabolism*
  • DNA Footprinting / methods*
  • Deoxyribonuclease I / metabolism
  • Gene Expression Regulation*
  • Genomics / methods
  • Hep G2 Cells
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • K562 Cells
  • Protein Binding
  • Sequence Analysis, DNA / methods*
  • Software*
  • Transcription Factors / metabolism*

Substances

  • Chromatin
  • Transcription Factors
  • DNA
  • Deoxyribonuclease I