A systems biology approach to transcription factor binding site prediction

PLoS One. 2010 Mar 26;5(3):e9878. doi: 10.1371/journal.pone.0009878.


Background: The elucidation of mammalian transcriptional regulatory networks holds great promise for both basic and translational research and remains one the greatest challenges to systems biology. Recent reverse engineering methods deduce regulatory interactions from large-scale mRNA expression profiles and cross-species conserved regulatory regions in DNA. Technical challenges faced by these methods include distinguishing between direct and indirect interactions, associating transcription regulators with predicted transcription factor binding sites (TFBSs), identifying non-linearly conserved binding sites across species, and providing realistic accuracy estimates.

Methodology/principal findings: We address these challenges by closely integrating proven methods for regulatory network reverse engineering from mRNA expression data, linearly and non-linearly conserved regulatory region discovery, and TFBS evaluation and discovery. Using an extensive test set of high-likelihood interactions, which we collected in order to provide realistic prediction-accuracy estimates, we show that a careful integration of these methods leads to significant improvements in prediction accuracy. To verify our methods, we biochemically validated TFBS predictions made for both transcription factors (TFs) and co-factors; we validated binding site predictions made using a known E2F1 DNA-binding motif on E2F1 predicted promoter targets, known E2F1 and JUND motifs on JUND predicted promoter targets, and a de novo discovered motif for BCL6 on BCL6 predicted promoter targets. Finally, to demonstrate accuracy of prediction using an external dataset, we showed that sites matching predicted motifs for ZNF263 are significantly enriched in recent ZNF263 ChIP-seq data.

Conclusions/significance: Using an integrative framework, we were able to address technical challenges faced by state of the art network reverse engineering methods, leading to significant improvement in direct-interaction detection and TFBS-discovery accuracy. We estimated the accuracy of our framework on a human B-cell specific test set, which may help guide future methodological development.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Amino Acid Motifs
  • Binding Sites
  • Cell Line
  • Chromatin Immunoprecipitation
  • Cluster Analysis
  • Computational Biology / methods
  • Exons
  • Gene Expression Profiling
  • Humans
  • Oligonucleotide Array Sequence Analysis
  • RNA, Messenger / metabolism
  • Reproducibility of Results
  • Species Specificity
  • Systems Biology*
  • Transcription Factors / chemistry*


  • RNA, Messenger
  • Transcription Factors