Automated discovery of tissue-targeting enhancers and transcription factors from binding motif and gene function data

PLoS Comput Biol. 2014 Jan 30;10(1):e1003449. doi: 10.1371/journal.pcbi.1003449. eCollection 2014 Jan.

Abstract

Identifying enhancers regulating gene expression remains an important and challenging task. While recent sequencing-based methods provide epigenomic characteristics that correlate well with enhancer activity, it remains onerous to comprehensively identify all enhancers across development. Here we introduce a computational framework to identify tissue-specific enhancers evolving under purifying selection. First, we incorporate high-confidence binding site predictions with target gene functional enrichment analysis to identify transcription factors (TFs) likely functioning in a particular context. We then search the genome for clusters of binding sites for these TFs, overcoming previous constraints associated with biased manual curation of TFs or enhancers. Applying our method to the placenta, we find 33 known and implicate 17 novel TFs in placental function, and discover 2,216 putative placenta enhancers. Using luciferase reporter assays, 31/36 (86%) tested candidates drive activity in placental cells. Our predictions agree well with recent epigenomic data in human and mouse, yet over half our loci, including 7/8 (87%) tested regions, are novel. Finally, we establish that our method is generalizable by applying it to 5 additional tissues: heart, pancreas, blood vessel, bone marrow, and liver.
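The two steps outlined in the abstract (scoring TFs by functional enrichment of their predicted target genes, then scanning for clusters of their binding sites) can be illustrated with a minimal sketch. This is not the authors' implementation. The hypergeometric test, the window size, the site thresholds, and the names `hypergeom_enrichment_p`, `find_site_clusters`, `TF_A`, `TF_B`, and `TF_C` are all assumptions chosen for illustration, and the purifying-selection (conservation) filter mentioned in the abstract is omitted.

```python
from math import comb
from typing import List, Tuple


def hypergeom_enrichment_p(total_genes: int, annotated_genes: int,
                           target_genes: int, annotated_targets: int) -> float:
    """Upper-tail hypergeometric p-value (an assumed enrichment statistic).

    Probability of seeing at least `annotated_targets` annotated genes when
    `target_genes` genes are drawn without replacement from `total_genes`,
    of which `annotated_genes` carry the functional annotation.
    """
    upper = min(annotated_genes, target_genes)
    favourable = sum(
        comb(annotated_genes, k)
        * comb(total_genes - annotated_genes, target_genes - k)
        for k in range(annotated_targets, upper + 1)
    )
    return favourable / comb(total_genes, target_genes)


def find_site_clusters(sites: List[Tuple[int, str]], window: int = 500,
                       min_sites: int = 3,
                       min_distinct_tfs: int = 2) -> List[Tuple[int, int]]:
    """Report merged (start, end) spans holding dense binding-site clusters.

    `sites` is a list of (position, tf_name) pairs on one chromosome.
    A span qualifies when sites within `window` bp of each other number at
    least `min_sites` and involve at least `min_distinct_tfs` different TFs.
    All thresholds are hypothetical defaults, not values from the paper.
    """
    ordered = sorted(sites)
    clusters: List[Tuple[int, int]] = []
    left = 0
    for right in range(len(ordered)):
        # Shrink the window from the left until it spans at most `window` bp.
        while ordered[right][0] - ordered[left][0] > window:
            left += 1
        in_window = ordered[left:right + 1]
        distinct_tfs = {tf for _, tf in in_window}
        if len(in_window) >= min_sites and len(distinct_tfs) >= min_distinct_tfs:
            start, end = in_window[0][0], in_window[-1][0]
            if clusters and start <= clusters[-1][1]:
                # Overlaps the previous span, so extend it instead of adding.
                clusters[-1] = (clusters[-1][0], max(clusters[-1][1], end))
            else:
                clusters.append((start, end))
    return clusters


# Hypothetical predicted sites for three placeholder TFs.
example_sites = [
    (100, "TF_A"), (250, "TF_B"), (400, "TF_A"),
    (5000, "TF_C"),
    (9000, "TF_A"), (9100, "TF_A"), (9200, "TF_A"),
]
example_clusters = find_site_clusters(example_sites)
```

With these placeholder inputs, only the first group of sites qualifies: the last group is dense but involves a single TF, so it fails the assumed `min_distinct_tfs` requirement. In a real pipeline the candidate TFs passed to the cluster search would be those whose predicted targets show significant enrichment for the tissue of interest.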

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Motifs
  • Animals
  • Automation
  • Binding Sites
  • Cluster Analysis
  • Computational Biology
  • Computer Simulation
  • Enhancer Elements, Genetic*
  • Epigenomics
  • Female
  • Gene Expression Profiling
  • Gene Expression Regulation
  • Humans
  • Mice
  • Placenta / physiology
  • Pregnancy
  • Transcription Factors / metabolism*
  • Trophoblasts / cytology

Substances

  • Transcription Factors

Grant support

This work was supported by an A. P. Giannini Foundation Postdoctoral Research Fellowship to GT, a Bio-X Stanford Interdisciplinary Graduate Fellowship to AMW, and a Bio-X IIP award and a Burroughs Wellcome Preterm Disease Planning grant to GB. GB is a Packard Fellow and a Microsoft Faculty Fellow. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.