Evaluating thermodynamic models of enhancer activity on cellular resolution gene expression data

Methods. 2013 Jul 15;62(1):79-90. doi: 10.1016/j.ymeth.2013.03.005. Epub 2013 Apr 26.


With the advent of high-throughput sequencing and high-resolution transcriptomic technologies, there is today an unprecedented opportunity to understand gene regulation at a quantitative level. State-of-the-art models of the relationship between regulatory sequence and gene expression have shown great promise but also suffer from some major shortcomings. In this paper, we identify and address methodological challenges pertaining to quantitative modeling of gene expression from sequence, and test our models on the anterior-posterior patterning system in the Drosophila embryo. We first develop a framework to process cellular-resolution three-dimensional gene expression data from the Drosophila embryo and create data sets on which quantitative models can be trained. Next, we propose a new score, called 'weighted pattern generating potential' (w-PGP), to evaluate model predictions, and show its advantages over the two most common scoring schemes in use today. The model-building exercise uses w-PGP as the evaluation score and adopts a systematic strategy to increase a model's complexity while guarding against over-fitting. Our model identifies three transcription factors (ZELDA, SLOPPY-PAIRED, and NUBBIN) that have not been previously incorporated in quantitative models of this system as having significant regulatory influence. Finally, we show how fitting quantitative models on data sets comprising a handful of enhancers, as reported in earlier work, may lead to unreliable models.

Keywords: Cellular resolution data; Drosophila A/P patterning system; Enhancer; Quantitative model; Transcription factor; Transcriptional regulation.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • Body Patterning / genetics
  • Cell Nucleus / genetics
  • Cell Nucleus / metabolism
  • Cell Nucleus / ultrastructure
  • Drosophila Proteins / genetics*
  • Drosophila Proteins / metabolism
  • Drosophila melanogaster / embryology
  • Drosophila melanogaster / genetics*
  • Drosophila melanogaster / metabolism
  • Embryo, Nonmammalian / cytology
  • Embryo, Nonmammalian / metabolism*
  • Embryo, Nonmammalian / ultrastructure
  • Gene Expression Profiling
  • Gene Expression Regulation, Developmental*
  • Homeodomain Proteins / genetics*
  • Homeodomain Proteins / metabolism
  • Image Processing, Computer-Assisted / statistics & numerical data
  • Models, Genetic*
  • Nuclear Proteins
  • POU Domain Factors / genetics*
  • POU Domain Factors / metabolism
  • Thermodynamics
  • Transcription Factors / genetics*
  • Transcription Factors / metabolism
  • Transcription, Genetic

Substances

  • Drosophila Proteins
  • Homeodomain Proteins
  • Nuclear Proteins
  • POU Domain Factors
  • Transcription Factors
  • nub protein, Drosophila
  • slp1 protein, Drosophila
  • zld protein, Drosophila