Biophys J. 2010 Oct 20;99(8):2408-13.
doi: 10.1016/j.bpj.2010.08.006.

Accurate prediction of gene expression by integration of DNA sequence statistics with detailed modeling of transcription regulation

Jose M G Vilar. Biophys J.

Abstract

Gene regulation involves a hierarchy of events that extend from specific protein-DNA interactions to the combinatorial assembly of nucleoprotein complexes. The effects of DNA sequence on these processes have typically been studied based either on its quantitative connection with single-domain binding free energies or on empirical rules that combine different DNA motifs to predict gene expression trends on a genomic scale. The middle-point approach that quantitatively bridges these two extremes, however, remains largely unexplored. Here, we provide an integrated approach to accurately predict gene expression from statistical sequence information in combination with detailed biophysical modeling of transcription regulation by multidomain binding on multiple DNA sites. For the regulation of the prototypical lac operon, this approach predicts within 0.3-fold accuracy transcriptional activity over a 10,000-fold range from DNA sequence statistics for different intracellular conditions.

Figures

Figure 1
Integration of sequence statistics into predictive biophysical multidomain models. The approach is implemented by first considering the three operators as DNA signals. They are used to construct a probabilistic model that provides binding scores for these and similar mutated sequences. The scores are subsequently linked parametrically to binding free energies and incorporated directly into a detailed biophysical model of transcription regulation. The link between scores and free energies is calibrated by fitting the model to a subset of experimental data. The calibrated model is then tested with different sets of data.
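The first step of this pipeline, turning a handful of binding sites into a probabilistic model that scores similar sequences, can be illustrated with a generic position weight matrix (PWM). The following is a minimal sketch, not the paper's implementation: the toy 4-base sites, the pseudocount, the uniform background, and the function names `build_pwm` and `score` are all assumptions made for illustration, and the sign convention of the paper's own score S is not stated in this record.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length sites (generic textbook form)."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        # Log-odds of the smoothed base frequency against a uniform background.
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm

def score(pwm, seq):
    """Sum the per-position log-odds for a sequence of the same length."""
    return sum(column[base] for column, base in zip(pwm, seq))

# Hypothetical toy sites, NOT the lac operator sequences.
toy_sites = ["ACGT", "ACGA", "ACTT"]
toy_pwm = build_pwm(toy_sites)

print(score(toy_pwm, "ACGT"))  # close to the consensus of the toy sites
print(score(toy_pwm, "TGCA"))  # far from the consensus
```

In this convention a sequence resembling the input sites scores higher than a dissimilar one; the calibration step described in the caption is what ties such scores to binding free energies.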
Figure 2
Operator locations on DNA and binding of the lac repressor. (A) The main (O1) and two auxiliary (O2 and O3) operators are shown as black rectangles on the black line representing DNA. Binding of the lac repressor to O1 prevents transcription of the three lacZYA genes. (B) A repressor is shown bound to O2. The free energy of binding is ΔG = e2 + p. (C) A repressor is shown looping DNA by binding simultaneously to O1 and O3. The free energy of this binding configuration is ΔG = e1 + e3 + cL13 + p.
Figure 3
Factor graph for the free-energy components of the multisite lac repressor-operator binding. The free energy of the system, ΔG(s), as a function of the state variables, s = (s1, s2, s3, sL12, sL13, sL23), has a graphical representation in the form of a factor graph. The round nodes represent state variables and the rectangular nodes represent contributions to the free energy. The quantity in the rectangular node is present in the free energy when all its connecting state variables are equal to 1. The experimental values for wild-type parameters are e1 = −27.8 kcal/mol, e2 = −26.3 kcal/mol, e3 = −24.1 kcal/mol, cL12 = 23.35 kcal/mol, cL13 = 22.05 kcal/mol, and cL23 = 23.50 kcal/mol. The dependence on the lac repressor concentration, n, is given by the positional free energy, p = p° − RT ln n, with p° = 15 kcal/mol.
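The two binding configurations given in the Figure 2 caption (ΔG = e2 + p for a single repressor on O2, and ΔG = e1 + e3 + cL13 + p for a loop between O1 and O3) can be reproduced with the wild-type parameters listed here. The sketch below is one way to do that bookkeeping, under stated assumptions: the value RT = 0.6 kcal/mol, the interpretation of n as a molar concentration, and the helper names are not given in this record, and the general rule used (one p per bound repressor, with a loop replacing two repressors by one) is inferred from those two examples only.

```python
import math

RT = 0.6    # kcal/mol; assumed value near room temperature
P0 = 15.0   # kcal/mol; p° from the caption
E = {1: -27.8, 2: -26.3, 3: -24.1}                         # operator binding, kcal/mol
C_LOOP = {(1, 2): 23.35, (1, 3): 22.05, (2, 3): 23.50}     # looping, kcal/mol

def positional_free_energy(n):
    """p = p° − RT ln n, with n the repressor concentration (assumed molar)."""
    return P0 - RT * math.log(n)

def free_energy(bound_sites, loop, n):
    """Free energy of a configuration.

    bound_sites: set of occupied operator indices, e.g. {1, 3}
    loop: None, or a pair (i, j) with i < j bound by one looping repressor
    """
    p = positional_free_energy(n)
    if loop is not None and not set(loop) <= set(bound_sites):
        raise ValueError("a loop requires both of its operators to be bound")
    total = sum(E[i] + p for i in bound_sites)
    if loop is not None:
        # A single repressor spans both sites, so only one p is paid for them.
        total += C_LOOP[loop] - p
    return total

n = 1.0  # reference concentration, so that p equals p°
print(free_energy({2}, None, n))       # e2 + p
print(free_energy({1, 3}, (1, 3), n))  # e1 + e3 + cL13 + p
```

At lower n the positional term p grows, which raises the free energy of every bound configuration and captures the dependence on repressor concentration.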
Figure 4
Model calibration and prediction of the transcriptional activity as a function of the repressor concentration. The normalized transcription (Γ̄/Γmax) was obtained for WT and seven mutants accounting for all the combinations of deletions of the three operators. For each of the eight cases, the results of the model (solid lines) as a function of the repressor concentration are compared with the experimental data from Oehler et al. (14) (squares). The particular set of WT or deleted operators is indicated for each curve; for instance, O1-O2-O3 corresponds to the WT lac operon and O1M-O2M-O3M to the mutant with all three operators deleted. The values of the experimental parameters used are cL12 = 23.35 kcal/mol, cL13 = 22.05 kcal/mol, cL23 = 23.50 kcal/mol, and χ = 0.03. The PWM scores, S, for each site are as shown in Table 1. (A) Parameter values a = 1.387 kcal/mol and b = −9.064 kcal/mol, which connect interaction free energies with scores, e = aS + b, were obtained by fitting the model to all the experimental transcription data. (B) Parameter values a = 1.348 kcal/mol and b = −9.531 kcal/mol were obtained by fitting the model to the experimental data for operator configurations O1-O2-O3 and O1M-O2-O3. The model accurately predicts the normalized transcription for the other six operator configurations. (C) Only two experimental points (large gray circles) are used to obtain the parameter values a = 1.462 kcal/mol and b = −8.208 kcal/mol. The model is still able to accurately predict the normalized transcription for the remaining 20 experimental points.
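The calibration in each panel reduces to a two-parameter linear map, e = aS + b, from a PWM score to an interaction free energy. A minimal sketch of that map with the three fitted parameter pairs from the caption follows; the function names and the example score are hypothetical, since the Table 1 scores are not reproduced in this record.

```python
# Fitted (a, b) pairs in kcal/mol, taken from panels A, B, and C of the caption.
CALIBRATIONS = {
    "A": (1.387, -9.064),
    "B": (1.348, -9.531),
    "C": (1.462, -8.208),
}

def score_to_energy(score, a, b):
    """Interaction free energy from a PWM score: e = a*S + b."""
    return a * score + b

def energy_to_score(energy, a, b):
    """Inverse of the linear map: S = (e - b) / a."""
    return (energy - b) / a

hypothetical_score = -13.0  # illustrative only, NOT a value from Table 1
for panel, (a, b) in CALIBRATIONS.items():
    e = score_to_energy(hypothetical_score, a, b)
    print(panel, round(e, 3), "kcal/mol")
```

Comparing the three outputs for the same score gives a rough sense of how much the inferred free energy shifts when the fit uses all the data, two operator configurations, or only two points.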
Figure 5
Complete deletions versus weak binding. The normalized transcription (Γ̄/Γmax) for the four configurations with O1M is shown for the model as in Fig. 4A (solid line); for the model assuming that the free energy of binding to O1M is infinite, as in a complete deletion (dashed line); and for the experimental data from Oehler et al. (14) (squares).

References

    1. Jacob F., Monod J. Genetic regulatory mechanisms in the synthesis of proteins. J. Mol. Biol. 1961;3:318–356.
    2. Wasserman W.W., Sandelin A. Applied bioinformatics for the identification of regulatory elements. Nat. Rev. Genet. 2004;5:276–287.
    3. Stormo G.D. DNA binding sites: representation and discovery. Bioinformatics. 2000;16:16–23.
    4. Zhao Y., Granas D., Stormo G.D. Inferring binding energies from selected binding sites. PLOS Comput. Biol. 2009;5:e1000590.
    5. Tronche F., Ringeisen F., Pontoglio M. Analysis of the distribution of binding sites for a tissue-specific transcription factor in the vertebrate genome. J. Mol. Biol. 1997;266:231–245.