Plant Cell. 2011 Apr;23(4):1293-306.
doi: 10.1105/tpc.111.083329. Epub 2011 Apr 22.

Prediction of Regulatory Interactions From Genome Sequences Using a Biophysical Model for the Arabidopsis LEAFY Transcription Factor

Free PMC article

Edwige Moyroud et al. Plant Cell.

Abstract

Despite great advances in sequencing technologies, generating functional information for nonmodel organisms remains a challenge. One solution lies in an improved ability to predict genetic circuits based on primary DNA sequence in combination with detailed knowledge of regulatory proteins that have been characterized in model species. Here, we focus on the LEAFY (LFY) transcription factor, a conserved master regulator of floral development. Starting with biochemical and structural information, we built a biophysical model describing LFY DNA binding specificity in vitro that accurately predicts in vivo LFY binding sites in the Arabidopsis thaliana genome. Applying the model to other plant species, we could follow the evolution of the regulatory relationship between LFY and the AGAMOUS (AG) subfamily of MADS box genes and show that this link predates the divergence between monocots and eudicots. Remarkably, our model succeeds in detecting the connection between LFY and AG homologs despite extensive variation in binding sites. This demonstrates that the cis-element fluidity recently observed in animals also exists in plants, but the challenges it poses can be overcome with predictions grounded in a biophysical model. Therefore, our work opens new avenues to deduce the structure of regulatory networks from mere inspection of genomic sequences.

Figures

Figure 1.
Optimization of the LFY Binding Site Model. (A) Enrichment of DNA sequences bound by LFY over different Selex cycles. (B) Binding of LFY to different sequences, either from AG or AP1 genes, or synthetic (S), with varying numbers of mismatches to the previously recognized consensus LFY binding motif. (C) to (E) Comparison of experimentally determined and predicted scores (see Methods) for different DNA sequences with the three PSSMs (asymmetric [ASY], symmetric [SYM], and symmetric with triplets [SYM-T]), illustrated below by their logos. Open and closed circles represent sequences with or without the CCANTG[G/T] consensus, respectively. [See online article for color version of this figure.]
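
The PSSM scoring mentioned in this caption can be illustrated with a minimal sketch. It is not the published ASY, SYM, or SYM-T matrix: the frequencies in `TOY_FREQS` are invented for a 7-bp motif shaped like the CCANTG[G/T] consensus, and the scoring convention (log of each base frequency relative to the most frequent base, so the best site scores 0 and all others are negative) is an assumption chosen to match the negative score scale used in the figures. All function names are hypothetical.

```python
import math
import re

BASES = "ACGT"

# Consensus from the caption, CCANTG[G/T], written as a regular expression.
CONSENSUS = re.compile(r"CCA[ACGT]TG[GT]")

# Hypothetical toy base frequencies per position (NOT the published LFY matrix).
TOY_FREQS = [
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.45, "T": 0.45},
]


def build_pssm(freqs, pseudocount=0.01):
    """Turn frequencies into log scores relative to the best base per column."""
    pssm = []
    for col in freqs:
        smoothed = {b: col[b] + pseudocount for b in BASES}
        best = max(smoothed.values())
        pssm.append({b: math.log(smoothed[b] / best) for b in BASES})
    return pssm


def score_site(pssm, site):
    """Sum the per-position scores; assumes positions contribute independently."""
    return sum(col[base] for col, base in zip(pssm, site))


def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(pssm, seq):
    """Score every window on both strands and keep the better strand."""
    width = len(pssm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        score = max(score_site(pssm, window), score_site(pssm, revcomp(window)))
        hits.append((i, window, score))
    return hits


if __name__ == "__main__":
    pssm = build_pssm(TOY_FREQS)
    for start, window, score in scan(pssm, "TTCCAATGGTT"):
        flag = "consensus" if CONSENSUS.fullmatch(window) else ""
        print(start, window, round(score, 2), flag)
```

A position-independent matrix like this one cannot capture the triplet dependencies that the SYM-T model adds; it only shows how a graded score differs from a yes/no consensus match.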
Figure 2.
Detection of Dependence between Positions of the LFY Binding Sites. Alignment of the 494 Selex sequences was analyzed with enoLOGOS software (Workman et al., 2005). The mutual information of each pair of positions of the alignment is displayed as a gray-scale-coded matrix plot below the logo corresponding to the SYM PSSM. Dependence is detected between positions 4, 5, and 6 or 14, 15, and 16 (lateral triplets) and, to a lesser extent, between positions 9, 10, and 11 (central triplet). [See online article for color version of this figure.]
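
The pairwise mutual information behind such a matrix plot can be sketched as follows. This is a plain plug-in estimate in bits computed from an ungapped alignment; it is not the enoLOGOS implementation, it applies no small-sample correction, and the function names and the four-sequence toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def mutual_information(seqs, i, j):
    """Plug-in mutual information (bits) between alignment columns i and j."""
    n = len(seqs)
    count_i = Counter(s[i] for s in seqs)
    count_j = Counter(s[j] for s in seqs)
    count_ij = Counter((s[i], s[j]) for s in seqs)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def mi_matrix(seqs):
    """Mutual information for every pair of columns, keyed by (i, j) with i < j."""
    width = len(seqs[0])
    return {
        (i, j): mutual_information(seqs, i, j)
        for i, j in combinations(range(width), 2)
    }


if __name__ == "__main__":
    # Toy alignment: columns 0 and 1 always co-vary, column 2 is constant.
    toy = ["AAC", "CCC", "GGC", "TTC"]
    for pair, value in mi_matrix(toy).items():
        print(pair, round(value, 3))
```

With few sequences the plug-in estimate is biased upward, so values from a real alignment would need a correction or a shuffled-column control before being read as evidence of dependence.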
Figure 3.
Comparison of the Different Models for Prediction of in Vivo LFY Binding Sites. ROC curves for LFY-bound and unbound sequences, using a biophysical model taking all sites (black line) into account or only those with a SYM-T matrix score higher than −23 (gray line).
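
How a ROC curve is built from scores of bound and unbound sequences can be sketched as below. The sketch assumes each sequence has already been reduced to a single number by some model; the scores in the example are invented, the function names are hypothetical, and nothing here reproduces the biophysical model or the −23 cutoff described in the caption.

```python
def roc_points(bound_scores, unbound_scores):
    """Return (false positive rate, true positive rate) points.

    The threshold is swept from the highest score downward; tied scores
    are consumed together so that each threshold yields a single point.
    """
    labeled = [(s, 1) for s in bound_scores] + [(s, 0) for s in unbound_scores]
    labeled.sort(key=lambda pair: -pair[0])
    n_pos, n_neg = len(bound_scores), len(unbound_scores)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(labeled):
        threshold = labeled[i][0]
        while i < len(labeled) and labeled[i][0] == threshold:
            if labeled[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a ROC curve given as ordered (fpr, tpr) points."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        area += (x2 - x1) * (y1 + y2) / 2.0
    return area


if __name__ == "__main__":
    # Invented scores: higher (less negative) means predicted stronger binding.
    bound = [-10.0, -12.0, -19.0]
    unbound = [-15.0, -22.0, -25.0]
    curve = roc_points(bound, unbound)
    print(curve)
    print(area_under_curve(curve))
```

Two models can then be compared by computing one curve per model on the same bound and unbound sets.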
Figure 4.
Examples of LFY-Bound Regions Identified by ChIP-seq. Noncoding and coding sequences in exons are shown on top as open and closed boxes, respectively. ChIP-seq read coverage combined from both strands is shown in the middle. The bottom panels show the scores of binding sites (computed with the SYM-T model) and the presence of the CCANTG[G/T] consensus (indicated by arrows). AP1 (A), TFL1 (B), AG (C), and SEP4 (D). [See online article for color version of this figure.]
Figure 5.
Prediction of LFY Occupancy of the Large Intron of AG Homologs Using the SYM-T Model. (A) Schematic phylogeny of AG homologs after Kramer et al. (2004). (B) and (C) POcc of AG homologs in monocots (B) and eudicots (C). A star indicates gene expression during early floral stages, and a circle indicates later expression. Expression data come from the references listed in Supplemental Table 2 online.
Figure 6.
Distribution of LFY Binding Sites in AG-Like Genes. LFY binding sites with a score higher than −20 are shown in eudicots (PLENA and euAG lineages) and monocots (AG lineage). The score scale is shown in each panel; the best binding sites correspond to the least negative score values. Stars mark the LFY binding site AG2, which can be located with confidence in most introns thanks to a nearby conserved sequence (see Supplemental Figure 2 online). Gene and species names are indicated on the right. [See online article for color version of this figure.]
