Nature, 512 (7515), 400-5

Regulatory Analysis of the C. elegans Genome With Spatiotemporal Resolution


Carlos L Araya et al. Nature.

Abstract

Discovering the structure and dynamics of transcriptional regulatory events in the genome with cellular and temporal resolution is crucial to understanding the regulatory underpinnings of development and disease. We determined the genomic distribution of binding sites for 92 transcription factors and regulatory proteins across multiple stages of Caenorhabditis elegans development by performing 241 ChIP-seq (chromatin immunoprecipitation followed by sequencing) experiments. Integration of regulatory binding and cellular-resolution expression data produced a spatiotemporally resolved metazoan transcription factor binding map. Using this map, we explore developmental regulatory circuits that encode combinatorial logic at the levels of co-binding and co-expression of transcription factors, characterizing the genomic coverage and clustering of regulatory binding, the binding preferences of, and biological processes regulated by, transcription factors, the global transcription factor co-associations and genomic subdomains that suggest shared patterns of regulation, and identifying key transcription factors and transcription factor co-associations for fate specification of individual lineages and cell types.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Extended Data Figure 1
(a) ChIP-seq raw read data were processed using a uniform processing pipeline with identical alignment, filtering criteria, and standardized IDR binding site identification using SPP. (b) Comparison of conservative (replicate) and pooled (pseudo-replicate) binding site calls, from the cross-replicate and rescue thresholds, respectively. (c) Distribution of NSC scores across 323 ChIP-seq experiments. Experiments are classified as high- (blue, NHI=181), medium- (green, NMD=60), and low-quality (yellow, NLO=82), and the relative fraction of each is indicated in the inset. High- and medium-quality experiments were approved for downstream analysis. (d) The fraction of binding sites shared between duplicate, approved ChIP-seq experiments with unique factor and stage combinations (NU=22) is shown. The fraction shared between the best-overlapping pairs of experiments with matched factor and stage combinations is shown in the light blue distribution. The fraction shared among all duplicate experiments (NP=24) with matched factor, stage, and promoter-driven TF expression is shown in dark blue. The range of fractions shared between true biological duplicates (ND=2) with matched factor, stage, promoter, and ChIP protocol is indicated in dashed lines. For comparison, the fraction shared between randomly sampled pairs (NS=500) of approved experiments from distinct factors is shown in gray. The median fractions for each distribution are shown. (e) Binding site histogram for 187 embryo and larval ChIP-seq experiments with unique factor-stage combinations, and a common ChIP protocol, selected for analysis in this work. The fraction of high- (blue, NHI=138) and medium-quality (green, NMD=49) ChIP-seq experiments selected is indicated (inset). (f) Analysis of sequence preferences for 21 C. elegans factors (NO) with human ortholog binding data. The fraction of C. elegans factors (NM=71.4%) for which sequence preferences could be determined is shown (left). The fraction of factors with conserved sequence preferences (66.7%, P < 0.05) from 12 human/worm orthologs with determined sequence preferences is shown (right). (g) The distribution in the fraction of binding sites with matches to the discovered preferred sequence (motif) is shown for 15 factors. The prevalence of the preferred sequence is evaluated among the top 200, 400, 600, 800, and 1000 binding sites for each factor (see Methods). (h) Discovered sequence preferences for 12 human/worm orthologs. Factors with similar (P < 0.05) and distinct sequence preferences are indicated in dark blue and light blue, respectively. The consensus sequence preference for the ONECUT3 homeobox factor was obtained from Jolma et al. (i) Saturation analysis of regulatory binding data. Using either binding data from embryonic and larval (L1-L4) stages or L2 larvae only (inset), k ChIP-seq experiments were randomly sampled (50 times each), collapsing overlapping binding sites into binding regions. For each k ChIP-seq experiments, the number of binding regions from 50 iterations is plotted (red points, ±1 S.D.). For each series, an exponential curve (blue, dashed line) was fit to the data and used to estimate the total number of binding regions. The percent of the binding regions (CBP) observed in the acquired data is reported for each series. (j) Amongst genes with annotated TSSs, the fraction of genes with binding observed within the specified window upstream of a TSS is shown. Promoter regions examined correspond to the windows (1) 1000/100 bp, (2) 2000/200 bp, (3) 3000/300 bp, (4) 4000/400 bp, and (5) 5000/500 bp upstream/downstream of the TSS, respectively.
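
The sampling step of the saturation analysis in panel (i) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function names `merge_sites` and `saturation_curve`, the single-chromosome `(start, end)` representation, the rule that touching intervals count as overlapping, and the toy coordinates are all hypothetical, and the exponential curve fit is left out.

```python
import random
from statistics import mean, stdev

def merge_sites(sites):
    """Collapse overlapping (start, end) binding sites into binding regions.

    Assumption: intervals that share at least one coordinate are merged.
    """
    regions = []
    for start, end in sorted(sites):
        if regions and start <= regions[-1][1]:
            # Overlaps the previous region, so extend that region.
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(r) for r in regions]

def saturation_curve(experiments, iterations=50, seed=0):
    """For each k, sample k experiments repeatedly and count merged regions.

    Returns a list of (k, mean region count, standard deviation) tuples.
    """
    rng = random.Random(seed)
    curve = []
    for k in range(1, len(experiments) + 1):
        counts = []
        for _ in range(iterations):
            sample = rng.sample(experiments, k)
            pooled = [site for exp in sample for site in exp]
            counts.append(len(merge_sites(pooled)))
        spread = stdev(counts) if len(counts) > 1 else 0.0
        curve.append((k, mean(counts), spread))
    return curve

# Toy data: three hypothetical experiments on one chromosome.
experiments = [
    [(100, 200), (500, 600)],
    [(150, 250), (900, 1000)],
    [(550, 650), (2000, 2100)],
]
curve = saturation_curve(experiments)
```

Each entry of `curve` corresponds to one red point (with its spread) in the panel; the fitted curve would then be drawn through these points.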
Extended Data Figure 2
Stage-dependent determination and analysis of HOT and XOT regions. (a) Correlations in occupancy (number of binding sites, x-axis) and density (number of binding sites per kb, y-axis) in embryo and larval L1-L4 binding regions. Quantiles for occupancy and density derived from binding site simulations are indicated on each axis. The fraction of binding regions (b) and the fraction of binding sites in regions (c) exceeding the significance cutoffs (quantiles from simulations) are indicated for both occupancy (yellow) and density (blue). Fractions exceeding cutoffs for both metrics are shown in red. Specific occupancy and density cutoffs for each significance level are indicated above each point. HOT (5% significance) and XOT (1% significance) regions exceed the specific occupancy thresholds indicated with arrows. (d) GO enrichment analysis of constitutive HOT (cHOT), embryo, and larval L1-L4 HOT regions. For each stage, the non-cHOT stage-derived HOT regions were analyzed. GO enrichments in stage-specific HOT regions are available in Supplementary Table 3. (e) The distribution of HOT region distances from annotated TSSs in the C. elegans genome (WS220) is indicated for cHOT regions, non-constitutive HOT regions (non-cHOT), and stage-specific HOT regions. With the exception of larval L1-specific HOT regions, stage-specific HOT regions tend to be more distal. The overlap of HOT regions with embryonic (f) and larval L3 (g) chromatin states is indicated for cHOT, stage-derived HOT regions, and stage-specific HOT regions. With the exception of larval L1-specific HOT regions, cHOT regions show stronger promoter-associated chromatin states than non-constitutive HOT regions.
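
As a rough illustration of how occupancy cutoffs taken from simulated quantiles could separate HOT and XOT regions, consider the sketch below. It is an assumption-laden example, not the published procedure: the nearest-rank quantile, the strict "greater than" comparison, the function names, and the simulated values are all hypothetical. Only the 5% and 1% significance levels and the RGB/HOT/XOT labels come from the captions.

```python
def upper_cutoff(simulated, percentile):
    """Nearest-rank percentile of simulated occupancies (assumed definition).

    `percentile` is an integer, e.g. 95 for a 5% significance level.
    """
    ordered = sorted(simulated)
    # Integer ceiling of percentile * n / 100, which avoids float rounding.
    rank = (percentile * len(ordered) + 99) // 100
    rank = max(1, min(rank, len(ordered)))
    return ordered[rank - 1]

def classify_region(occupancy, hot_cutoff, xot_cutoff):
    """Label a binding region by its occupancy (number of binding sites)."""
    if occupancy > xot_cutoff:
        return "XOT"
    if occupancy > hot_cutoff:
        return "HOT"
    return "RGB"

# Hypothetical simulated occupancies: the integers 1 through 100.
simulated = list(range(1, 101))
hot_cutoff = upper_cutoff(simulated, 95)  # 5% significance
xot_cutoff = upper_cutoff(simulated, 99)  # 1% significance
labels = [classify_region(o, hot_cutoff, xot_cutoff) for o in (40, 96, 100)]
```

In this toy setting an XOT region also exceeds the HOT cutoff, so the sketch reports the more stringent label first.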
Extended Data Figure 3
(a) Chromatin state distribution (y-axis) of embryonic binding regions as a function of binding region occupancy (x-axis). Embryonic binding regions with occupancies spanning 1–20 were mapped to 16 hiHMM chromatin states discovered in embryos. Regulator binding regions (RGB), HOT region, and XOT region occupancy levels are indicated along the x-axis as blue, yellow, and red bars, respectively. Chromatin state identities are indicated underneath. (b, c) Fold change in frequency of chromatin states as a function of occupancy in embryos (b) and in L3 larvae (c). HOT and XOT cutoffs for each stage are indicated in dashed lines. (d, e) Chromatin state distribution of factor binding in embryonic and larval L3 stages. Embryonic (d) and larval L3 (e) binding sites from individual ChIP-seq experiments were mapped to chromatin states derived from embryos and L3 larvae, respectively. (f) Signal densities near enzymatically-derived TSSs. The log2-ratio of upstream (red) versus downstream (blue) binding is color-coded below. Factors discussed in the text are highlighted.
Extended Data Figure 4
(a) Gene ontology (GO) enrichment matrix for 150 binding experiments (75 factors) spanning 6,347 significant GO enrichments (BH-corrected, P < 0.05) across 713 GO terms (level ≥4). For each experiment, GO-term enrichment was performed on gene targets as defined by binding within 1 kb of TSSs (ChIPpeakAnno). Enrichments for biological process (BP) and molecular function (MF) ontologies are shown, with distinct sets of enrichments highlighted (i–viii). (b) GO term enrichments among targets of UNC-62 binding show dramatic changes in the functional role of UNC-62 regulatory activity through development. Biological process (BP) terms (levels ≥4) enriched in UNC-62 libraries are shown. The number of UNC-62 binding sites identified per stage is indicated in parentheses. Although changes in targets between mid-larval and adult stages have been suggested previously, our analyses (performed with uniformly called binding sites) and expanded data indicate that the most dramatic changes occur between embryo and L4 larval stages. (+) MEP-1 indicates experiments performed in strain OP102.
Extended Data Figure 5
(a) Clustering patterns in pairwise TF co-associations. Clustered libraries from shared factors are colored blue. Clustered embryonic libraries are colored yellow. ChIP-seq libraries that cluster in embryonic groups and with distinct stages for the same factor are colored green. BLMP-1 and ELT-3 libraries are colored purple. FOS-1 and JUN-1 libraries are colored red. All other libraries are colored gray in the dendrogram. The clustering dendrogram is derived from Fig. 2a. (b) Difference in pairwise TF co-associations at expressed and repressed promoter domains. For embryonic and larval L1 stages, we computed co-association strength in domains spanning 2 kb upstream and 200 bp downstream of TSSs associated with expressed and repressed genes, from stage-specific binding experiments with IntervalStats. For each comparison (and each domain), the difference in the strength of co-associations between the expressed and repressed domains is shown for embryo (bottom left) and larval L1 stages (top right). Positive values indicate stronger co-associations in the expressed domain whereas negative values indicate stronger co-associations in the domain of repressed promoters. (c–f) Change in pairwise TF co-associations across sequential developmental stages. For factors assayed in sequential developmental stages, the difference in the co-association strengths for pairs of factors is shown. The changes in co-association strengths are shown for the embryo to larval L1 (c), larval L1 to L2 (d), larval L2 to L3 (e), and larval L3 to L4 transitions (f). Co-association strengths for pairs of factors at each stage are derived from Fig. 2a.
Extended Data Figure 6
Stage-specific analysis of higher-order co-associations in the larvae. For each larval stage of development, binding regions were annotated with binary signatures indicating the presence or absence of factor binding and clustered into SOMs describing the co-association patterns amongst factors assayed in each stage. SOMs (a–d) are colored by number of factors per co-association pattern, with the respective patterns in each cluster indicated underneath. (e) For each co-association pattern discovered in stage-specific SOMs, GO enrichment analysis was performed on genes associated by binding within 1 kb of TSSs (ChIPpeakAnno). GO terms are arranged along the circumference of the graph, and their enrichment is indicated in each stage. The inner-most layer contains the gene ontology color key as indicated and subsequent layers (from the center) indicate embryonic (EX), L1, L2, L3, and L4 enrichment of each GO term. For visualization purposes, only GO terms with 5 ≤ annotated genes ≤ 25 (NGO=419) are shown.
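
The binary signatures that feed the SOMs can be pictured with a small sketch. It is a minimal example assuming each binding region is stored as the set of factors bound there. The region names, the particular factor combination, and the helper `binary_signature` are hypothetical, and the SOM clustering itself is not shown.

```python
from collections import Counter

def binary_signature(bound_factors, factor_order):
    """Return one 0/1 flag per factor: 1 if the factor binds the region."""
    return tuple(int(factor in bound_factors) for factor in factor_order)

# Toy input: factor names appear in the captions, the regions are made up.
factors = ["UNC-62", "MEP-1", "DPL-1"]
regions = {
    "region_1": {"UNC-62", "MEP-1"},
    "region_2": {"DPL-1"},
    "region_3": {"UNC-62", "MEP-1"},
}

signatures = {
    name: binary_signature(bound, factors) for name, bound in regions.items()
}
# Count how often each presence/absence pattern occurs across regions.
pattern_counts = Counter(signatures.values())
```

Counting identical signatures gives a crude view of recurring co-association patterns; an SOM additionally groups similar, not just identical, signatures.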
Extended Data Figure 7
Stage-comparison SOMs highlight patterns in the specificity of higher-order TF co-associations. (a) Abundance of co-association patterns is graphed as a function of the number of factors in each co-association in stage-comparison SOMs for the embryo versus larval L1 stage comparison. Similar patterns are observed in all stage-comparison SOMs. (b) Difference in binding sites between embryos and L1 larvae for each factor (gray dots). The fractional difference, calculated as the fraction of the larger set of binding sites represented by the difference in binding sites, is shown. Factors are rank-ordered by their difference in binding sites. The fraction of co-association patterns that are stage-specific (≥90% embryonic or larval L1) in SOMs is indicated for the raw binding sites with all factors (Fig. 3a, dashed line), in SOMs with individual factors removed (blue), and in SOMs with factors sequentially removed (red). (c) Embryonic and larval L1 binding SOM with matched numbers of binding sites. Briefly, binding data for the 15 factors assayed in the embryo and L1 larvae was sub-sampled to generate stage-specific binding modules with equal numbers of binding sites for each factor (see Methods). Stage-specific binding modules with matched binding sites were clustered in an SOM describing 140 co-association patterns. SOM is colored as in (Fig. 3a). (d) Binding signatures (fraction of modules bound by each factor) are shown for each co-association pattern from (c). Sidebar indicates the embryonic (versus L1) stage-specificity of each co-association pattern as in (c). Stage-comparison SOMs with raw and matched binding sites are presented for the (e) larval L1 versus L2 comparison, (f) larval L2 versus L3 comparison, and (g) larval L3 versus L4 comparison. Binding region comparisons are performed as in Fig. 3. Briefly, binding data for factors assayed in sequential stages are assigned to stage-resolved binding modules (e.g. L1:I:10001174-10001734). Stage-resolved binding modules are clustered into SOMs describing shared and stage-specific co-association patterns. SOMs are colored by the T1 versus T2 (for example, L1 versus L2) stage-specificity of the learned co-association patterns, measured as the fraction of binding modules that are T1. T1- and T2-specific co-association patterns are shown in red and blue, respectively. Sidebars indicate the T1 (versus T2) stage-specificity of each co-association pattern. As in Fig. 3, SOMs with matched binding sites were generated by sub-sampling binding sites to generate stage-resolved binding modules with equal numbers of binding sites for each factor. For each comparison, the most representative sampling (from 100 iterations) was selected to seed SOM analyses. For each of the stage-comparison SOMs with matched binding sites (e–g), the matrix of learned co-association patterns (fraction of modules bound by each factor) is shown below each SOM. (h–j) The fraction of co-association patterns that are stage-specific (≥90% either stage) in SOMs is shown for the raw binding sites with all factors assayed in both stages (dashed line), in SOMs with individual factors removed (blue), and in SOMs with factors sequentially removed (red), for the larval L1 and L2 stage (h), larval L2 and L3 stage (i), and larval L3 and L4 stage (j) comparisons.
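
Two quantities in this caption lend themselves to a short worked sketch: the fractional difference in binding sites (panel b) and the ≥90% stage-specificity call. The code below is a hedged illustration only. The function names and most module identifiers are hypothetical (only the `L1:I:10001174-10001734` format is taken from the caption), and parsing the stage from the text before the first colon is an assumption.

```python
def fractional_difference(n_t1, n_t2):
    """Difference in binding-site counts as a fraction of the larger set."""
    larger = max(n_t1, n_t2)
    return abs(n_t1 - n_t2) / larger if larger else 0.0

def stage_counts(module_ids, t1_label):
    """Return (number of T1 modules, total modules) for one pattern.

    Assumption: the stage is the text before the first ':' in a module ID.
    """
    n_t1 = sum(1 for m in module_ids if m.split(":", 1)[0] == t1_label)
    return n_t1, len(module_ids)

def is_stage_specific(n_t1, n_total, percent=90):
    """True if at least `percent` of modules come from either stage."""
    n_t2 = n_total - n_t1
    # Integer comparison avoids floating-point edge cases at exactly 90%.
    return 100 * max(n_t1, n_t2) >= percent * n_total

# Toy co-association pattern: nine embryonic modules and one L1 module.
modules = ["EX:I:%d-%d" % (1000 * i, 1000 * i + 500) for i in range(1, 10)]
modules.append("L1:I:10001174-10001734")

n_t1, n_total = stage_counts(modules, "EX")
specific = is_stage_specific(n_t1, n_total)
diff = fractional_difference(1000, 750)
```

With nine of ten modules embryonic, the toy pattern sits exactly at the 90% threshold and is called stage-specific.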
Extended Data Figure 8
(a) Cellular-resolution, protein expression levels for 180 genes (x-axis) in terminal embryo cells (N=671, y-axis). For each gene, the normalized expression signal in each cell is shown (see Methods). For each gene, expression signals in cells not measured directly correspond to the expression signal of the last measured ancestor. Focus factors (FF=13) whose binding was assayed in embryonic stages are labeled red. Factors whose binding was assayed only in larval stages are labeled blue (FL=23). The broad tissue class of each cell is indicated in the sidebar. (b) Embryonic, cellular-resolution expression data quality controls. The number of time-series recorded per gene (x-axis) is shown. For genes with multiple time-series (NGR=145), the Pearson correlation coefficient (R) in the fluorescence signals of cells recorded was calculated between NPR=762 pairs of time-series (replicates). The distribution of correlation coefficients is shown. The median correlation coefficient among replicate experiments is shown (R = 0.8310). The number (c) and percentage (d) of embryonic cells with expression measurements across any of the assayed genes (assayed cells, gray), all of the assayed genes (tracked cells), and all of the 13 genes (focus factors) for which both embryonic binding data and cellular-resolution expression data was acquired (focused cells) are plotted as a function of developmental time (Sulston minutes). The specific developmental times with the maximum coverage of the cells in the embryo are indicated for the tracked (TT) and focused cells (TF). (e) Previous reports have suggested that a robust heuristic to identify cells in which individual genes are expressed can be obtained by requiring a fluorescence signal ≥ 2000 and a fluorescence signal that is ≥ 10% of the maximum signal observed for each reporter (gene). To confirm these recommendations, we calculated the overlap in the expressing cell populations for pairs of genes at 10% (f=0.1) and 20% (f=0.2) of the maximal signal for each gene, and computed the correlation between calculated overlaps per gene-pair between the two thresholds (R=0.94). This analysis was extended to compare a wide range of expression cutoffs (f) in (e), where we observed robust correlations for the 10% cutoff (f=0.1). (f) Cellular expression overlap matrix for 180 genes in the early embryo. For each pairwise gene comparison, we calculated the significance of the overlap between the population of cells expressing each gene. The overlap enrichment and depletion P-values between gene pairs were determined using directional Fisher’s exact tests and were Benjamini-Hochberg corrected. To generate a final overlap score, we select the most significant of the enrichment and depletion scores, reporting either the -log10(P-value of enrichment) or the log10(P-value of depletion) to obtain positive and negative values for enrichment and depletion, respectively. (g) Overlap between co-association cells and the gene-expressing cells (the expressing population) for non-focus factors (NNF=168). For each cellular-resolution co-association pattern discovered (Fig. 4c), the set of co-association cells is defined as the population of cells in which the co-association is observed in the SOM. For 39 co-association patterns, co-association cells significantly overlap (hypergeometric test, Bonferroni-corrected, P < 0.01) the gene-expression cells of at least one of 124 non-focus factor target genes. Co-association patterns and target gene pairs with significant overlaps between the co-association cells and gene-expression cells were classified as ‘Co-association in promoter’ if the co-association pattern with the significant enrichment was observed at the promoter of the target gene, and as ‘Co-association not in promoter’ if this was not the case. The distribution of overlap significance values for the two classes and the respective Wilcoxon test P-value for similarity between the two distributions is shown. MEP-1 (+) indicates experiments performed with strain OP102.
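
The expression heuristic in panel (e) and the signed overlap score in panel (f) can be sketched as below. This is a simplified illustration rather than the authors' code: the function names and toy signals are hypothetical, the one-sided Fisher's exact P-values are computed directly as hypergeometric tail sums, and the Benjamini-Hochberg correction described in the caption is omitted.

```python
import math

def expressing_cells(signals, min_signal=2000.0, fraction=0.1):
    """Cells whose signal is >= min_signal and >= fraction of the gene's maximum."""
    peak = max(signals.values())
    return {
        cell for cell, s in signals.items()
        if s >= min_signal and s >= fraction * peak
    }

def tail_p(k, n_total, n_a, n_b, upper):
    """Hypergeometric P(X >= k) if upper, else P(X <= k), for the overlap X."""
    lowest = max(0, n_a + n_b - n_total)
    highest = min(n_a, n_b)
    ks = range(k, highest + 1) if upper else range(lowest, k + 1)
    total = math.comb(n_total, n_b)
    hits = sum(math.comb(n_a, x) * math.comb(n_total - n_a, n_b - x) for x in ks)
    return hits / total

def overlap_score(cells_a, cells_b, n_total):
    """Positive -log10(P) for enrichment, negative log10(P) for depletion."""
    k = len(cells_a & cells_b)
    p_enrich = tail_p(k, n_total, len(cells_a), len(cells_b), upper=True)
    p_deplete = tail_p(k, n_total, len(cells_a), len(cells_b), upper=False)
    if p_enrich <= p_deplete:
        return -math.log10(p_enrich)
    return math.log10(p_deplete)

# Toy reporter signals for four hypothetical cells.
signals = {"c1": 5000.0, "c2": 2500.0, "c3": 400.0, "c4": 1900.0}
expressed = expressing_cells(signals)

# Two genes expressed in the same 5 of 10 cells, then in disjoint halves.
same = overlap_score(set(range(5)), set(range(5)), 10)
disjoint = overlap_score(set(range(5)), set(range(5, 10)), 10)
```

Identical expressing populations give a positive score and disjoint populations a negative one of equal magnitude, which matches the sign convention described in the caption.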
Extended Data Figure 9
Full-resolution view of global pairwise TF co-association matrix. As outlined in Fig. 2a, the significance of co-binding (co-association strength) 2 kb upstream and 200 bp downstream of TSSs was measured reciprocally between all binding experiments (IntervalStats, see Methods). For each comparison (NC=34,782), the fraction of significant (P < 0.05) co-binding events was computed and the mean fraction of reciprocal tests is reported (NT=17,391). Co-association scores are scaled by the standard deviation (uncentered) for visualization purposes. Co-associations were examined among 292,466 binding sites outside of XOT regions. Inset (i) shows the distribution of global TF co-association strengths from pairwise comparisons of 187 ChIP-seq experiments. The distribution of co-association strengths is shown from comparisons of all (distinct) ChIP-seq experiments (NDE=17,391, light blue) and from comparisons of ChIP-seq experiments from distinct factors (NDF=17,197, dark blue). The 75th, 90th, and 95th percentiles from comparisons between distinct factors (CS75%=0.2437, CS90%=0.3589, and CS95%=0.4266) are indicated as light red, red, and dark red dashed lines, respectively. Co-association strengths between FOS-1:JUN-1 in L1, L3 and L4 larvae are indicated with arrows. Inset (ii) highlights the similarity (Wilcoxon test, P=0.4913) between distributions from distinct factors and distinct experiments.
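
The comparison counts quoted in this caption are consistent with simple pair counting over the 187 experiments. The reading of NC as ordered (reciprocal) pairs and NT as unordered pairs is an interpretation of the caption, not a statement from the Methods:

```latex
\begin{align*}
N_C &= 187 \times 186 = 34{,}782 && \text{ordered (reciprocal) comparisons} \\
N_T &= \frac{N_C}{2} = \binom{187}{2} = 17{,}391 && \text{unordered experiment pairs} \\
N_{DE} - N_{DF} &= 17{,}391 - 17{,}197 = 194 && \text{pairs of experiments sharing a factor}
\end{align*}
```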
Extended Data Figure 10
Representative samples of staged, transgenic C. elegans embryos and larvae expressing GFP-tagged fusion proteins. GFP fluorescence images, DIC images, and merged (GFP/DIC) images are labeled with green, white, and blue dots, respectively. The 10 μm scale bar is shown in GFP fluorescence images. Images were selected independent of binding experiment results. Approved binding experiments include: MEP-1 (mixed embryo, L2 larvae), DPL-1 (L1 larvae), C27D6.4 (L2 larvae), NHR-23 (L3 larvae), and CEH-16 (L4 larvae) experiments.
Figure 1
Large-scale regulatory analysis of the C. elegans genome. (a) Factors assayed per developmental stage (or treatment) in 241 quality-filtered ChIP-seq experiments. Stages and treatments are abbreviated as Early Embryo (EE), Late Embryo (LE), Embryo Mixed (EM; EE and LE), larval L1 (L1), larval L2 (L2), larval L3 (L3), larval L4 (L4), Young Adult (YA), mixed Larval and Young Adults (LY), Day 4 Adult (D4), and Starved L1 (S1). Embryonic datasets were combined into a compiled embryonic stage (EX). Analyses in this report focus on embryonic (yellow) and larval (blue) experiments (NA=187). (b) Genomic coverage (percent of genomic bases) of regulatory binding (excluding RNA polymerases) in 181 C. elegans (outer circle) and 339 H. sapiens (inner circle) ChIP-seq experiments. Genomic coverage of cHOT, HOT, and other regulatory binding (RGB) regions is highlighted in red, yellow, and blue, respectively. cXOT and XOT percentages are shown in parentheses. cHOT, HOT and RGB region coverage in the human genome are 0.17%, 1.4%, and 6.1%, respectively. (c) Cutoff-normalized occupancy levels in 126 embryo-specific (yellow) and 91 larval L4-specific (blue) HOT regions. Error bars indicate the 25th and 75th percentiles. (d) Chromatin state distribution of L3 larvae binding regions by occupancy. RGB, HOT, and XOT region occupancy levels are indicated along the x-axis as blue, yellow, and red bars, respectively. (e, f) Signal densities near enzymatically-derived TSSs for BLMP-1 and ALY-2, and RNA Pol II. (g) Functional (GO term) enrichment for gene targets of binding. A subset of biological process (BP) terms (levels ≥ 4) is shown for factors enriched (BH-corrected, P < 0.01) in synaptic transmission; Early MEP-1 and DPL-1 data sets are included for comparison. (h) Example signal tracks near the UNC-104 locus.
Figure 2
Global and domain-specific patterns of TF co-association. (a) Global pairwise TF co-association matrix (NT=17,391) as defined by promoter interval statistics. Co-association scores are scaled by the standard deviation (uncentered) for visualization purposes. Co-associations of interest and discussed in the text are highlighted. LX indicates larval stages L1-L4. A higher-resolution version is available in Extended Data Fig. 9. CES-1:FKH-10 co-associations are highlighted in inset (i). Co-association strengths (unscaled) between early embryo and later stages are shown in inset (ii) for RNA Pol II-specific binding (blue), and for all factor-specific binding (light-blue). (b) Embryonic (EX) binding regions (NR=6,555) were clustered into a SOM describing 240 co-association patterns among 26 factors. (c) Binding signatures (fraction of modules bound by each factor) of the learned co-association patterns are shown. The relative number of factors per co-association pattern, expression from overlapping promoters, distance to TSSs, and number of modules with each co-association pattern are indicated as a fraction of the maximum observed across co-association patterns. (d) Functional enrichment for regions with UNC-62-bound co-association patterns of the embryo SOM.
Figure 3
Stage-specificity in higher-order TF co-associations. (a) Embryonic (EX) and larval L1 binding SOM with raw binding sites. Binding data for factors (NF=15) assayed in embryos and L1 larvae was assigned to stage-specific binding modules (NM=25,261) as diagramed in the inset. Stage-specific binding modules were clustered into an SOM describing 192 co-association patterns. The SOM is colored by the embryonic (versus L1) stage-specificity of the learned co-association patterns, measured as the fraction of binding modules that are embryonic. (b) Histogram of preceding (T1) versus subsequent (T2) stage-specificities.
Figure 4
Cell-type and lineage resolution of regulator activity and TF co-associations. (a) Tissue enrichment (−log10, P-value) and depletion (log10, P-value) scores for the expressing population of each gene are shown (Fisher’s exact, Bonferroni-corrected). Only genes with significant enrichments (or depletions) are shown. (b) Co-association strength (Fig. 2a) versus cellular overlap coefficient for 13 focus factors. The Jaccard index for the cellular overlap is indicated for each gene pair by ring size and color. (c) Cellular-resolution regulatory binding SOM. Cellular-resolution binding modules were generated by annotating in each cell, the binding of focus factors expressed in the cell. Cellular-resolution binding modules (inset) were clustered into a SOM with 268 learned co-association patterns, 161 (68%) of which were discovered in the data. The SOM is colored by the number of factors in the learned co-association patterns. (d) Tissue classes and co-association signatures are shown for 43 co-association patterns with significant enrichments. Tissue enrichments of interest are highlighted red.
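
The Jaccard index used in panel (b) to summarize cellular overlap is the size of the intersection of two expressing-cell sets divided by the size of their union. The sketch below is a generic illustration with made-up cell sets. Whether the "cellular overlap coefficient" on the axis matches the standard overlap coefficient shown alongside it is an assumption, not something the caption states.

```python
def jaccard_index(cells_a, cells_b):
    """Intersection over union of two sets of expressing cells."""
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 0.0

def overlap_coefficient(cells_a, cells_b):
    """Intersection over the smaller set (standard definition, assumed here)."""
    smaller = min(len(cells_a), len(cells_b))
    return len(cells_a & cells_b) / smaller if smaller else 0.0

# Hypothetical expressing populations for two factors.
factor_a_cells = {"ABal", "ABar", "MSa"}
factor_b_cells = {"ABar", "MSa", "MSp", "Ca", "Cp"}

jaccard = jaccard_index(factor_a_cells, factor_b_cells)
overlap = overlap_coefficient(factor_a_cells, factor_b_cells)
```

The two measures diverge when the populations differ in size: here two shared cells give a Jaccard index of 2/6 but an overlap coefficient of 2/3.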


