Genetics, 208 (3), 937-949

The ModERN Resource: Genome-Wide Binding Profiles for Hundreds of Drosophila and Caenorhabditis elegans Transcription Factors

Michelle M Kudron et al. Genetics.

Abstract

To develop a catalog of regulatory sites in two major model organisms, Drosophila melanogaster and Caenorhabditis elegans, the modERN (model organism Encyclopedia of Regulatory Networks) consortium has systematically assayed the binding sites of transcription factors (TFs). Combined with data produced by our predecessor, modENCODE (Model Organism ENCyclopedia Of DNA Elements), we now have data for 262 TFs identifying 1.23 M sites in the fly genome and 217 TFs identifying 0.67 M sites in the worm genome. Because sites from different TFs are often overlapping and tightly clustered, they fall into 91,011 and 59,150 regions in the fly and worm, respectively, and these binding sites span as little as 8.7 and 5.8 Mb in the two organisms. Clusters with large numbers of sites (so-called high occupancy target, or HOT regions) predominantly associate with broadly expressed genes, whereas clusters containing sites from just a few factors are associated with genes expressed in tissue-specific patterns. All of the strains expressing GFP-tagged TFs are available at the stock centers, and the chromatin immunoprecipitation sequencing data are available through the ENCODE Data Coordinating Center and also through a simple interface (http://epic.gs.washington.edu/modERN/) that facilitates rapid accessibility of processed data sets. These data will facilitate a vast number of scientific inquiries into the function of individual TFs in key developmental, metabolic, and defense and homeostatic regulatory pathways, as well as provide a broader perspective on how individual TFs work together in local networks and globally across the life spans of these two key model organisms.

Keywords: Caenorhabditis elegans; Drosophila; binding sites; regulation; transcription factors.
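The abstract notes that overlapping, tightly clustered sites from different TFs collapse into far fewer regions (for example, 1.23 M fly sites into 91,011 regions). The sketch below shows one simple way such a collapse can be computed, and it rests on several assumptions that do not come from this record. It assumes sites are pooled across all TFs as (chromosome, start, end) intervals and merged whenever they overlap or lie within a hypothetical `max_gap`. The `min_sites` cutoff for calling a HOT region is likewise hypothetical. This is an illustration, not the consortium's actual clustering procedure.

```python
from typing import List, Tuple

# One binding site: (chromosome, start, end) in half-open coordinates.
Site = Tuple[str, int, int]
# One merged region: (chromosome, start, end, number_of_sites).
Region = Tuple[str, int, int, int]


def cluster_sites(sites: List[Site], max_gap: int = 0) -> List[Region]:
    """Merge binding sites (pooled across all TFs) into regions.

    A site joins the current region if it is on the same chromosome and
    starts at most `max_gap` bases after the region's end. With the default
    max_gap=0, only overlapping or abutting sites merge. `max_gap` is a
    hypothetical parameter, not a value taken from the paper.
    """
    regions: List[Region] = []
    for chrom, start, end in sorted(sites):
        if regions and regions[-1][0] == chrom and start <= regions[-1][2] + max_gap:
            _, r_start, r_end, n = regions[-1]
            regions[-1] = (chrom, r_start, max(r_end, end), n + 1)
        else:
            regions.append((chrom, start, end, 1))
    return regions


def total_span(regions: List[Region]) -> int:
    """Total number of bases covered by the merged regions."""
    return sum(end - start for _, start, end, _ in regions)


def hot_regions(regions: List[Region], min_sites: int) -> List[Region]:
    """Regions holding at least `min_sites` sites (hypothetical HOT cutoff)."""
    return [r for r in regions if r[3] >= min_sites]
```

For example, the sites `("chrI", 100, 300)`, `("chrI", 250, 400)`, `("chrI", 1000, 1200)` and `("chrII", 50, 150)` merge into three regions spanning 600 bases in total. Only the first region, which holds two sites, would pass a `min_sites=2` cutoff. Summing region lengths rather than site lengths is what lets the covered span (8.7 and 5.8 Mb in the abstract) be much smaller than the raw site count suggests.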

Figures

Figure 1
Schematic of the modERN ChIP-seq pipeline. Example TF-tagged constructs for worm and fly are shown. Transgenic worms were generated by bombardment of fosmid constructs containing a single TF with dual GFP and 3xFLAG tags into unc-119 mutants. For fly, recombineered BACs containing a GFP-tagged TF were injected into embryos expressing ϕC31 integrase to target genomic integration of the entire BAC into well-characterized engineered docking sites. Integration of the BAC was confirmed by PCR. Worms and flies expressing the GFP-tagged TF were grown, fixed, homogenized, and/or sheared to obtain chromatin for immunoprecipitation. The same GFP antibody was used for ChIP in both organisms. All libraries were prepared and sequenced at the same site. All of the modENCODE and modERN ChIP-seq data can be accessed at either the EPIC modERN website (http://epic.gs.washington.edu/modERN/) or the ENCODE DCC site (http://encodeproject.org). ChIP-seq, chromatin immunoprecipitation sequencing; DCC, Data Coordinating Center; TF, transcription factor; modENCODE, Model Organism ENCyclopedia Of DNA Elements; modERN, model organism Encyclopedia of Regulatory Networks.
Figure 2
Cell type specificity of expression reflects the number of binding sites in promoters. The dispersion score (a measure of how broadly or specifically a gene is expressed, with higher scores indicating greater specificity) of each of 5401 expressed genes is plotted against the number of binding sites in the largest cluster of sites upstream of the gene. Genes with high dispersion scores overwhelmingly have < 30 binding sites in the largest upstream cluster, whereas genes with low dispersion scores (< 3) can have very large clusters of sites upstream. Dispersion scores for 14,535 protein-coding genes were obtained from the L2 single-cell combinatorial indexing RNA sequencing data set (Cao et al. 2017) using the estimateDispersions function in Monocle2; scores > 10 indicate expression predominantly in a single cell type. Of these 14,535 genes, 7503 had an upstream gene in the same orientation, and for these genes all binding sites in the intergenic space plus 200 bases downstream of the transcript start site were assigned to the downstream gene. Of the 7503 genes, 5401 had at least one binding site. The cluster with the maximum number of sites was used for plotting.
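The caption assigns each gene the sites in the intergenic space upstream of it plus 200 bases downstream of its start site, and then plots the size of the largest cluster in that window. The sketch below illustrates that assignment step, but the caption does not say how sites or clusters are represented, so the representation here is an assumption. It treats each binding site as a single summit position. It handles only a plus-strand gene whose same-orientation upstream neighbour ends at `upstream_gene_end`. It defines a cluster as a run of summits in which adjacent summits lie no more than a hypothetical `max_gap` bases apart. All of these function names and parameters are illustrative only.

```python
from typing import List


def sites_in_promoter_window(summits: List[int], upstream_gene_end: int,
                             tss: int, downstream: int = 200) -> List[int]:
    """Summits assigned to a plus-strand gene (hypothetical helper).

    The window covers the intergenic space from the end of the upstream
    gene up to `downstream` bases past the gene's start site, half-open.
    """
    lo, hi = upstream_gene_end, tss + downstream
    return sorted(p for p in summits if lo <= p < hi)


def largest_cluster_size(positions: List[int], max_gap: int) -> int:
    """Number of sites in the largest cluster of positions.

    Adjacent sorted positions at most `max_gap` apart belong to the same
    cluster; `max_gap` is an assumed parameter, not one from the paper.
    """
    positions = sorted(positions)
    if not positions:
        return 0
    best = current = 1
    for prev, nxt in zip(positions, positions[1:]):
        current = current + 1 if nxt - prev <= max_gap else 1
        best = max(best, current)
    return best
```

With summits at 50, 120, 130, 150, 400, 900, 1010 and 1300, an upstream gene ending at 100 and a start site at 1000, the window [100, 1200) keeps six summits. With `max_gap=50`, the largest cluster is 120, 130 and 150, so the value plotted for the gene would be 3.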
Figure 3
Global pairwise transcription factor coassociation matrix (NT = 155,630) as defined by promoter interval statistics (Chikina and Troyanskaya 2012), followed by clustering of factors based on those scores. Coassociation scores are scaled by the SD (uncentered) for visualization purposes. Clusters with mutually high-scoring coassociations are apparent both along and off the diagonal. Several clusters that contain transcription factors of known specificity are outlined and enlarged to show the various factors and stages involved.
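The caption states that coassociation scores are scaled by the uncentered SD for visualization. The uncentered SD is the root mean square of the values, so the mean is not subtracted first. The sketch below assumes a single global scale factor computed over every entry of the matrix. The caption does not say whether scaling was done globally, per row, or per column, so that choice is an assumption. The function name is hypothetical.

```python
import math
from typing import List


def scale_by_uncentered_sd(matrix: List[List[float]]) -> List[List[float]]:
    """Divide every entry by the uncentered SD (root mean square) of all entries.

    Assumes one global scale factor; returns an unchanged copy if all
    entries are zero, to avoid dividing by zero.
    """
    values = [x for row in matrix for x in row]
    rms = math.sqrt(sum(x * x for x in values) / len(values))
    if rms == 0:
        return [row[:] for row in matrix]
    return [[x / rms for x in row] for row in matrix]
```

The distinction from an ordinary SD matters for constant matrices. A matrix of all ones has a centered SD of 0, which would make scaling undefined, but its uncentered SD is 1, so it is returned unchanged. Likewise, `[[3, -3], [3, -3]]` has an uncentered SD of 3 and scales to `[[1, -1], [1, -1]]`.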
Figure 4
Screenshot of the EPIC modERN database. All worm and fly data from both the modERN and modENCODE consortia can be accessed at http://epic.gs.washington.edu/modERN/. See Figure S2 for a tutorial on how to navigate the site. BDGP, Berkeley Drosophila Genome Project; ChIP, chromatin immunoprecipitation; TF, transcription factor; modENCODE, Model Organism ENCyclopedia Of DNA Elements; modERN, model organism Encyclopedia of Regulatory Networks.
