Science. 2010 Dec 24;330(6012):1775-87.
doi: 10.1126/science.1196914. Epub 2010 Dec 22.

Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project

Mark B Gerstein, Zhi John Lu, Eric L Van Nostrand, Chao Cheng, Bradley I Arshinoff, Tao Liu, Kevin Y Yip, Rebecca Robilotto, Andreas Rechtsteiner, Kohta Ikegami, Pedro Alves, Aurelien Chateigner, Marc Perry, Mitzi Morris, Raymond K Auerbach, Xin Feng, Jing Leng, Anne Vielle, Wei Niu, Kahn Rhrissorrakrai, Ashish Agarwal, Roger P Alexander, Galt Barber, Cathleen M Brdlik, Jennifer Brennan, Jeremy Jean Brouillet, Adrian Carr, Ming-Sin Cheung, Hiram Clawson, Sergio Contrino, Luke O Dannenberg, Abby F Dernburg, Arshad Desai, Lindsay Dick, Andréa C Dosé, Jiang Du, Thea Egelhofer, Sevinc Ercan, Ghia Euskirchen, Brent Ewing, Elise A Feingold, Reto Gassmann, Peter J Good, Phil Green, Francois Gullier, Michelle Gutwein, Mark S Guyer, Lukas Habegger, Ting Han, Jorja G Henikoff, Stefan R Henz, Angie Hinrichs, Heather Holster, Tony Hyman, A Leo Iniguez, Judith Janette, Morten Jensen, Masaomi Kato, W James Kent, Ellen Kephart, Vishal Khivansara, Ekta Khurana, John K Kim, Paulina Kolasinska-Zwierz, Eric C Lai, Isabel Latorre, Amber Leahey, Suzanna Lewis, Paul Lloyd, Lucas Lochovsky, Rebecca F Lowdon, Yaniv Lubling, Rachel Lyne, Michael MacCoss, Sebastian D Mackowiak, Marco Mangone, Sheldon McKay, Desirea Mecenas, Gennifer Merrihew, David M Miller 3rd, Andrew Muroyama, John I Murray, Siew-Loon Ooi, Hoang Pham, Taryn Phippen, Elicia A Preston, Nikolaus Rajewsky, Gunnar Rätsch, Heidi Rosenbaum, Joel Rozowsky, Kim Rutherford, Peter Ruzanov, Mihail Sarov, Rajkumar Sasidharan, Andrea Sboner, Paul Scheid, Eran Segal, Hyunjin Shin, Chong Shou, Frank J Slack, Cindie Slightam, Richard Smith, William C Spencer, E O Stinson, Scott Taing, Teruaki Takasaki, Dionne Vafeados, Ksenia Voronina, Guilin Wang, Nicole L Washington, Christina M Whittle, Beijing Wu, Koon-Kiu Yan, Georg Zeller, Zheng Zha, Mei Zhong, Xingliang Zhou, modENCODE Consortium, Julie Ahringer, Susan Strome, Kristin C Gunsalus, Gos Micklem, X Shirley Liu, Valerie Reinke, Stuart K Kim, LaDeana W Hillier, Steven Henikoff, Fabio Piano, Michael Snyder, Lincoln Stein, Jason D Lieb, Robert H Waterston
Free PMC article

Mark B Gerstein et al. Science. 2010.

Erratum in

  • Science. 2011 Jan 7;331(6013):30

Abstract

We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor-binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor-binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.

Figures

Fig. 1
Transcriptome features and alternative splicing. (A) Bar graphs indicate the number of confirmed splice junctions categorized by type. The leftmost bars show the progression from project start (6) to the aggregate integrated transcript set. The three other groups provide data for the various developmental stages, males, mutants, and populations exposed to pathogens. Specific sample names are described in table S3. (B) Histogram of fractional differences in isoform composition for 12,875 genes with multiple isoforms in 21 pair-wise comparisons across seven developmental stages. A fractional difference close to 1 indicates large differences in the relative composition. (C) Representative example (F01G12.5; let-2), illustrating alternative exon usage across stages. (D) Example of a differentially transcribed pseudogene creating a ncRNA. Rows are normalized signal tracks for the various developmental stages, showing the expression pattern of the parent gene (T01B11.7.1; orange) and an associated duplicated pseudogene (PP00501, green).
Fig. 2
Expression and binding dynamics. (A) Spearman correlations of gene expression and RNA Pol II binding across seven stages. Expression-level correlations are shown above the diagonal; RNA Pol II–binding correlations appear below. For both expression and binding, there is a notable transition between embryonic and larval stages. (B) Correlation of RNA Pol II–binding levels with gene expression. Although RNA Pol II–binding in embryonic stages shows low correlation with gene expression in larval and young adult stages, expression in the embryo correlates moderately well with RNA Pol II–binding later. (C) Principal components analysis (PCA) of six matched tissue samples from mixed embryo (MxE) and L2 (7). GABA, γ-aminobutyric acid.
Fig. 3
Integrated miRNA-TF regulatory network. (A) TFs are organized hierarchically, and those miRNAs either regulating or being regulated by the TFs are shown. (TF names are in fig. S36.) All larval TF-TF interactions in HOT regions were removed. Tissue specificity and number of protein-protein interactions are shown for each of the hierarchical levels (6). (B) TF network after filtering out edges that do not show a significant correlation in their expression patterns. Also shown is a schematic representation of the target genes of the 18 larval TFs. (C) One of the three significantly enriched network motifs (the other two are in fig. S37). (D) Enrichment of binding targets and signal of TFs in noncoding versus coding genes. Max signal equals the ratio of maximum binding signal of a TF at noncoding versus coding genes. Target fraction represents the ratio of target percentage in noncoding genes to that in coding genes (fig. S22A).
Fig. 4
HOT regions. (A) TF-binding peaks at a HOT region and two “factor-specific regions” on chromosome III: 7,206,000 to 7,220,000. Top tracks show read density (scaled based on the total mapped reads) from 22 ChIP-seq experiments. Bottom tracks show ChIP-seq controls, RNA-seq expression levels, and ChIP-chip signals for two histone modifications. (B) 304 HOT regions bound by 15 or more factors and 50 randomly chosen TF-bound regions. Each row represents a TF, and each region is colored by enrichment q value (6). (C) Genes associated with HOT regions are broadly expressed. Single-cell gene expression measurements of 93 mCherry reporters (30) are shown separated by whether the promoter contains a HOT region, contains a region bound by 10 to 14 factors, or contains only regions bound by 0 to 9 factors (gene names are in fig. S29). The x axis represents 363 specific cells present in L1-stage animals.
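The three promoter categories in panel C (HOT, bound by 10 to 14 factors, bound by 0 to 9 factors) can be illustrated with a minimal sketch. The thresholds come from the caption; the peak tuple layout, the half-open overlap rule, and the function names are assumptions made for illustration and are not the consortium's pipeline.

```python
HOT_MIN_FACTORS = 15  # "bound by 15 or more factors" (from the caption)


def count_factors(peaks, region):
    """Count distinct factors with at least one peak overlapping `region`.

    peaks:  iterable of (factor, chrom, start, end) tuples (hypothetical layout)
    region: (chrom, start, end); intervals are treated as half-open (assumption)
    """
    chrom, start, end = region
    return len({f for f, c, s, e in peaks if c == chrom and s < end and e > start})


def classify_region(n_factors):
    """Assign a region to one of the three categories used in panel C."""
    if n_factors >= HOT_MIN_FACTORS:
        return "HOT"
    if n_factors >= 10:
        return "10-14 factors"
    return "0-9 factors"


if __name__ == "__main__":
    # Made-up peaks: 16 different factors inside the chromosome III window of panel A.
    peaks = [(f"TF{i}", "III", 7_206_500, 7_207_000) for i in range(16)]
    n = count_factors(peaks, ("III", 7_206_000, 7_220_000))
    print(n, classify_region(n))
```

Counting distinct factors rather than peaks keeps a factor with several nearby peaks from inflating the tally.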
Fig. 5
Chromosome-scale domains of chromatin organization. (A and B) Whole-genome ChIP-chip data for various histone modifications and chromatin-associated proteins, along with relevant genome annotations, were normalized, placed into 10-kb bins, and displayed as a heat map. Red indicates a stronger signal, and blue indicates a weaker signal. The continuous black line plots the relationship between physical (x axis) and genetic (y axis) distance. Three major groups were identified by hierarchical clustering. Group 1 contains H3K9 methylation marks and LEM-2, which tend to be enriched at distal autosomal regions, and correlate with repetitive DNA and a high recombination rate. Group 2 contains dosage compensation complex members and H4K20me1, which are highly enriched on X. Group 3 contains marks associated with active chromatin. Generally, signals for active marks are weaker on the X chromosome than the autosomes. This megabase-scale chromatin organization persists through all stages examined. (A) Chromosome III is representative of autosomes. (B) X has a distinct chromatin configuration. (C) H3K9me1, -2, and -3 signals decrease gradually at the boundaries between the central and distal domains, whereas the boundaries defined by LEM-2 are relatively sharp. (D) A schematic representation of key findings.
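The 10-kb binning step mentioned for panels A and B can be sketched as follows. The caption does not say how probe values within a bin were summarized or how the data were normalized, so taking the mean per bin is an assumption, normalization is left out, and `bin_signal` is a hypothetical name.

```python
from collections import defaultdict

BIN_SIZE = 10_000  # 10-kb bins, as stated in the caption


def bin_signal(probes, bin_size=BIN_SIZE):
    """Average probe values within fixed-width bins along one chromosome.

    probes: iterable of (position, value) pairs (hypothetical layout)
    Returns {bin_start: mean value}, ordered by bin start.
    The mean is an assumption; a median would be an equally plausible summary.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, value in probes:
        bin_start = (pos // bin_size) * bin_size
        sums[bin_start] += value
        counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


if __name__ == "__main__":
    # Made-up probe values: two probes fall in the first bin, one in the second.
    print(bin_signal([(100, 1.0), (9_999, 3.0), (10_000, 5.0)]))
```

Each row of the heat map would then correspond to one mark binned this way, with the columns ordered by bin start.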
Fig. 6
Chromatin patterns around genes. Average gene profiles around the TSS and TTS of various histone marks displayed for the (red) X chromosome and (blue) autosomes. Genes were further stratified according to their expression level, with the top 20% of expressed genes shown in a darker shade and the bottom 20% of expressed genes shown in a lighter shade. Marks typically associated with active or repressed transcription are labeled on the left.
Fig. 7
Statistical models predicting TF-binding and gene expression from chromatin features. (A) Modeling TF-binding sites with chromatin features. The color of each cell represents the accuracy of a statistical model in which a chromatin feature or a set of features acts as predictor for TF binding or HOT regions. (B) An example of combining chromatin and sequence features. Potential binding sites of HLH-1 were predicted by using only sequence motifs, only chromatin features, or both. (C) Correlation pattern for a number of chromatin features in 100-bp bins around the TSS (± 4 kb) and TTS (± 4 kb) of transcripts at the early embryo (EE) stage. The Spearman correlation coefficient of each chromatin feature with gene-expression levels was calculated for each bin. (D) Chromatin features can predict expression levels for both protein-coding genes and miRNAs. (Top) A model involving all chromatin features. (Bottom) The model for protein-coding genes can also be used to accurately predict miRNA expression levels.
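The per-bin calculation in panel C can be sketched in plain Python: for each bin, one chromatin feature's signal across genes is rank-correlated with those genes' expression levels. Spearman correlation is computed here as the Pearson correlation of average ranks. The input layout and function names are assumptions for illustration; with 100-bp bins over ± 4 kb there would be 80 bins per anchor point.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def per_bin_correlation(signal_by_gene, expression):
    """signal_by_gene[g][b]: feature signal for gene g in bin b (hypothetical layout).
    expression[g]: expression level of gene g.
    Returns one Spearman coefficient per bin."""
    n_bins = len(signal_by_gene[0])
    return [
        spearman([gene[b] for gene in signal_by_gene], expression)
        for b in range(n_bins)
    ]


if __name__ == "__main__":
    # Made-up data: 4 genes, 2 bins.
    signal = [[1, 4], [2, 3], [3, 2], [4, 1]]
    expression = [10, 20, 30, 40]
    print(per_bin_correlation(signal, expression))
```

Repeating this for every chromatin feature would give one row of coefficients per feature, which is the shape of the pattern the caption describes.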
Fig. 8
Relative proportion of annotations among constrained sequences. (A) Relative proportion of constrained and unconstrained bases in the C. elegans genome. Within the constrained region, the stacked bar chart shows the cumulative proportion covered by various classes of annotated genomic elements. (B) Fraction of element classes covering (red) constrained and (gray) unconstrained bases. The error bars show the 95% confidence interval for random placement of elements calculated with GSC. If the ends of the columns are outside the confidence interval, then it is unlikely that the fraction of the element class overlapping constrained and/or unconstrained bases could have occurred by chance. (C) Constraint profiles of broad categories of elements. The x axis indicates the PhastCons score of bases covered by the element ranging from 0 (no conservation) to 1.0 (perfect conservation). The y axis indicates the log ratio of the number of bases with the given score covered, relative to what would be expected by random element placement (dotted line) (fig. S45 shows more detail).
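The y axis of panel C, a log ratio of observed to expected bases at each PhastCons score, can be written out as a small sketch. The caption does not state the logarithm base, so base 2 is an assumption, as are the dictionary layout and the name `log_enrichment`. A value of 0 corresponds to the dotted line (random element placement).

```python
import math


def log_enrichment(observed, expected):
    """Log2 ratio of observed to expected base counts per score bin.

    observed, expected: {score_bin: number of bases} (hypothetical layout)
    Bins with a zero count on either side are skipped because the log
    ratio is undefined there.
    """
    ratios = {}
    for score_bin, exp in expected.items():
        obs = observed.get(score_bin, 0)
        if obs > 0 and exp > 0:
            ratios[score_bin] = math.log2(obs / exp)
    return ratios


if __name__ == "__main__":
    # Made-up counts: depleted at score 0.0, enriched at score 1.0.
    observed = {0.0: 50, 1.0: 400}
    expected = {0.0: 100, 1.0: 100}
    print(log_enrichment(observed, expected))
```

Positive values mark score bins where an element class covers more bases than random placement would predict, and negative values mark depletion.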
