PLoS Biol. 2005 Jan;3(1):e7.
doi: 10.1371/journal.pbio.0030007. Epub 2004 Nov 11.

Highly conserved non-coding sequences are associated with vertebrate development

Adam Woolfe et al. PLoS Biol. 2005 Jan.

Abstract

In addition to protein coding sequence, the human genome contains a significant amount of regulatory DNA, the identification of which is proving somewhat recalcitrant to both in silico and functional methods. An approach that has been used with some success is comparative sequence analysis, whereby equivalent genomic regions from different organisms are compared in order to identify both similarities and differences. In general, similarities in sequence between highly divergent organisms imply functional constraint. We have used a whole-genome comparison between humans and the pufferfish, Fugu rubripes, to identify nearly 1,400 highly conserved non-coding sequences. Given the evolutionary divergence between these species, it is likely that these sequences are found in, and furthermore are essential to, all vertebrates. Most, and possibly all, of these sequences are located in and around genes that act as developmental regulators. Some of these sequences are over 90% identical across more than 500 bases, being more highly conserved than coding sequence between these two species. Despite this, we cannot find any similar sequences in invertebrate genomes. In order to begin to functionally test this set of sequences, we have used a rapid in vivo assay system using zebrafish embryos that allows tissue-specific enhancer activity to be identified. Functional data is presented for highly conserved non-coding sequences associated with four unrelated developmental regulators (SOX21, PAX6, HLXB9, and SHH), in order to demonstrate the suitability of this screen to a wide range of genes and expression patterns. Of 25 sequence elements tested around these four genes, 23 show significant enhancer activity in one or more tissues. We have identified a set of non-coding sequences that are highly conserved throughout vertebrates. 
They are found in clusters across the human genome, principally around genes that are implicated in the regulation of development, including many transcription factors. These highly conserved non-coding sequences are likely to form part of the genomic circuitry that uniquely defines vertebrate development.

Figures

Figure 1. Distribution of CNEs along the Human Genome
(A) Each CNE is plotted relative to its position along each of human Chromosomes 1 to 9 (data for other chromosomes not shown). The y-axis represents length along the chromosome (in megabases). (B) Distribution of the fraction of CNEs that are within certain distances of each other; e.g., 85% of the distances between CNEs are less than or equal to 370 kb. χ² tests were carried out by comparing observed cluster sizes with those generated randomly for each chromosome (see Materials and Methods).
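The statistic in panel (B) can be illustrated with a minimal Python sketch. It assumes that "distances between CNEs" means distances between neighbouring elements on the same chromosome, which the caption does not state explicitly. The positions are invented for illustration and the function names are hypothetical, not taken from the paper.

```python
from bisect import bisect_right


def adjacent_distances(positions):
    """Distances between neighbouring elements on one chromosome.

    Assumption: positions are midpoints in a single unit (here kb).
    """
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def fraction_within(distances, threshold):
    """Fraction of distances that are less than or equal to threshold."""
    if not distances:
        return 0.0
    ordered = sorted(distances)
    # bisect_right counts the values <= threshold in a sorted list.
    return bisect_right(ordered, threshold) / len(ordered)


# Hypothetical CNE midpoints in kb on a single chromosome (not real data).
cne_positions_kb = [100, 130, 180, 400, 2400, 2450, 2500]
gaps = adjacent_distances(cne_positions_kb)
print(gaps)
print(fraction_within(gaps, 370))
```

Evaluating `fraction_within` over a range of thresholds would give a cumulative curve of the kind the caption describes.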
Figure 2. CNE Clusters Are Found Close to Trans-Dev Genes in the Human Genome
Chromosomal locations of trans-dev genes that are within 500 kb of CNE clusters in the human genome (each cluster is represented by a green arrowhead). Genes in bold script are located next to clusters of ten or more CNEs. Gene names are taken from Ensembl v23.34e.1. Graph inset shows distribution of CNE cluster sizes in the human genome.
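The 500-kb proximity criterion in this caption can be sketched as an interval-distance check. This is an illustrative sketch only: the gene names, coordinates, and the function `genes_near_clusters` are hypothetical, and the caption does not say whether distance is measured edge to edge as assumed here.

```python
def genes_near_clusters(genes, clusters, max_distance=500_000):
    """Return names of genes lying within max_distance bp of any cluster.

    genes: {name: (start, end)}; clusters: list of (start, end).
    Assumption: all coordinates are on the same chromosome, in bp.
    """
    def gap(a, b):
        # Zero if the intervals overlap, else the distance between nearest edges.
        return max(0, max(a[0], b[0]) - min(a[1], b[1]))

    return sorted(
        name
        for name, span in genes.items()
        if any(gap(span, cluster) <= max_distance for cluster in clusters)
    )


# Hypothetical coordinates, for illustration only.
genes = {
    "GENE_A": (1_000_000, 1_050_000),
    "GENE_B": (5_000_000, 5_020_000),
}
clusters = [(1_400_000, 1_600_000)]
print(genes_near_clusters(genes, clusters))
```

Here GENE_A ends 350 kb before the cluster starts and is reported, while GENE_B lies several megabases away and is not.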
Figure 3. Comparative Sequence Analysis of the SOX21 Gene
SOX21 genomic regions for mouse, human, and rat were extracted from Ensembl to include all flanking DNA up to the nearest neighbouring genes (ABCC4 and NM_180989 in the human genome and their orthologues in the rodent genomes). The region covering Fugu SOX21 (138–178 kb of Fugu Scaffold_293 [M000293]) was extracted from the Fugu Genome Server at http://fugu.rfcgr.mrc.ac.uk/fugu-bin/clonesearch. (A) MLAGAN alignment of the SOX21 gene using Fugu DNA as the base sequence compared with mouse, rat, and human genomic DNA. Coloured peaks represent regions of sequence conservation above 60% over at least 40 bp. The SOX21 coding region (SOX21 is a single exon gene) is annotated, and sequence identity is shaded in blue. Non-coding regions of sequence identity are shaded in pink. The eight elements that have been functionally assayed are labelled. Six of these are identified in the global analysis as seven CNEs (SOX21_8–10 covers two CNEs). SOX21_7 and SOX21_18 are rCNEs. (B) Multiple DNA sequence alignments of CNE SOX21_1 and CNE SOX21_19 between mouse, rat, human, and Fugu.
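The conservation threshold quoted in panel (A), above 60% over at least 40 bp, can be illustrated with a simple sliding-window identity scan over a pairwise alignment. This is a sketch under stated assumptions and not the MLAGAN or plotting software's own method: the window logic, the treatment of gap columns as mismatches, and the function name `conserved_windows` are illustrative choices.

```python
def conserved_windows(seq_a, seq_b, window=40, min_identity=0.6):
    """Find alignment regions where windowed identity meets a threshold.

    seq_a, seq_b: two rows of a pairwise alignment (equal length, '-' for gaps).
    Returns merged (start, end) column ranges, end exclusive.
    Assumption: a column with a gap in either row counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [
        int(a == b and a != "-")
        for a, b in zip(seq_a.upper(), seq_b.upper())
    ]
    regions = []
    running = sum(matches[:window])  # match count in the first window
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Slide the window one column to the right.
            running += matches[start + window - 1] - matches[start - 1]
        if running / window >= min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # merge with previous hit
            else:
                regions.append((start, end))
    return regions


# Toy alignment: 40 identical columns followed by 40 mismatching columns.
row_a = "A" * 40 + "C" * 40
row_b = "A" * 40 + "G" * 40
print(conserved_windows(row_a, row_b))
```

Alignments shorter than the window return no regions, which matches the "at least 40 bp" condition.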
Figure 4. MLAGAN Alignments of Regions Encompassing the PAX6, HLXB9, and SHH Genes
PAX6 (A), HLXB9 (B), and SHH (C). In each panel, human (top), mouse (middle), and rat (bottom) genomic DNA from Ensembl is aligned with Fugu genomic DNA from orthologous regions. Alignment parameters are the same as in Figure 3. Seventeen elements that have been functionally assayed from these regions have been labelled. The following were identified as CNEs: PAX6_6, PAX6_9–10, KIAA0010_1, and KIAA0010_3.
Figure 5. Composite Overviews of GFP Expression Patterns Induced by Different Elements Tested in the Functional Assay
Cumulative GFP expression data, from SOX21-associated elements (A), PAX6-associated elements (B), HLXB9-associated elements (C), and SHH-associated elements (D). Cumulative data pooled from multiple embryos per element on day 2 of development (approximately 26–33 hpf) are displayed schematically overlaid on camera lucida drawings of a 31-hpf zebrafish embryo. Categories of cell type are colour-coded: key is at bottom of figure. Bar graphs encompass the same dataset as the schematics and use the same colour code for tissue types. Bar graphs display the percentage of GFP-expressing embryos that show expression in each tissue category for a given element. The total number of expressing embryos analysed per element is displayed in the top left corner of each graph. Legend for the bar graph columns accompanies the bottom graph in each panel; “blood+” refers to circulating blood cells plus blood island region, “heart+” refers to heart and pericardial region (please note: some cells categorised as heart/pericardial region may be circulating blood cells), and “skin” refers to cells of the epidermis or EVL. s. cord, spinal cord.
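The bar-graph quantity described above, the percentage of GFP-expressing embryos showing expression in each tissue category, can be computed as in the following sketch. The embryo records are invented for illustration and the function name `tissue_percentages` is hypothetical; the paper's own scoring procedure is not reproduced here.

```python
from collections import Counter


def tissue_percentages(embryos):
    """Percentage of expressing embryos that show GFP in each tissue category.

    embryos: one collection of tissue categories per GFP-expressing embryo.
    An embryo is counted at most once per category.
    """
    total = len(embryos)
    if total == 0:
        return {}
    counts = Counter(
        tissue for tissues in embryos for tissue in set(tissues)
    )
    return {tissue: 100.0 * n / total for tissue, n in counts.items()}


# Hypothetical scoring of four expressing embryos for one element.
scored = [
    {"hindbrain", "retina"},
    {"hindbrain"},
    {"notochord"},
    {"hindbrain", "skin"},
]
for tissue, pct in sorted(tissue_percentages(scored).items()):
    print(f"{tissue}: {pct:.0f}%")
```

Because a single embryo can express GFP in several tissues, the percentages for one element need not sum to 100.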
Figure 6. Different Elements Enhance GFP Expression in Specific Tissue and Cell Types
GFP expression is shown in fixed tissue following wholemount anti-GFP immunostaining, bright-field views (A–D, F, J, K, and N), or in live embryos as GFP fluorescence, merged bright-field and fluorescent views (E, G–I, L, M, and O). Lateral views, anterior to the left, dorsal to the top (A, B, and D–O) or dorsal view, anterior to the top (C). Embryos approximately 28–33 hpf (A, D–I, L, and O), approximately 48 hpf (B, C, J, K, and N), or approximately 26 hpf (M). The identity of the element co-injected with the GFP reporter construct is shown at the bottom of each panel. Black arrows indicate the approximate position of the midbrain–hindbrain boundary; black and white arrowheads indicate GFP-expressing cells. Scale bars approximately 100 μm (A–E, G–I, and L–O) and 50 μm (F, J, and K). b, blood island; d, diencephalon; e, eye; f, fin fold; hb, hindbrain; l, lens; n, notochord; ov, otic vesicle; r, retina; s, somite; sc, spinal cord; t, telencephalon; te, tectum; y, yolk. (A) SOX21_4. Head region (eyes removed): neurons in the telencephalon and diencephalon are GFP-positive (arrowheads). (B) SOX21_19. Head region: numerous GFP-expressing neurons are visible in the forebrain, midbrain, and hindbrain. Retinal expression is also apparent. (C) SOX21_5–6. Hindbrain region: white arrowheads indicate GFP expression by several cells in the epithelium of the right developing ear (ov). GFP-expressing cells in the left developing ear are in a slightly different focal plane. (D) SOX21_1. Trunk region: two individual notochord cells express GFP (arrowheads). (E) PAX6_6. Head region of live embryo: GFP is expressed in several retinal cells. (F) PAX6_9–10. Anterior trunk region (at the level of somites 1–3): three spinal cord neurons with ventrally projecting axons express GFP (arrowheads). (G) PAX6_1. Tail region of live embryo: arrowhead indicates GFP expression in the developing median fin fold. (H) KIAA0010_1. Trunk region: three notochord cells express GFP (arrowheads).
(I) KIAA0010_2. Anterior end of embryo: arrowheads point to circulating blood cells expressing GFP. (J) HLXB9_3. Trunk region: GFP-expressing muscle fibres in somite 5 (arrowheads) lie immediately dorsal and ventral to the horizontal myoseptum. (K) HLXB9_3. Trunk region (at the level of somites 13–15): arrowheads mark GFP expression in six cells forming the epidermis or EVL. (L) SHH_6. Whole live embryo: numerous GFP-expressing muscle fibres can be seen in the trunk. (M) SHH_1. Tail region of live embryo: GFP is expressed in a single bipolar neuron near the caudal end of the spinal cord (arrowhead marks cell body). (N) SHH_4. Head region (dorsolateral view): cells labelled with anti-GFP include midbrain and hindbrain neurons and cells in the retina (slightly out of focal plane). Arrowheads indicate cell bodies of hindbrain neurons, from which axons can be seen projecting ventrally. (O) SHH_2. Trunk region of live embryo: GFP-positive cells in the region of the blood islands (caudal to the urogenital opening; arrowheads) show a slightly elongated morphology, suggesting they may be blood vessel precursors rather than blood cells.
