Comparative Study
Genome Res. 2005 Jan;15(1):1-18.
doi: 10.1101/gr.3059305.

Comparative Genome Sequencing of Drosophila pseudoobscura: Chromosomal, Gene, and Cis-Element Evolution

Free PMC article

Stephen Richards et al. Genome Res.

Abstract

We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each arm gene order has been extensively reshuffled, leading to a minimum of 921 syntenic blocks shared between the species. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 25-55 million years (Myr) since the pseudoobscura/melanogaster divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than random and nearby sequences between the species, but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, repeat-mediated chromosomal rearrangement and high coadaptation of both male genes and cis-regulatory sequences emerge as important themes of genome divergence between these species of Drosophila.

Figures

Figure 1.
The syntenic relationship between D. pseudoobscura and D. melanogaster. Synteny dot-plots showing the shuffled syntenic relationships between D. pseudoobscura and D. melanogaster for the five chromosome arms. In each case the D. melanogaster chromosome is shown on the x-axis and the D. pseudoobscura chromosome on the y-axis. Note that lines within the graph are all of the same thickness, but are of varying length. Owing to the compression inherent in the figure, many of the lines are shorter than they are wide. Chromosomes have been color coded to allow identification of interchromosomal synteny blocks. For example, in the top left of the D. pseudoobscura Chromosome 4–D. melanogaster Chromosome 2L plot, a small region of sequence on D. pseudoobscura Chromosome 4 with similarity to D. melanogaster Chromosome X can be seen. Muller element F is not shown because of the lack of sequence anchoring data on this chromosome.
Figure 2.
Mapping intraspecific inversion breakpoints. (A) Comparison of Muller element C between D. melanogaster and the Arrowhead arrangement of D. pseudoobscura revealed a junction in conserved linkage near vestigial (vg). The numbered sections 51E2, 58E1, 49D2, and 58D8 are the D. melanogaster cytological locations that are homologous to 70A, 76B, 70B, and 76C sections on the D. pseudoobscura cytological map, respectively. vg maps near the distal breakpoint of the inversion that converted the Standard arrangement into the Arrowhead arrangement (Schaeffer et al. 2003). The locations of four PCR primers, a, b, c, and d, are shown on the Standard and Arrowhead physical maps. Note that the two internal primers, b and c, are switched in the two chromosomes. (B) PCR results. The Arrowhead-specific primer combinations (a + c and b + d) only amplified Arrowhead DNA, while the Standard-specific primer combinations (a + b and c + d) only amplified breakpoints on Standard arrangements. Sequence analysis of the PCR products from the Standard and Arrowhead backgrounds verifies that PCR amplified the appropriate sequences.
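The primer logic in this caption can be sketched in a few lines of Python. This is only an illustrative model, not the authors' procedure: it assumes that a primer pair yields a product exactly when the two primers flank the same breakpoint junction, and the names `STANDARD`, `ARROWHEAD`, `amplifying_pairs`, and `amplifies` are hypothetical.

```python
def amplifying_pairs(primer_order):
    """Return the primer pairs that flank a breakpoint junction.

    Assumption (not from the paper): in a four-primer layout, the first two
    primers flank one breakpoint and the last two flank the other.
    """
    if len(primer_order) != 4:
        raise ValueError("expected exactly four primers")
    first, second, third, fourth = primer_order
    return {frozenset((first, second)), frozenset((third, fourth))}


# Primer order on the Standard arrangement, as described in the caption.
STANDARD = ["a", "b", "c", "d"]
# The inversion switches the two internal primers, b and c.
ARROWHEAD = [STANDARD[0], STANDARD[2], STANDARD[1], STANDARD[3]]


def amplifies(primer_order, primer_1, primer_2):
    """True if the two primers flank the same junction in this arrangement."""
    return frozenset((primer_1, primer_2)) in amplifying_pairs(primer_order)
```

Under this model, `amplifies(ARROWHEAD, "a", "c")` is true while `amplifies(STANDARD, "a", "c")` is false, which mirrors the arrangement-specific PCR results reported in panel B.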
Figure 3.
Structure of the repeats within the breakpoints that converted the Standard gene arrangement into the Arrowhead arrangement. The heavy line at the bottom indicates Muller element C, and the tick marks indicate the locations of the proximal and distal breakpoints for the Arrowhead inversion. The black histograms at the top indicate the frequency that a BLAST High-scoring Segment Pair (HSP) included a particular nucleotide in BLASTN comparison of each breakpoint to the entire genome (E-value ≤1 × 10⁻⁵). Two repeat families of 128 and 315 bp (open and filled boxes, respectively) are shown within the two breakpoint regions within the detail regions at the top of the figure. The individual repeats were labeled with a three-letter designation, where the first letter indicates proximal or distal, the number indicates the repeat family, and the last letter indicates the distinct repeat copy. Larger repeats can be generated from the small repeats such as the 443-bp repeat created by the adjacent 128- and 315-bp repeats. The dashed box indicates the putative repeat unit involved in the rearrangement event, and the triangles indicate the approximate location of the DNA breaks with respect to the repeat motif.
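A per-nucleotide HSP histogram of the kind described here can be computed as follows. This is a minimal sketch, assuming the HSPs have already been filtered by E-value and reduced to 1-based, inclusive `(start, end)` coordinates on the breakpoint query; the function name `hsp_coverage` and that input format are illustrative assumptions, not the authors' pipeline.

```python
def hsp_coverage(query_length, hsps):
    """Count, for each query position, how many HSPs include it.

    hsps: iterable of (start, end) pairs, 1-based and inclusive (assumed
    format). Returns a list where index 0 corresponds to query position 1.
    """
    # Difference array: +1 where an HSP begins, -1 just past where it ends.
    diff = [0] * (query_length + 2)
    for start, end in hsps:
        if start > end:  # tolerate reversed coordinates
            start, end = end, start
        start = max(start, 1)
        end = min(end, query_length)
        if start > end:  # HSP lies entirely outside the query
            continue
        diff[start] += 1
        diff[end + 1] -= 1

    coverage = []
    running = 0
    for position in range(1, query_length + 1):
        running += diff[position]
        coverage.append(running)
    return coverage
```

Positions inside a repeated motif accumulate high counts because many genomic copies produce HSPs over them, which is what makes the repeat families stand out in the histogram.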
Figure 4.
Rearrangement of conserved linkage groups between D. melanogaster and D. pseudoobscura. The thick horizontal lines represent the chromosomal maps of the D. melanogaster and D. pseudoobscura Muller element C. Vertical lines drawn either down (D. melanogaster) or up (D. pseudoobscura) indicate conserved linkage groups. The locations and orientations of 80 breakpoint motifs are indicated with open and filled triangles at the junctions of conserved linkage groups. Diagonal lines connect homologous linkage groups in the two species where a single inversion event between breakpoint motifs will bring adjacent D. melanogaster genes together (dashed and gray lines). A second example that shows ectopic exchange between a pair of motifs where only one breakpoint brings adjacent D. melanogaster genes together is indicated with black solid lines.
Figure 5.
Averaged conservation of different segments of a “prototypical gene.” Conservation statistics were computed over thousands of aligned pairs of regions of various types, aligned at different reference points. At each position we compute the fraction of aligned pairs that have identical bases at that position (green + purple tiers), have mismatched bases (red), melanogaster bases aligned to deleted bases in pseudoobscura (yellow), or are unaligned in our synteny-filtered BLASTZ alignment (blue). The purple tier shows the fraction of bases that would be expected to match by chance given the base composition at that position in both species. The expected match is <25% because of the inclusion of unaligned and deleted sequences; if these are removed, the baseline is ∼28% because of the slight AT richness of the genome. The vertical panels correspond to different segments of a prototypical gene, indicated on the x-axis. A cartoon of the prototypical gene is represented under the panels. The segments are labeled by the segment of the gene followed in parentheses by the part of that segment by which the segment was aligned. For example, CDS (5′-end) represents the start of the coding sequence aligned by the ATG start sequence, whereas the coding exon (3′-end) is aligned at the 3′-end of the coding exon, and thus the sequences are not all in phase with each other. (A) RIC, random intergenic controls for CRE analysis; (B) nearby controls in order from -250 bp to +250 bp offset from CREs. The right-most nearby controls are closest to the gene start and therefore in a region that is on average more conserved. Some of the nearby controls have a higher match percent (green) as a result; however, CREs have the highest match percent of identical base pairs as a fraction of aligned bases (everything but blue). 
(C) 142 Cis-regulatory elements of 50 bp or less from literature; (D) compressed sampling of the 5′-proximal region every 50 bp from 50 to 500; (E) 50 bp proximal to the transcription start site (TS), aligned at TS; (F) genomic span of 5′-UTR, aligned at TS; (G) 5′-UTR span aligned at protein start site (PS); (H) 5′-end of protein-coding region aligned at PS; (I) 3′-end of coding exons aligned at donor site; (J) intron aligned at donor site; (K) introns aligned at acceptor; (L) 5′-end of internal coding exons aligned at acceptor site; (M) 3′-end of protein-coding region aligned at protein end site (PE); (N) 3′-UTR span aligned at PE; (O) 3′-UTR span aligned at transcript end; (P) 50 bp of 3′-proximal region aligned at transcript end; (Q) compressed sampling of 3′-proximal region every 50 bp from 50 to 500; and (R) genome-wide average.
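The chance-match baseline (the purple tier) can be illustrated with a short calculation. The sketch below assumes that the two bases at a position are drawn independently from each species' base composition; the function name, the `aligned_fraction` parameter, and the example frequencies are illustrative assumptions and are not values from the paper.

```python
def expected_chance_match(freqs_a, freqs_b, aligned_fraction=1.0):
    """Expected fraction of positions with identical bases by chance.

    freqs_a, freqs_b: dicts mapping "A", "C", "G", "T" to base frequencies
    in each species at the position of interest.
    aligned_fraction: share of pairs that are aligned base-to-base at all;
    unaligned or deleted positions cannot match, so they lower the baseline.
    """
    per_aligned_pair = sum(freqs_a[base] * freqs_b[base] for base in "ACGT")
    return aligned_fraction * per_aligned_pair


# Uniform composition gives a 25% baseline among aligned bases.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
# A hypothetical AT-rich composition (made-up numbers) raises the baseline.
at_rich = {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}
```

With equal base frequencies the baseline is exactly 25%; any skew in composition shared by both species pushes it above 25%, while counting unaligned and deleted positions in the denominator pulls it back down, which is the effect the caption describes.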
Figure 6.
Distributions of dN, dS, and radical and conservative amino acid changes. (A) Distribution of dS and (B) distribution of dN (numbers of synonymous substitutions per synonymous site and of nonsynonymous substitutions per nonsynonymous site). (C) Distribution of the ratio ω = dN/dS for the melanogaster–pseudoobscura comparison of 9184 inferred orthologous protein-coding genes. (D–F) Distributions of α, the ratio of rates of substitution that are radical to those that are conservative, based on 9184 alignments of orthologous protein-coding genes in D. pseudoobscura and D. melanogaster. Radical changes influence charge (D), polarity (E), or polarity and volume (F) to a greater degree than do conservative changes. A substitution model was fitted by maximum likelihood to estimate these rate parameters.
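As a small illustration of the ratio ω = dN/dS, the sketch below computes ω per gene from dN and dS values that are assumed to have been estimated already, and summarizes the resulting distribution with a median. It does not perform the maximum-likelihood fit mentioned in the caption, and the function names and input format are hypothetical.

```python
import math
import statistics


def omega(dn, ds):
    """Return dN/dS, or NaN when dS is zero and the ratio is undefined."""
    if dn < 0 or ds < 0:
        raise ValueError("substitution rates must be non-negative")
    if ds == 0:
        return math.nan
    return dn / ds


def median_omega(rate_pairs):
    """Median of dN/dS over (dN, dS) pairs, skipping undefined ratios."""
    ratios = [omega(dn, ds) for dn, ds in rate_pairs]
    defined = [ratio for ratio in ratios if not math.isnan(ratio)]
    if not defined:
        raise ValueError("no gene has a defined dN/dS ratio")
    return statistics.median(defined)
```

Genes with no synonymous substitutions are excluded rather than assigned an infinite ratio; that is one possible design choice, and other treatments are equally defensible.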
Figure 7.
Smoothed distributions of percent identity values for the three groups of cis-regulatory element sequences, excluding sequences with no aligned bases. The KS test can be viewed as answering the question “are these curves different?” All three curves are significantly different (see Table 2). The true CREs show a distinctive peak in the 80%–90% identity range, presumably a consequence of stabilizing selection. The rise on the left is due to unaligned or mostly deleted sequences.
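The question the KS test answers here can be made concrete with the two-sample Kolmogorov-Smirnov statistic, which is the largest vertical gap between two empirical cumulative distributions. The pure-Python sketch below computes only that statistic (no p-value); the function name and the idea of passing in lists of percent identity values are illustrative assumptions rather than the authors' code.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking the
    # pooled observations is sufficient to find the maximum gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

A statistic near 0 means the two sets of percent identity values are distributed alike, and a statistic near 1 means they barely overlap. Turning the statistic into a significance level additionally depends on the sample sizes.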
Figure 8.
Mechanism for chromosomal inversion with a repeated sequence motif. A hypothetical chromosome is shown with genes A through N and two repeated sequence motifs (open and black arrows) in a reverse orientation (top). Repeated motifs are shown pairing during meiosis with a recombination event occurring in the middle of the paired motifs (middle). Resolution of the recombination event between the repeated sequence motifs leading to the inversion of the central gene region (bottom).
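The outcome of this mechanism can be sketched as a list operation. This is a deliberately simplified model under stated assumptions: the chromosome is a list of labels, the two repeat copies are marked by the hypothetical labels passed in, the repeats stay in place, and only the order of the genes between them is reversed (the flip in each gene's strand orientation is not tracked).

```python
def invert_between_repeats(chromosome, repeat_1, repeat_2):
    """Return a new gene order with the region between two repeats inverted.

    chromosome: list of gene and repeat labels in map order.
    repeat_1, repeat_2: labels of the two inverted repeat copies.
    """
    i = chromosome.index(repeat_1)
    j = chromosome.index(repeat_2)
    if i > j:
        i, j = j, i
    # Keep everything up to and including the first repeat, reverse the
    # genes between the repeats, then keep the second repeat and the rest.
    return chromosome[: i + 1] + chromosome[i + 1 : j][::-1] + chromosome[j:]


# Hypothetical example: genes A-E with repeats R1 and R2 flanking B, C, D.
example = ["A", "R1", "B", "C", "D", "R2", "E"]
inverted = invert_between_repeats(example, "R1", "R2")
```

Here `inverted` is `["A", "R1", "D", "C", "B", "R2", "E"]`, and applying the same inversion a second time restores the original order.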
