Am J Hum Genet. 2003 Oct;73(4):823-34.
doi: 10.1086/378594. Epub 2003 Sep 22.

An Alu Transposition Model for the Origin and Expansion of Human Segmental Duplications

Free PMC article

Jeffrey A Bailey et al. Am J Hum Genet. 2003.

Abstract

Relative to genomes of other sequenced organisms, the human genome appears particularly enriched for large, highly homologous segmental duplications (≥90% sequence identity and ≥10 kbp in length). The molecular basis for this enrichment is unknown. We sought to gain insight into the mechanism of origin by systematically examining sequence features at the junctions of duplications. We analyzed 9,464 junctions within regions of high-quality finished sequence from a genomewide set of 2,366 duplication alignments. We observed a highly significant (P<.0001) enrichment of Alu short interspersed element (SINE) sequences near or within the junction. Twenty-seven percent of all segmental duplications terminated within an Alu repeat. The Alu junction enrichment was most pronounced for interspersed segmental duplications separated by ≥1 Mb of intervening sequence. Alu elements at the junctions showed higher levels of divergence, consistent with Alu-Alu-mediated recombination events. When we classified Alu elements into major subfamilies, younger elements (AluY and AluS) accounted for the enrichment, whereas the oldest primate family (AluJ) showed no enrichment. We propose that the primate-specific burst of Alu retroposition activity (which occurred 35-40 million years ago) sensitized the ancestral human genome for Alu-Alu-mediated recombination events, which, in turn, initiated the expansion of gene-rich segmental duplications and their subsequent role in nonallelic homologous recombination.

Figures

Figure 1
Flowchart for the characterization of segmental duplication junctions. An overview of the strategy to identify human segmental duplications and the characterization of their junctions is presented. In brief, seed alignments were established on the basis of a whole-genome alignment comparison of the human sequence assembly (build 30). Junctions were identified by heuristically extending these alignments until the optimal end point of the alignment was identified. A total of 2,366 optimal global alignments with <99.8% and >90.0% sequence identity and that were >5 kb in length were retained in this analysis. All junctions were hand curated and visually inspected using the program Miropeats. See the “Material and Methods” section and the appendix (online only) for a more detailed description and for the precise in silico parameters used in this analysis.
Figure 2
Junction analysis. A, Diagram representing a typical sequence alignment and the junction and control regions considered in the analysis. For each alignment, the sequence content of the four junction intervals (red) (10-bp windows centered at the alignment end points) was compared with the control sequence (blue) (duplicated sequence ±1 kb flanking sequence). The overall fraction of bases for any given sequence feature (repeat, GC content, etc.) was calculated over all 2,366 alignments. B, Histogram comparing the repeat content of the junction with the control region as well as the average finished genome. Repeat content is measured as a total fraction of analyzed bases. Significant differences (P<.0001) were observed for Alu and satellite repeats in terms of both junction versus control (*) and control versus finished genome (**). A more refined analysis of the specific subfamilies is available in table A (online only). C, We performed simulation studies to determine the significance of the observed enrichments compared with the control sequence by randomly sampling control sequence (see the “Material and Methods” section). For 10,000 replicates, no Alu replicates (maximum 11.0%) exceeded the observed Alu fraction of 14.2% (P<.0001), and no satellite replicates (maximum 1.2%) exceeded the observed satellite fraction of 3.0% (P<.0001).
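The resampling test described in panel C can be sketched as follows. This is only an illustration under assumptions that the caption does not state: the control sequence is represented by a hypothetical per-base boolean annotation (`control_mask`), each replicate draws as many fixed-width windows as there are junction intervals, and windows are sampled uniformly with replacement. The function name and parameters are invented for the example.

```python
import random

def empirical_p_value(control_mask, n_windows, window, observed,
                      replicates=10_000, seed=0):
    """Monte Carlo test: how often does a random sample of control windows
    reach the observed feature fraction?

    control_mask : list of bools, True where a control base carries the feature
                   (hypothetical representation, not the authors' data format)
    Returns (p_value, maximum simulated fraction).
    """
    rng = random.Random(seed)
    # Prefix sums give the feature count of any window in constant time.
    prefix = [0]
    for flag in control_mask:
        prefix.append(prefix[-1] + (1 if flag else 0))
    max_start = len(control_mask) - window
    exceed = 0
    max_sim = 0.0
    for _ in range(replicates):
        hits = 0
        for _ in range(n_windows):
            start = rng.randint(0, max_start)  # inclusive on both ends
            hits += prefix[start + window] - prefix[start]
        frac = hits / (n_windows * window)
        max_sim = max(max_sim, frac)
        if frac >= observed:
            exceed += 1
    return exceed / replicates, max_sim

# Synthetic control in which exactly 10% of bases are annotated.
toy_mask = [i % 10 == 0 for i in range(1000)]
p, max_sim = empirical_p_value(toy_mask, n_windows=50, window=10,
                               observed=0.142, replicates=200)
print(p, max_sim)
```

When no replicate reaches the observed value, the estimate is reported as an upper bound of one over the number of replicates, which is how 10,000 replicates lead to P<.0001.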
Figure 3
Specificity of Alu junction enrichment. The average fraction of Alu sequence was computed in 10-bp windows for all 9,464 junctions. Junctions were oriented from external flanking sequence (white) to duplicated sequence (yellow). The X-axis represents base-pair position with respect to the junction point set at 0 (positive values are located internal to the junction, whereas negative values represent extension into flanking sequence). The greatest enrichment occurs specifically at the junction (23.9%) and dissipates within 300 bp (the size of an Alu repeat) on either side of the junction. This effect is asymmetric, with a more gradual bias observed within the duplicated portion of the alignment. Control (blue) and finished genome (gray) averages are shown as bold horizontal lines.
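The positional profile in this figure amounts to averaging a per-base annotation across junctions in fixed windows. A minimal sketch is given below, assuming each junction is supplied as a hypothetical boolean list of length `2 * flank`, oriented from flanking sequence to duplicated sequence, with the junction point at index `flank`. These names and the data layout are assumptions made for illustration.

```python
def alu_profile(junction_masks, flank, window=10):
    """Mean Alu fraction per window across all junctions.

    junction_masks : list of equal-length bool lists (True = Alu base),
                     each of length 2 * flank, junction at index `flank`
    Returns a dict mapping the window start offset (relative to the
    junction at 0; negative = flanking side) to the mean Alu fraction.
    """
    for mask in junction_masks:
        if len(mask) != 2 * flank:
            raise ValueError("each mask must span 2 * flank bases")
    profile = {}
    for start in range(-flank, flank, window):
        lo = start + flank  # convert the offset to a list index
        total = sum(sum(mask[lo:lo + window]) for mask in junction_masks)
        profile[start] = total / (len(junction_masks) * window)
    return profile

# Two toy junctions: one with Alu sequence throughout the duplicated side,
# one with no Alu sequence at all.
masks = [[False] * 20 + [True] * 20, [False] * 40]
print(alu_profile(masks, flank=20))
```

Plotting the returned values against their offsets would give a curve of the kind described, with the control and genome averages drawn as horizontal reference lines.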
Figure 4
Divergence of junction Alus. A, The sequence divergence of the segmental duplication (X-axis) is compared with the divergence of Alu repeats (Y-axis) for Alu repeats located internal to the pairwise alignment (black triangles) and Alu repeats localized at the junction (red triangles). Kimura’s two-parameter model of genetic distance (in changes/bp) is used as an estimate of divergence excluding CpG dinucleotides. Junction Alu repeats demonstrate an increased divergence relative to internal Alus. B, The pairwise differences in divergence between the junction Alus and each control Alu were calculated for each alignment (K_junctionAlu − K_internalAlu) (see the “Material and Methods” section). The alignment of the full-length Alu repeat element located at the junction, and not simply the Alu portion within the overall genomic alignment, was considered in this analysis. This measure shows a highly skewed positive distribution, with nearly 60% of all pairwise differences demonstrating a significant departure from that of an expected distribution (fig. B [online only]; >1 SD, based on a distribution of the difference between all possible combinations of internal Alu alignments).
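For reference, Kimura's two-parameter distance is K = −(1/2) ln[(1 − 2P − Q) √(1 − 2Q)], where P and Q are the proportions of transitions and transversions among compared sites. The sketch below applies that formula to a pair of aligned sequences. The CpG handling shown (masking any column where either sequence has a C immediately followed by a G) is an assumption for illustration; the caption does not specify how CpG dinucleotides were excluded, and the function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def cpg_positions(seq):
    """Indices covered by a CG dinucleotide (adjacent columns only)."""
    masked = set()
    for i in range(len(seq) - 1):
        if seq[i] == "C" and seq[i + 1] == "G":
            masked.update((i, i + 1))
    return masked

def k2p_distance(a, b, exclude_cpg=True):
    """Kimura two-parameter distance between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    a, b = a.upper(), b.upper()
    # Assumed CpG rule: drop a column if it is part of a CG in either sequence.
    skip = (cpg_positions(a) | cpg_positions(b)) if exclude_cpg else set()
    n = transitions = transversions = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in skip or x not in "ACGT" or y not in "ACGT":
            continue  # masked column, gap, or ambiguous base
        n += 1
        if x == y:
            continue
        same_class = ((x in PURINES and y in PURINES) or
                      (x in PYRIMIDINES and y in PYRIMIDINES))
        if same_class:
            transitions += 1
        else:
            transversions += 1
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / n, transversions / n
    # math.log raises ValueError if the pair is too diverged for the model.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))  # one transition in ten sites
```

Computing this distance separately for a junction Alu and for each internal Alu of the same alignment gives the quantities whose difference is plotted in panel B.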
Figure 5
Alu subfamily enrichment. The histogram depicts all Alu elements within the genome assembly (build 30), with shades designating their major subfamily, binned on the basis of their estimated ages of insertion (see the “Material and Methods” section). On the basis of this analysis, a significant burst in Alu (AluS) activity is predicted to have occurred 35–40 mya, consistent with results of previous studies (Shen et al.; Kapitonov and Jurka; Batzer and Deininger 2002). A generally accepted primate phylogeny (Goodman 1999) is superimposed with the estimated evolutionary age of the major primate Alu subfamilies. On the basis of neutral rates of evolution, the duplications and/or gene conversion events are estimated to have occurred <40 million years ago. A comparison of Alu subfamily and segmental duplication junctions shows that the AluS subfamily is responsible for the vast majority of the overall enrichment. When the enrichment is broken down in terms of younger (95%–100% identity) and older (90%–95% identity) duplications, the relative enrichment in younger duplications increases for AluY and decreases for AluS. This is consistent with the idea that the degree of sequence homology may play a role in mobilizing segmental duplications.
Figure 6
A model of Alu-Alu–mediated duplication. A burst of AluS activity provided hundreds of thousands of sites of near-perfect sequence identity scattered throughout the ancestral genome during a narrow window of anthropoid evolution (35–40 mya). The probability of nonallelic homologous recombination among Alu repeats would have been the greatest during this time period. Three hypothetical scenarios for such Alu-Alu–mediated rearrangements are depicted, including an episomal circle, a linear DNA fragment, and the misalignment of chromosomes during meiosis. Chromosomal misalignment would predict local tandem duplications and deletions. Such events have been well-documented in association with human genetic disease (Kolomietz et al. 2002). Episomal integration would result in duplications flanked at both ends by Alu repeats, with possible rearrangement of the duplicatively transposed sequence, as proposed elsewhere (Eichler et al. 1996). Integration of linear DNA fragments would have the potential to show a wide range of junction properties, since multiple mechanisms could be envisioned to resolve the second junction, such as further Alu-Alu–mediated recombination or nonhomologous end-joining (NHEJ), as shown. All three events would generate “mosaic” or “hybrid” Alu repeat sequences consisting of both donor and acceptor sequences.
Figure A
Distribution of 2,366 alignments analyzed. The positions of interchromosomal (red) and intrachromosomal (blue) duplications are shown. Alignments are distributed across all chromosomes. Purple regions represent unsequenced acrocentric p-arm and centromeric sequence.
Figure B
Expected variation between Alu alignment divergences. To assess the expected divergence between any two Alu alignments within the analyzed 2,366 duplications, we computed all pairwise differences for the internal (control) Alus within a segmental duplication alignment (K_internalAlu1 − K_internalAlu2) (see the “Material and Methods” section). A total of 5,306 comparisons were analyzed. The SD was 0.031.
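The expected-variation calculation can be sketched as below: pairwise differences are taken among the internal Alu divergences of each alignment, their standard deviation is computed, and junction-minus-internal differences are then compared against that SD. The input layout, the function names, and the choice of population SD and of one arbitrary sign per pair are assumptions for illustration, not details given in the caption.

```python
from itertools import combinations
from statistics import pstdev

def internal_differences(alignments):
    """All pairwise differences (K1 - K2) among internal Alu divergences.

    alignments : list of lists; each inner list holds the divergences of
                 the internal Alus of one duplication alignment
                 (hypothetical layout). Pairs are formed only within
                 an alignment, and each pair is counted once.
    """
    diffs = []
    for divergences in alignments:
        for k1, k2 in combinations(divergences, 2):
            diffs.append(k1 - k2)
    return diffs

def fraction_beyond_sd(junction_diffs, sd, n_sd=1.0):
    """Fraction of junction-minus-internal differences above n_sd * sd."""
    above = sum(1 for d in junction_diffs if d > n_sd * sd)
    return above / len(junction_diffs)

# Toy values only; they are not taken from the study.
sd = pstdev(internal_differences([[0.10, 0.12, 0.14]]))
print(sd, fraction_beyond_sd([0.05, 0.0, 0.02], sd))
```

With the study's reported SD of 0.031 supplied as `sd`, the second function would return the share of junction Alus whose excess divergence is larger than one SD.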
Figure C
Examples of “mosaic” Alus traversing duplication junctions. Pairwise alignments are centered at the junction point between the flanking (white) and duplicated (yellow) sequence. The traversing Alu elements (red) show increasing sequence divergence as the alignment approaches the flanking sequence.
