Review
Trends Genet. 2009 Oct;25(10):443-54.
doi: 10.1016/j.tig.2009.08.002. Epub 2009 Sep 30.

The Origins and Impact of Primate Segmental Duplications

Free PMC article

Tomas Marques-Bonet et al. Trends Genet. 2009.

Abstract

Duplicated sequences are substrates for the emergence of new genes and are an important source of genetic instability associated with rare and common diseases. Analyses of primate genomes have shown an increase in the proportion of interspersed segmental duplications (SDs) within the genomes of humans and great apes. This contrasts with other mammalian genomes that seem to have their recently duplicated sequences organized in a tandem configuration. In this review, we focus on the mechanistic origin and impact of this difference with respect to evolution, genetic diversity and primate phenotype. Although many genomes will be sequenced in the future, resolution of this aspect of genomic architecture still requires high quality sequences and detailed analyses.

Figures

Figure 1
SDs in different primate species. The proportions of large (>20 Kb), high-identity duplications are given for four primate genomes. Estimates were based on identifying regions of excess read-depth (Figure I in Box 1) after copy number correction to avoid the bias of non-human-specific SDs [21]. The genomes of human and chimpanzee show twice the number of duplicated base pairs. This observation was also supported by experimental analysis [9]. FISH analysis of 384 randomly selected BACs in chimpanzee, baboon and marmoset estimated duplication content at 7.73%, 4.39% and 2.00%, respectively.
Figure 2
Comparative analysis of disease-associated SDs. The breakpoint regions of genomic loci associated with SDs and human disease were comparatively analyzed among the primates [21]. The evolutionary age of the duplicated basepairs was inferred based on whether human SDs mapping to each region were shared or lineage-specific (i.e. <6 mya for human-specific SDs, 6–12 mya for duplications shared with chimpanzee, 12–25 mya for those shared with orangutan and >25 mya for those shared with macaque). With a few exceptions, the analysis shows that most of the complex duplication architecture that promotes rearrangement has evolved relatively recently (i.e. <12 mya).
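The age inference described in this caption can be illustrated with a minimal sketch. The age brackets are those given in the caption; the function name, the species labels and the rule that the most distant species sharing the duplication sets the bracket are assumptions made for illustration, not the authors' implementation.

```python
# Age brackets (million years ago) as given in the Figure 2 caption.
# The dictionary keys are hypothetical labels chosen for this sketch.
AGE_BRACKET_MYA = {
    "human": "<6",         # human-specific SDs
    "chimpanzee": "6-12",  # shared with chimpanzee
    "orangutan": "12-25",  # shared with orangutan
    "macaque": ">25",      # shared with macaque
}

# Species ordered from the closest to the most distant relative of human.
LINEAGE_ORDER = ["human", "chimpanzee", "orangutan", "macaque"]


def infer_sd_age(species_with_duplication):
    """Return the age bracket implied by the most distant species sharing a human SD.

    Assumption: the most distant species carrying the duplication determines
    the bracket, even if an intermediate species lacks it.
    """
    present = set(species_with_duplication) | {"human"}
    most_distant = max(present, key=LINEAGE_ORDER.index)
    return AGE_BRACKET_MYA[most_distant]


if __name__ == "__main__":
    print(infer_sd_age([]))                           # human-specific SD
    print(infer_sd_age(["chimpanzee"]))               # shared with chimpanzee
    print(infer_sd_age(["chimpanzee", "orangutan"]))  # shared with orangutan
```

Under this reading, a duplication found only in human and chimpanzee falls in the 6–12 mya bracket, which is the window in which the caption places most of the rearrangement-prone architecture.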
Figure 3
A recurrent segmental duplication specific to African great apes. (a) Initial WSSD analysis of the chimpanzee genome predicted two chimpanzee-specific duplications (depicted as block 1 and block 2 in blue). The duplication was confirmed by comparative array-CGH (using the human genome as a reference). Note that probes with log2 ratios more than 1.5 standard deviations above (increased copies) or below (decreased copies) the normalized log2 ratio are colored green or red, respectively. Array-CGH analysis revealed that both bonobo and gorilla also carried the duplication. Two genes were predicted to map to the duplicated segments. (b) Fluorescence in situ hybridization showed that the duplications (i.e. blocks 1 and 2) had expanded in copy number among all African great apes but not in humans. Interestingly, experimental and computational data suggest that all derivative locations between chimpanzee and gorilla are non-orthologous.
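The probe coloring rule in panel (a) can be sketched as follows. The 1.5 standard deviation cutoff comes from the caption; using the mean of all probes as the normalized center, the population standard deviation as the spread, and "grey" for unflagged probes are assumptions of this sketch.

```python
import statistics


def color_probes(log2_ratios, n_sd=1.5):
    """Label each array-CGH probe by its deviation from the normalized log2 ratio.

    Assumptions: the center is the mean over all probes, the spread is the
    population standard deviation, and unflagged probes are labeled "grey".
    """
    center = statistics.mean(log2_ratios)
    sd = statistics.pstdev(log2_ratios)
    upper = center + n_sd * sd
    lower = center - n_sd * sd

    colors = []
    for ratio in log2_ratios:
        if ratio > upper:
            colors.append("green")  # increased copies relative to the reference
        elif ratio < lower:
            colors.append("red")    # decreased copies relative to the reference
        else:
            colors.append("grey")
    return colors


if __name__ == "__main__":
    ratios = [0.0] * 8 + [1.0, -1.0]
    print(color_probes(ratios))
```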
Figure 4
Sequence analysis of human–gibbon breakpoints of synteny reveals potential mechanisms for SD formation. (a) Insertion of a 4.3 Kb sequence at the human–NLE gibbon breakpoint is shown. Note that the 4.3 Kb sequence block at the breakpoint is derived from ~2.5 Kb and 1.8 Kb blocks that originated 72 Kb and 64.5 Kb upstream, respectively. The grey bar denotes gibbon-specific SD, as assessed by WSSD and validated by FISH using fosmid probes. (b) A replication-based model for formation of SDs is shown [78,83,88]. Large gaps are generated by DSBs (possibly because of collapsed or stalled replication forks) at rearrangement sites. Repair of the gap is initiated by recurrent strand invasion and replication. Following a series of strand invasion, replication and uncoupling events of the replication machinery, the gap is filled by a mosaic of sequence segments.
Figure 5
Comparative duplication architecture of 17q21.31. (a) The schematic shows the extent of duplication for a 1.5 Mb genomic region among human, chimpanzee, orangutan and macaque as determined by WSSD (blue excess read-depth). Dashed lines show the position of the “core” duplicon region corresponding to the LRRC37A gene family. The complexity of the region was not revealed until a complete high-quality sequence contig was generated in BAC clones [36]. (b) The inverted sequence organization (grey lines) between two human haplotypes H1 (non-inverted) and H2 (inverted) is shown. Direct (green) and inverted (blue) SDs are depicted for both haplotypes. The H2 haplotype has larger, more identical and directly orientated duplications flanking a suite of neurological genes. It has increased in frequency in the European population presumably as a result of positive selection [85]. The different pattern of duplications in H2 leads to pathogenic microdeletions associated with the 17q21.31 deletion syndrome [17,87,89]. This region clearly highlights the complexity of the duplicated regions and the importance of high-quality sequences for understanding disease and human evolution. (c) A photograph of a child with cognitive disabilities and developmental delay carrying a 17q21.31 microdeletion is shown. Note the characteristic features including a bulbous nose, and silvery depigmentation of the hair and eyes.
Figure I
Strategies for duplication detection (WSSD and WGAC). (a) A schematic representation of the whole-genome shotgun sequence detection (WSSD) method to detect recent duplications. In short, whole-genome shotgun reads are mapped against a reference assembly and duplication is detected by the excess of read-depth. Thresholds for duplication detection are estimated from known single-copy BACs. (b) A whole-genome assembly comparison (WGAC) strategy where the genome is segmented, repeats are extracted and the remaining genome segments are compared to identify high identity pairwise alignments.
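The read-depth logic of WSSD in panel (a) can be illustrated with a minimal sketch. It assumes fixed-size windows, read start positions already mapped to a reference, and a threshold of the control mean plus three standard deviations; the window scheme, the three-SD cutoff and all function names are hypothetical choices for this example, not the published parameters.

```python
import statistics


def window_depths(read_starts, genome_length, window_size):
    """Count mapped reads whose start position falls in each fixed-size window."""
    n_windows = (genome_length + window_size - 1) // window_size
    depths = [0] * n_windows
    for pos in read_starts:
        depths[pos // window_size] += 1
    return depths


def duplication_threshold(control_depths, n_sd=3.0):
    """Estimate a depth cutoff from windows in known single-copy regions.

    Assumption: the cutoff is the control mean plus n_sd population standard
    deviations; the caption only states that thresholds come from known
    single-copy BACs.
    """
    mean = statistics.mean(control_depths)
    sd = statistics.pstdev(control_depths)
    return mean + n_sd * sd


def detect_duplicated_windows(depths, threshold):
    """Return indices of windows whose read depth exceeds the threshold."""
    return [i for i, depth in enumerate(depths) if depth > threshold]


if __name__ == "__main__":
    control = [10, 12, 8, 10]      # depths in single-copy control windows
    sample = [10, 11, 25, 9, 30]   # depths across the region of interest
    cutoff = duplication_threshold(control)
    print(detect_duplicated_windows(sample, cutoff))
```

A real analysis would also need to handle repeat masking, mapping ambiguity and the copy number correction mentioned in the Figure 1 caption, none of which this sketch attempts.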
