Review
Microbiol Mol Biol Rev, 72 (4), 686-727

Comparative Genomics and Molecular Dynamics of DNA Repeats in Eukaryotes

Guy-Franck Richard et al. Microbiol Mol Biol Rev.

Abstract

Repeated elements can be highly abundant in eukaryotic genomes, composing more than 50% of the human genome, for example. Repeated sequences can be classified into two large families, "tandem repeats" and "dispersed repeats." Each of these two families can itself be divided into subfamilies. Dispersed repeats contain transposons, tRNA genes, and gene paralogues, whereas tandem repeats contain gene tandems, ribosomal DNA repeat arrays, and satellite DNA, itself subdivided into satellites, minisatellites, and microsatellites. Remarkably, the molecular mechanisms that create and propagate dispersed and tandem repeats are specific to each class and usually do not overlap. In the first section of the present review, we describe the nature and distribution of dispersed and tandem repeats in eukaryotic genomes in the light of complete (or nearly complete) available genome sequences. In the second part, we focus on the molecular mechanisms responsible for the fast evolution of two specific classes of tandem repeats: minisatellites and microsatellites. Given that a growing number of human neurological disorders involve the expansion of a particular class of microsatellites, called trinucleotide repeats, a large part of the recent experimental work on microsatellites has focused on these particular repeats, and thus we also review the current knowledge in this area. Finally, we propose a unified definition for mini- and microsatellites that takes into account their biological properties, and we try to point out new directions that should be explored in the near future on our road to understanding the genetics of repeated sequences.
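
As an illustration of what a tandem repeat looks like at the sequence level, the following minimal Python sketch scans a DNA string for perfect head-to-tail repeats of a short motif. It is not taken from the review: the function name `find_tandem_repeats`, the motif-size window (1 to 6 bp), and the minimum copy number (3) are assumptions chosen for the example, and they do not correspond to the unified definition of mini- and microsatellites proposed by the authors.

```python
import re

def find_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=3):
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    The size window and copy threshold are illustrative assumptions,
    not values taken from the review.
    """
    # A lazily matched motif followed by at least (min_copies - 1)
    # further exact copies of itself (backreference \1).
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits

# A (CAG)5 trinucleotide repeat embedded in flanking sequence.
print(find_tandem_repeats("TTGACAGCAGCAGCAGCAGTT"))
```

Only perfect repeats are reported here; interrupted or degenerate tracts, which are common in real genomes, would need an alignment-based approach.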

Figures

FIG. 1.
Repeated DNA sequences in eukaryotic genomes and mechanisms of evolution. The two main categories of repeated elements (tandem repeats and dispersed repeats) are shown, along with subcategories, as described in the text. Blue arrows point to molecular mechanisms that are involved in propagation and evolution of repeated sequences. REP, replication slippage; GCO, gene conversion; WGD, whole-genome duplication; SEG, segmental duplications; RTR, reverse transcription; TRA, transposition.
FIG. 2.
Motif sizes, lengths, and abundances of satellite sequences in eukaryotes. For each category (satellites, minisatellites, and microsatellites), the distribution of motif sizes, total lengths of repeat arrays, and numbers of occurrences of each repeat category per eukaryotic genome are shown on a logarithmic scale. Satellite DNA can extend over megabases of DNA but its maximum length is unknown, due to the lack of sequence information (dotted lines and question mark).
FIG. 3.
Number of citations per year in the PubMed database for different search terms.
FIG. 4.
The “replication slippage” model of tandem repeat instability. The template strands are drawn in red, and the newly synthesized strands are drawn in blue. During replication of a repeat-containing sequence (A), the replication machinery may pause on the lagging strand, due to secondary structures or other kinds of lesions (B). (C) Partial unwinding of the lagging strand may lead to replication slippage when replication restarts, giving rise to an expansion or a contraction of the repeat tract, depending on the strand (template or newly synthesized strand) on which slippage occurred. (D) Alternatively, partial unwinding of the lagging strand may lead to lesion bypass by homologous recombination with the sister chromatid, also leading to contractions or expansions of the repeat tract (Fig. 7).
FIG. 5.
Secondary structures formed by some trinucleotide repeats. (A) CAG, CTG, and CCG hairpins formed by an odd number of repeat units. Bases making no pairing within the stem are colored. (B) CAG, CTG, and CCG hairpins formed by an even number of repeat units. Bases making no pairing within the stem are colored. (C) Triple helix formed by (GAA)n repeats. Watson-Crick pairings are shown by double lines, and Hoogsteen pairings are shown by single lines (314). (D) Tetraplex structure formed by (CCG)n repeats. Cytosines and cytosine bonds are shown in red.
FIG. 6.
Effect of mutants of the replication fork on microsatellite instability. The budding yeast replication fork is schematized. The Mcm DNA helicase opens the double helix to allow leading- and lagging-strand synthesis. The DNA Pol α-primase complex is believed to initiate replication by synthesis of a short RNA primer on both DNA strands. It is then replaced by the more processive Pol δ (on lagging strand) or Pol ε (on leading strand) associated with PCNA, the clamp processivity factor, loaded by the Rfc complex (“clamp loader”). Okazaki fragments on the lagging strand are subsequently processed by several enzymes, namely, the RNase H complex, Rad27, and Dna2, before ligation with the preceding Okazaki fragment is catalyzed by Cdc9. The mismatch repair complex scans newly synthesized DNA in order to check for replication errors. Single-stranded DNA is covered by the SSB complex Rpa. Note that PCNA also interacts with Cdc9 and Mlh1, in addition to proteins represented here. The trinucleotide repeat orientation represented here corresponds to the CTG orientation (or orientation II), in which CTGs are located on the lagging-strand template. Each box details the increase of microsatellite instability for each of the replication fork mutants tested over the wild-type level. Data were compiled from references given in the text.
FIG. 7.
The “DSB repair slippage” model of tandem repeat instability. The broken molecule (recipient) is drawn in blue, the template molecule (donor) is drawn in red, and the newly synthesized strands are drawn in orange. (A) Following a DSB, gene conversion is initiated by strand invasion, forming a “D-loop.” (B) DNA synthesis within the repeat tract may be faithful or associated with slippage. After capture of the second end of the break, DNA synthesis of the second strand may be faithful (C) or associated with slippage (D). Slippage events will lead to expansions of the repeat tract (as shown in panels C and D) or to contractions if slippage occurs on the template strand. Alternatively, after capture of both ends followed by DNA synthesis, the two newly synthesized strands may unwind and anneal with each other in frame or out of frame, leading to expansions or contractions of the repeat tract. (E) This last alternative pathway is adapted from the synthesis-dependent strand annealing mechanism, proposed by several authors to explain tandem repeat rearrangements during gene conversion (363, 378, 420).
FIG. 8.
Revisiting the trinucleotide repeat expansion model. Following this model, the fate of a given trinucleotide repeat tract depends only on its size. Due to secondary structures, short repeats are prone to replication slippage during S-phase replication, subsequently repaired by postreplication slippage under the control of the Srs2 protein. This will eventually lead to repeat expansion when Srs2 is deficient. Longer repeats are also prone to slippage but may stall forks, leading to DNA damage and DSBs. Checkpoint deficiency may lead to unrepaired DSBs and fragile site expression. DSB repair under the control of Srs2 may lead to repeat instability by gene conversion. Msh2 would bind to repeat hairpins, stabilizing them and maybe increasing the chance of slippage and/or breakage.
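
The slippage models in FIG. 4 and FIG. 7 can be caricatured as a random walk on repeat copy number. The sketch below is a toy simulation under stated assumptions, not the authors' model: each replication round slips with probability `p_slip`; a slip on the newly synthesized strand adds one repeat unit and a slip on the template strand removes one, and the two are chosen with probability `p_nascent` and `1 - p_nascent`. The function names, parameters, and the single-unit step size are all hypothetical, and the size dependence described in FIG. 8 is deliberately ignored.

```python
import random

def replicate(copies, p_slip, p_nascent, rng):
    """One replication round of a repeat tract (toy model).

    Assumption: a slip changes the tract by exactly one repeat unit.
    """
    if copies > 1 and rng.random() < p_slip:
        if rng.random() < p_nascent:
            return copies + 1   # slip on the newly synthesized strand: expansion
        return copies - 1       # slip on the template strand: contraction
    return copies               # faithful replication

def simulate(copies, rounds, p_slip, p_nascent=0.5, seed=0):
    """Return the copy number after each round, starting value included."""
    rng = random.Random(seed)   # seeded so that runs are reproducible
    history = [copies]
    for _ in range(rounds):
        copies = replicate(copies, p_slip, p_nascent, rng)
        history.append(copies)
    return history

# Example: a 20-unit tract followed over 1000 rounds with rare slippage.
trajectory = simulate(20, rounds=1000, p_slip=0.01)
print(min(trajectory), max(trajectory), trajectory[-1])
```

Setting `p_nascent` above 0.5 biases the walk toward expansion; the probabilities used here are placeholders rather than measured rates.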
