Genome Res. 2004 Apr;14(4):493-506.
doi: 10.1101/gr.1907504.

Recent Segmental Duplications in the Working Draft Assembly of the Brown Norway Rat

Free PMC article

Eray Tuzun et al. Genome Res. 2004.

Abstract

We assessed the content, structure, and distribution of segmental duplications (≥90% sequence identity, ≥5 kb length) within the published version of the Rattus norvegicus genome assembly (v.3.1). The overall fraction of duplicated sequence within the rat assembly (2.92%) is greater than that of the mouse (1%-1.2%) but significantly less than that of human (approximately 5%). Duplications were nonuniformly distributed, occurring predominantly as tandem and tightly clustered intrachromosomal duplications. Regions containing extensive interchromosomal duplications were observed, particularly within subtelomeric and pericentromeric regions. We identified 41 discrete genomic regions greater than 1 Mb in size, termed "duplication blocks." These appear to have been the target of extensive duplication over millions of years of evolution. Gene content within duplicated regions (approximately 1%) was lower than expected based on the genome representation. Interestingly, sequence contigs lacking chromosome assignment ("the unplaced chromosome") showed a marked enrichment for segmental duplication (45% of 75.2 Mb), indicating that segmental duplications have been problematic for sequencing and assembly of the rat genome. Further targeted efforts are required to resolve the organization and complexity of these regions.
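
To make the thresholds above concrete, the following is a minimal sketch of how a duplicated fraction could be computed from a set of alignment intervals. It is not the authors' pipeline: the `Alignment` record, the function names, and the convention that each side of a pairwise alignment is supplied as its own record are all assumptions made for illustration. Only the default cutoffs (≥90% identity, ≥5 kb) come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """One side of a pairwise alignment (hypothetical record layout)."""
    chrom: str
    start: int       # 0-based, inclusive
    end: int         # exclusive
    identity: float  # fraction between 0 and 1


def duplicated_bases(alignments, min_identity=0.90, min_length=5000):
    """Count bases covered by alignments passing both thresholds.

    Overlapping intervals on the same chromosome are merged so that
    each base is counted once.
    """
    by_chrom = {}
    for a in alignments:
        if a.identity >= min_identity and (a.end - a.start) >= min_length:
            by_chrom.setdefault(a.chrom, []).append((a.start, a.end))

    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:          # overlaps or touches: extend
                cur_end = max(cur_end, end)
            else:                         # gap: close the current run
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def duplicated_fraction(alignments, genome_size, **thresholds):
    """Duplicated bases divided by the assembly size."""
    return duplicated_bases(alignments, **thresholds) / genome_size
```

Passing different `min_identity` and `min_length` values to the same function would give the kind of threshold sweep described in the Figure 1 caption.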

Figures

Figure 1
Duplicated fraction in the rat genome. The figure depicts the proportion of the genome that shows duplication (A) when all genomic sequence was compared, and (B) for the rat genome excluding random, unassigned sequence contigs. Various lengths and % identity thresholds are shown. A very small portion of the rat genome shows segmental duplications with ≥99.5% sequence identity. This suggests that the majority of segmental duplications are bona fide and are not the result of missed allelic overlaps during genome assembly.
Figure 2
Sequence properties of rat segmental duplications. Distributions of the (A) length and (B) percent nucleotide sequence identity for segmental duplications are shown as a function of the number of aligned bp. Interchromosomal duplications (red); intrachromosomal duplications (blue).
Figure 3
Distribution of segmental duplications (≥90% and ≥10 kb) in the rat genome. The patterns of (A) interchromosomal duplications (red) and (B) intrachromosomal duplications (blue) are depicted for all duplications with ≥90% sequence identity and ≥10 kb in length. For clarity, interchromosomal distribution patterns with the random, unassigned sequence contigs (chrUn) are not shown in (A). For more detail, including % identity and pairwise relationships of all duplications and alignments, see http://ratparalogy.cwru.edu.
Figure 4
(A) Segmental duplication content per chromosome. The relative proportion of intrachromosomal and interchromosomal duplications for each chromosome is shown. These calculations treat the unmapped sequence as a separate chromosome when classifying duplications as inter- or intrachromosomal. Forty-five percent of the unplaced chromosome consists of duplicated sequence. (B) Duplication blocks. Rat segmental duplications clustered into larger regions ranging from 100 to 3000 kb in length. We termed these structures “duplication blocks.” Examples of duplication blocks on chromosomes 1 and 7 are presented (arrows), with the underlying degree of sequence identity for each pairwise alignment depicted below the graph. Chromosome 1, green; chromosome 7, red. A subtelomeric (t) and a pericentromeric (p) block are indicated. These regions of the rat genome are typified by low gene density (RefSeq/EST/mRNA), a high frequency of gaps within the assembly, and an excess of pairwise alignments.
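
The classification rule in panel (A) can be sketched as follows. This is a hedged illustration only: the function names, the `(chrom_a, chrom_b, aligned_bp)` tuple layout, and the choice to credit the aligned length to both copies are assumptions, not details given in the paper. The one point taken from the caption is that the unplaced sequence (labelled `chrUn` here) is treated as a chromosome in its own right.

```python
from collections import defaultdict


def classify_pair(chrom_a, chrom_b):
    """Label a pairwise duplication by comparing chromosome names.

    The unplaced sequence ("chrUn") is compared like any other name,
    so chrUn-chrUn pairs count as intrachromosomal.
    """
    return "intra" if chrom_a == chrom_b else "inter"


def tally_by_chromosome(pairs):
    """Sum aligned bp per chromosome, split into intra and inter.

    `pairs` is an iterable of (chrom_a, chrom_b, aligned_bp) tuples.
    Each copy contributes aligned_bp to its own chromosome, so an
    intrachromosomal pair adds twice to the same chromosome.
    Overlaps between alignments are not merged in this sketch.
    """
    totals = defaultdict(lambda: {"intra": 0, "inter": 0})
    for chrom_a, chrom_b, aligned_bp in pairs:
        kind = classify_pair(chrom_a, chrom_b)
        for chrom in (chrom_a, chrom_b):
            totals[chrom][kind] += aligned_bp
    return {chrom: dict(counts) for chrom, counts in totals.items()}
```

Because overlapping alignments are not merged, these tallies overstate the number of distinct duplicated bases; they are meant only to show how the relative proportions per chromosome could be derived.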
