PLoS One. 2016 Mar 15;11(3):e0150719.
doi: 10.1371/journal.pone.0150719. eCollection 2016.

REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads


Chong Chu et al. PLoS One.

Abstract

Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes, which are incomplete and often lack data in highly repetitive regions that are difficult to assemble. To overcome this problem, we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence among copies. We show that REPdenovo substantially outperforms existing methods in both the number and the completeness of the repeat sequences it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that previous repeat annotations have missed. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering of host DNA during sequencing of the parasite genomes failed to exclude host-derived repeat sequences. REPdenovo is a powerful new computational tool for annotating genomes and for addressing questions about the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo.


Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Illustration of the k-mer counting.
Long sequence: a sequence read. Length-3 sequence: k-mer. Here, k = 3. The table on the right shows the k-mer counting result.
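The k-mer counting shown in Fig 1 can be sketched in a few lines of Python. This is a minimal illustration, not REPdenovo's implementation: the function name count_kmers is hypothetical, and the sketch counts k-mers exactly as they appear in the read, whereas practical k-mer counters often merge each k-mer with its reverse complement.

```python
from collections import Counter


def count_kmers(read: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) of a read, sliding one base at a time."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A read of length L has L - k + 1 k-mers; the range is empty if L < k.
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


if __name__ == "__main__":
    counts = count_kmers("ACGTACGTA", 3)
    for kmer, n in sorted(counts.items()):
        print(kmer, n)
```

For the read ACGTACGTA with k = 3, this prints ACG, CGT and GTA with count 2 and TAC with count 1: seven k-mers in total, one per starting position.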
Fig 2. Illustration of the main steps of REPdenovo.
Thick bars: genomic sequences. Thin bars: k-mers. K-mer counting step: yellow parts are repeats (with some mismatches). Colored squares within thick bars: mutations (substitutions and indels) within repeats.
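Fig 2 starts from a k-mer counting step in which repeat copies contribute many occurrences of the same k-mers. The sketch below is a deliberately simplified illustration of the general idea of building a repeat sequence from high-frequency k-mers; it is not REPdenovo's algorithm. The function names, the min_count cutoff, and the right-only greedy extension are all assumptions made for illustration.

```python
from collections import Counter


def frequent_kmers(reads, k, min_count):
    """Count k-mers over all reads and keep those seen at least min_count times.

    Hypothetical cutoff: k-mers from multi-copy repeats tend to occur far more
    often than k-mers from single-copy sequence.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return {kmer for kmer, n in counts.items() if n >= min_count}


def greedy_extend(seed, kmers):
    """Extend seed to the right, one base at a time, while exactly one frequent
    k-mer overlaps the contig's last k-1 bases. Stops at a branch (two or more
    candidates), a dead end (none), or an already used k-mer (a cycle)."""
    k = len(seed)
    if k < 2:
        raise ValueError("seed must be at least 2 bases long")
    contig = seed
    used = {seed}
    while True:
        suffix = contig[-(k - 1):]
        candidates = [suffix + base for base in "ACGT" if suffix + base in kmers]
        if len(candidates) != 1 or candidates[0] in used:
            return contig
        used.add(candidates[0])
        contig += candidates[0][-1]


if __name__ == "__main__":
    reads = ["TTACGTAGGC", "GACGTAGGCA", "CCACGTAGGC"]
    kmers = frequent_kmers(reads, k=4, min_count=3)
    print(sorted(kmers))
    print(greedy_extend("ACGT", kmers))
```

In this toy example the three reads share ACGTAGGC, so with min_count=3 only its k-mers survive; extension from ACGT reconstructs ACGTAGGC and stops because no frequent k-mer continues the suffix GGC.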
Fig 3. Distribution of repeat matching lengths relative to their total length for f_K = 10 and f_K = 100.
Solid bars: repeats mappable to the reference genome. Patterned bars: repeats that do not map to the reference but have NCBI Blastn hits. The figure shows the relative matching length as the mapping ratio (0%-100%), i.e. the length of the mapped part divided by the total length of the repeat. Most constructed repeats either match the reference genome fully or have NCBI Blastn hits.
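The mapping ratio plotted in Fig 3 is a simple quotient expressed as a percentage. A minimal sketch, assuming the mapped length and the total repeat length are already known in bases (the function name mapping_ratio is hypothetical):

```python
def mapping_ratio(mapped_length: int, total_length: int) -> float:
    """Percentage (0-100) of an assembled repeat's length covered by its match."""
    if total_length <= 0:
        raise ValueError("total_length must be positive")
    if not 0 <= mapped_length <= total_length:
        raise ValueError("mapped_length must lie between 0 and total_length")
    return 100.0 * mapped_length / total_length


if __name__ == "__main__":
    # A 600 bp assembled repeat of which 450 bp align to the reference.
    print(mapping_ratio(450, 600))
```

For a 600 bp repeat with 450 bp mapped, this gives 75.0; a repeat that matches fully gives 100.0.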
Fig 4. Hits of Repbase repeats found by REPdenovo.
X axis: divergence rate (mismatches per 1,000 bases) of repeats given by Repbase. Y axis: number of copies from the UCSC genome browser annotation. Dots: Repbase repeats. Red dots: hits found by REPdenovo. Blue dots: repeats not found by REPdenovo.
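The divergence rate on the X axis of Fig 4 is expressed in mismatches per 1,000 bases. The sketch below computes such a rate for one aligned repeat copy against its consensus. It is an assumption-laden illustration, not Repbase's procedure: it takes a pre-computed gapped alignment of equal length, skips gap columns, and counts only substitutions, whereas published divergence figures may treat indels or apply substitution-model corrections differently. The function name divergence_per_kb is hypothetical.

```python
def divergence_per_kb(copy: str, consensus: str) -> float:
    """Substitutions per 1,000 ungapped aligned columns between a repeat copy
    and its consensus. Both strings must come from the same alignment ('-' = gap)."""
    if len(copy) != len(consensus):
        raise ValueError("sequences must be aligned to equal length")
    compared = mismatches = 0
    for a, b in zip(copy.upper(), consensus.upper()):
        if a == "-" or b == "-":
            continue  # simplification: gap columns are ignored entirely
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 1000.0 * mismatches / compared


if __name__ == "__main__":
    # One substitution over 10 aligned bases.
    print(divergence_per_kb("ACGTACGTAC", "ACGTTCGTAC"))
```

One substitution across 10 ungapped columns corresponds to 100.0 mismatches per 1,000 bases.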
Fig 5. Assembled repeats matching AluYd3 (a Repbase repeat) by REPdenovo (bottom panel) and RepARK (top panel).
Matched assembled repeats are shown at their mapped positions, with the AluYd3 consensus sequence serving as the reference.




Grants and funding

This work was supported by grants IIS-0953563 and IIS-1447711 from the US National Science Foundation (http://www.nsf.gov/) to YW and grant IIS-1526415 from the US National Science Foundation to YW and RN. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.