REPdenovo: Inferring De Novo Repeat Motifs from Short Sequence Reads
- PMID: 26977803
- PMCID: PMC4792456
- DOI: 10.1371/journal.pone.0150719
Abstract
Repeat elements are important components of eukaryotic genomes. One limitation in our understanding of repeat elements is that most analyses rely on reference genomes that are incomplete and often contain missing data in highly repetitive regions that are difficult to assemble. To overcome this problem, we develop a new method, REPdenovo, which assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence within copies. We show that REPdenovo is substantially better than existing methods in terms of both the number and the completeness of the repeat sequences that it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. We apply the method to human data and discover a number of potentially new repeat sequences that have been missed by previous repeat annotations. Many of these sequences are incorporated into various parasite genomes, possibly because the filtering process for host DNA involved in the sequencing of the parasite genomes failed to exclude the host-derived repeat sequences. REPdenovo is a powerful new computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families. The software tool, REPdenovo, is available for download at https://github.com/Reedwarbler/REPdenovo.
