MEC: Misassembly Error Correction in contigs based on distribution of paired-end reads and statistics of GC-contents
- PMID: 30334805
- DOI: 10.1109/TCBB.2018.2876855
Abstract
De novo assembly tools aim to reconstruct genomes from next-generation sequencing (NGS) data. However, these tools usually generate a large number of contigs containing many misassemblies, which are caused by repetitive regions, chimeric reads and sequencing errors. Detecting and correcting misassemblies in contigs is appealing, because it can improve the accuracy of assembly results, yet challenging. In this study, a novel method, called MEC, is proposed to identify and correct misassemblies in contigs. Based on the insert size distribution of paired-end reads and the statistical analysis of GC-contents, MEC can accurately identify more misassemblies. We evaluate MEC with the metrics NA50 and NGA50 on four datasets, compare it with most available misassembly correction tools, and carry out experiments to analyze the influence of MEC on scaffolding results. The results show that MEC can reduce misassemblies effectively and yield quantitative improvements in scaffolding quality. MEC is publicly available at https://github.com/bioinfomaticsCSU/MEC.
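The two signals named in the abstract, paired-end insert sizes and GC-content, can be illustrated with a minimal Python sketch. This is not MEC's algorithm, which the abstract does not specify; it only shows one simple way such signals might be computed. The function names (`gc_content`, `windowed_gc`, `discordant_pairs`) and the threshold `k` are hypothetical choices made for illustration.

```python
from statistics import mean, stdev


def gc_content(seq):
    """Fraction of G/C bases in a sequence (case-insensitive); 0.0 if empty."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(contig, window, step):
    """GC fraction of each full-length window along a contig."""
    return [gc_content(contig[i:i + window])
            for i in range(0, len(contig) - window + 1, step)]


def discordant_pairs(insert_sizes, k=3.0):
    """Indices of read pairs whose insert size lies more than k sample
    standard deviations from the mean (an assumed, simplistic criterion)."""
    mu = mean(insert_sizes)
    sigma = stdev(insert_sizes)
    if sigma == 0:
        return []
    return [i for i, s in enumerate(insert_sizes) if abs(s - mu) > k * sigma]
```

Here the mean and standard deviation are estimated from the same pairs that are being screened, so extreme outliers inflate the spread; a real tool would likely estimate the insert size distribution more robustly.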
