AfterQC: automatic filtering, trimming, error removing and quality control for fastq data
- PMID: 28361673
- PMCID: PMC5374548
- DOI: 10.1186/s12859-017-1469-3
Abstract
Background: Some applications, especially clinical applications that require highly accurate sequencing data, must cope with unavoidable sequencing errors. Several tools have been proposed to profile sequencing quality, but few of them can quantify or correct sequencing errors. This unmet need motivated us to develop AfterQC, a tool that profiles sequencing errors and corrects most of them, and that also provides highly automated quality control and data filtering. Unlike most tools, AfterQC analyses the overlap between the two reads of each pair in paired-end sequencing data. Based on this overlap analysis, AfterQC can detect and cut adapters, and it offers a novel function to correct wrong bases in the overlapping regions. Another new feature is the detection and visualisation of sequencing bubbles, which are commonly found on flowcell lanes and may cause sequencing errors. Besides the usual per-cycle quality and base content plots, AfterQC also provides polyX filtering (where polyX is a long sub-sequence of a single base X), automatic trimming and k-mer based strand bias profiling.
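The overlap-based correction described above can be illustrated with a minimal Python sketch. It is not AfterQC's actual implementation: the function names (`find_overlap`, `correct_pair`), the thresholds (`min_overlap`, `max_mismatches`, `high_q`, `low_q`) and the rule "trust the high-quality base when its mate is low quality" are assumptions chosen for illustration. The sketch also covers only the case where the insert is at least as long as a read, so adapter read-through is left out.

```python
# Illustrative sketch only; names and thresholds are hypothetical,
# not taken from AfterQC.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_overlap(r1, rc2, min_overlap=12, max_mismatches=2):
    """Find the smallest offset o >= 0 at which r1[o:] agrees with the
    start of rc2 (read 2, reverse-complemented) within max_mismatches.
    Returns (offset, overlap_length) or None."""
    for offset in range(len(r1) - min_overlap + 1):
        length = min(len(r1) - offset, len(rc2))
        if length < min_overlap:
            break
        # zip stops at the shorter sequence, i.e. after `length` bases
        mismatches = sum(a != b for a, b in zip(r1[offset:], rc2))
        if mismatches <= max_mismatches:
            return offset, length
    return None


def correct_pair(r1, q1, r2, q2, high_q=30, low_q=15):
    """Correct mismatched bases in the overlap of a read pair.
    q1 and q2 are lists of Phred scores, one per base.
    Returns (corrected_r1, corrected_r2, number_of_corrections)."""
    rc2 = reverse_complement(r2)
    rq2 = q2[::-1]  # qualities in the same orientation as rc2
    hit = find_overlap(r1, rc2)
    if hit is None:
        return r1, r2, 0
    offset, length = hit
    s1, s2 = list(r1), list(rc2)
    fixed = 0
    for j in range(length):
        i = offset + j
        if s1[i] == s2[j]:
            continue
        # Assumption: a confident base overrides an unconfident mate;
        # mismatches without a clear winner are left untouched.
        if q1[i] >= high_q and rq2[j] <= low_q:
            s2[j] = s1[i]
            fixed += 1
        elif rq2[j] >= high_q and q1[i] <= low_q:
            s1[i] = s2[j]
            fixed += 1
    return "".join(s1), reverse_complement("".join(s2)), fixed
```

Calling `correct_pair(r1, q1, r2, q2)` on a pair returns both reads in their original orientation together with the number of bases changed. Leaving ambiguous mismatches uncorrected is a deliberately conservative choice in this sketch.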
Results: For each single FastQ file or pair of FastQ files, AfterQC filters out bad reads, detects and eliminates the sequencer's bubble effects, trims reads at the front and tail, detects sequencing errors and corrects part of them, and finally outputs clean data and generates HTML reports with interactive figures. AfterQC can run in batch mode with multiprocess support. It accepts a single FastQ file, a single pair of FastQ files (for paired-end sequencing), or a folder whose FastQ files are all processed automatically. Based on overlap analysis, AfterQC can estimate the sequencing error rate and profile the error transform distribution. The results of our error profiling tests show that the error distribution is highly platform dependent.
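As a rough illustration of how an error rate and an error transform distribution might be derived from overlapping bases, consider the following Python sketch. It is an assumption-laden example, not the formula AfterQC uses: the function name `profile_errors`, the input layout, the quality thresholds and the choice to count each mismatch as one erroneous base out of the two compared are all hypothetical. Errors that happen to agree on both reads are invisible to this approach, so the estimate is a lower bound.

```python
# Illustrative sketch only; the estimator and names are hypothetical.
from collections import Counter


def profile_errors(aligned_pairs, high_q=30, low_q=15):
    """Estimate an error rate and tally base transforms.

    aligned_pairs: iterable of (base1, qual1, base2, qual2) tuples, one
    per overlapped position, with both bases on the same strand.
    Returns (error_rate, transforms), where transforms maps strings such
    as "C>T" (assumed true base > observed base) to counts.
    """
    transforms = Counter()
    total_bases = 0
    mismatches = 0
    for b1, q1, b2, q2 in aligned_pairs:
        total_bases += 2  # two sequenced bases cover this position
        if b1 == b2:
            continue
        mismatches += 1  # at least one of the two bases is wrong
        # Assumption: the high-quality base is the true one.
        if q1 >= high_q and q2 <= low_q:
            transforms[b1 + ">" + b2] += 1
        elif q2 >= high_q and q1 <= low_q:
            transforms[b2 + ">" + b1] += 1
    rate = mismatches / total_bases if total_bases else 0.0
    return rate, transforms
```

Comparing the resulting transform counts across datasets from different sequencers is one way the platform dependence mentioned above could be examined.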
Conclusion: More than just another quality control (QC) tool, AfterQC performs quality control, data filtering, error profiling and base correction automatically. Experimental results show that AfterQC can help to eliminate sequencing errors in paired-end sequencing data and so provide much cleaner output, which in turn helps to reduce false-positive variants, especially for low-frequency somatic mutations. Although it provides rich configurable options, AfterQC can detect and set all of them automatically and requires no arguments in most cases.
Keywords: Bubble; Data filtering; NGS; Overlap analysis; Quality control.
