Atropos: specific, sensitive, and speedy trimming of sequencing reads
- PMID: 28875074
- PMCID: PMC5581536
- DOI: 10.7717/peerj.3720
Abstract
A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability of downstream analyses while decreasing their computational requirements. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves significant increases in trimming accuracy while remaining competitive in execution times. Furthermore, Atropos maintains high accuracy even when trimming data with elevated rates of sequencing errors. The accuracy, high performance, and broad feature set offered by Atropos make it an appropriate choice for the pre-processing of Illumina, ABI SOLiD, and other current-generation short-read sequencing datasets. Atropos is open source and free software written in Python (3.3+) and available at https://github.com/jdidion/atropos.
Keywords: Adapter; Cutadapt; Illumina; NGS; Preprocessing; Read; Sequencing; Trimming.
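For readers unfamiliar with the two operations the abstract names, the following is a minimal Python sketch of 3' adapter trimming and 3' quality trimming. It is a toy illustration only and is not the algorithm Atropos uses: the function names, the mismatch-counting match rule, and the defaults (min_overlap=3, max_error_rate=0.1, cutoff=20) are all assumptions made for this example, and the adapter string in the usage note is just a commonly seen Illumina adapter prefix.

```python
def trim_adapter_3prime(read, adapter, min_overlap=3, max_error_rate=0.1):
    """Remove a 3' adapter, or a prefix of it at the read end, from a read.

    Hypothetical helper: scans start positions left to right and accepts
    the first one where the read suffix matches the adapter prefix with
    at most floor(max_error_rate * overlap) mismatches (no indels).
    """
    for start in range(len(read)):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break  # remaining suffix is too short to call an adapter
        window = read[start:start + overlap]
        mismatches = sum(1 for r, a in zip(window, adapter) if r != a)
        if mismatches <= int(max_error_rate * overlap):
            return read[:start]
    return read


def trim_low_quality_3prime(read, qualities, cutoff=20):
    """Drop trailing bases whose Phred score is below the cutoff.

    Hypothetical helper: a simple trailing scan, not the running-sum
    method that production trimmers typically implement.
    """
    end = len(read)
    while end > 0 and qualities[end - 1] < cutoff:
        end -= 1
    return read[:end], qualities[:end]
```

Under these assumptions, calling trim_adapter_3prime("TTTTTTTTTTAGATCGGAAG", "AGATCGGAAGAGC") returns the ten-base insert, and a read with no adapter is returned unchanged. Real tools additionally handle insertions and deletions, paired-end overlap, and statistical match thresholds, which this sketch deliberately omits.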
Conflict of interest statement
The authors declare that they have no competing interests.
