, 20 (1), 361

FastqCleaner: An Interactive Bioconductor Application for Quality-Control, Filtering and Trimming of FASTQ Files

Leandro Gabriel Roser et al. BMC Bioinformatics.

Abstract

Background: Exploration and processing of FASTQ files are the first steps in state-of-the-art data analysis workflows for Next Generation Sequencing (NGS) platforms. The large amount of data generated by these technologies poses a challenge for rapid analysis and visualization of sequencing information. Recent integration of the R data analysis platform with web visual frameworks has stimulated the development of user-friendly, powerful, and dynamic NGS data analysis applications.

Results: This paper presents FastqCleaner, a Bioconductor visual application for both quality-control (QC) and pre-processing of FASTQ files. The interface shows diagnostic information for the input and output data and lets the user select a series of filtering and trimming operations in an interactive framework. FastqCleaner combines the technology of Bioconductor for NGS data analysis with the data visualization advantages of a web environment.

Conclusions: FastqCleaner is a user-friendly, offline-capable tool that enables access to advanced Bioconductor infrastructure. The novel concept of a Bioconductor interactive application that can be used without programming skills makes FastqCleaner a valuable resource for NGS data analysis.

Keywords: Bioconductor; FASTQ; Next generation sequencing; R; Shiny; User-friendly tool; Visualization; Web app.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Graphical representation of a typical workflow with FastqCleaner, showing the initial selection of FASTQ file(s), processing, and generation of output(s). Diagnostic interactive plots can be constructed for both input and output files. Circular arrows indicate halfway points in the workflow, where different configurations can be selected to re-run the program from there
Fig. 2
Examples of adapter trimming. The panels show the relative position of an adapter and a read, and the expected result after processing with the adapter_filter function of FastqCleaner. Dotted lines indicate the portion of the read that will be removed. Arrows show the direction in which the program searches the read for matches. If one or more matches are found, the function trims the longest subsequence that contains the matching region plus the rest of the read in the corresponding trimming direction. a partial adapter on the right + right-trimming of anchored adapter. b partial adapter on the left + left-trimming of anchored adapter. c partial adapter within read + right-trimming. d, e full match between an adapter and a portion of the read + left- (d) or right- (e) trimming. f multiple matches for the same adapter + left-trimming
Fig. 3
RStudio addins menu, showing the button to launch the FastqCleaner application
Fig. 4
Web interface of the FastqCleaner application. a first tab, showing an example where a file and a filter are selected. b second tab, showing the processes performed after running the program. c third tab, showing the analysis of the data, in this case for the input FASTQ file. The plot shows the base composition of the sequences. d fourth tab, showing a table with the frequency and the sequence of each duplicated read
Fig. 5
Bar plot of elapsed time (in seconds) for single-read (SR) adapter trimming and read length filtering
Fig. 6
Bar plot of elapsed time (in seconds) for paired-end (PE) adapter trimming and read length filtering. FASTX-Toolkit cannot process PE reads and is therefore not shown in the plot
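
The trimming rule described in the Fig. 2 caption (remove the matching region plus the rest of the read in the trimming direction, choosing the longest such subsequence) can be illustrated with a short Python sketch. This is not the adapter_filter implementation from FastqCleaner. The function names and the min_overlap parameter are hypothetical, matching is exact (no mismatches or indels), and only full matches and partial matches anchored at a read end are handled, so the case in panel c is not covered.

```python
def trim_adapter_right(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove the adapter match and everything to its right (illustrative only)."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Full match: the leftmost occurrence removes the longest suffix.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Partial match anchored at the right end: a prefix of the adapter
    # equals a suffix of the read. Try the longest overlap first.
    max_k = min(len(adapter) - 1, len(read))
    for k in range(max_k, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read


def trim_adapter_left(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove the adapter match and everything to its left (illustrative only)."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Full match: the rightmost occurrence removes the longest prefix,
    # which also covers the multiple-match case.
    pos = read.rfind(adapter)
    if pos != -1:
        return read[pos + len(adapter):]
    # Partial match anchored at the left end: a suffix of the adapter
    # equals a prefix of the read. Try the longest overlap first.
    max_k = min(len(adapter) - 1, len(read))
    for k in range(max_k, min_overlap - 1, -1):
        if read.startswith(adapter[-k:]):
            return read[k:]
    return read


if __name__ == "__main__":
    # Partial adapter on the right, right-trimming (as in panel a).
    print(trim_adapter_right("ACGTACGTTTAGG", "AGGCCT"))       # ACGTACGTTT
    # Multiple full matches, left-trimming (as in panel f).
    print(trim_adapter_left("TTCCCGGGAACCCGGGTT", "CCCGGG"))   # TT
```

In this sketch, min_overlap guards against trimming on very short chance overlaps; the value 3 is an arbitrary choice for illustration, not a FastqCleaner default.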
