Background: The implementation of high throughput sequencing for exploring biodiversity poses high demands on bioinformatics applications for automated data processing. Here we introduce CLOTU, an online and open access pipeline for processing 454 amplicon reads. CLOTU has been constructed to be highly user-friendly and flexible, since different types of analyses are needed for different datasets.
Results: In CLOTU, the user can filter out low quality sequences, trim tags, primers, adaptors, perform clustering of sequence reads, and run BLAST against NCBInr or a customized database in a high performance computing environment. The resulting data may be browsed in a user-friendly manner and easily forwarded to downstream analyses. Although CLOTU is specifically designed for analyzing 454 amplicon reads, other types of DNA sequence data can also be processed. A fungal ITS sequence dataset generated by 454 sequencing of environmental samples is used to demonstrate the utility of CLOTU.
Conclusions: CLOTU is a flexible and easy to use bioinformatics pipeline that includes different options for filtering, trimming, clustering and taxonomic annotation of high throughput sequence reads. Some of these options are not included in comparable pipelines. CLOTU is implemented in a Linux computer cluster and is freely accessible to academic users through the Bioportal web-based bioinformatics service (http://www.bioportal.uio.no).