Accelerating metagenomic read classification on CUDA-enabled GPUs

BMC Bioinformatics. 2017 Jan 3;18(1):11. doi: 10.1186/s12859-016-1434-6.

Abstract

Background: Metagenomic sequencing studies are becoming increasingly popular, with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification, i.e., the assignment of each read to a taxonomic label. Because modern high-throughput sequencing technologies produce very large numbers of reads and the number of available reference genomes is increasing rapidly, software tools for fast and accurate metagenomic read classification are urgently needed.
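To make the notion of read classification concrete, here is a minimal sketch of exact k-mer matching, the strategy listed in the keywords. Only k-mers that occur under a single reference label are kept, and each read is assigned to the label that collects the most k-mer hits. The function names, the toy sequences and k = 5 are hypothetical. This is an illustration, not cuCLARK's code: real classifiers use much longer k-mers, consider both DNA strands and rely on compact hash tables. Because every read is classified independently, the workload is naturally data-parallel.

```python
from __future__ import annotations

from collections import Counter


def build_kmer_index(references: dict[str, str], k: int) -> dict[str, str]:
    """Map every k-mer to the one reference label it occurs in.

    k-mers that occur under more than one label are discarded, so each
    remaining k-mer is discriminative for exactly one label.
    """
    owner: dict[str, str | None] = {}
    for label, seq in references.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            previous = owner.get(kmer, label)
            # None marks a k-mer already seen under a different label.
            owner[kmer] = label if previous == label else None
    return {kmer: label for kmer, label in owner.items() if label is not None}


def classify_read(read: str, index: dict[str, str], k: int) -> str | None:
    """Return the label with the most exact k-mer hits, or None if nothing matches."""
    hits = Counter(
        index[read[i:i + k]]
        for i in range(len(read) - k + 1)
        if read[i:i + k] in index
    )
    if not hits:
        return None  # unclassified read
    # Ties are resolved in favour of the label encountered first in the read.
    return hits.most_common(1)[0][0]


# Hypothetical toy references; real databases hold thousands of genomes.
references = {
    "speciesA": "ACGTACGTTTGACCA",
    "speciesB": "TTGACCAGGCATGCA",
}
index = build_kmer_index(references, k=5)

for read in ["GTACGTTTG", "GACCAGGCA", "CCCCCCCCC"]:
    print(read, "->", classify_read(read, index, k=5))
```

Running it prints speciesA, speciesB and None for the three reads. The second read's first k-mer (GACCA) occurs in both references, so it is ignored, and the remaining four k-mers all point to speciesB.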

Results: We present cuCLARK, a read-level classifier for CUDA-enabled GPUs based on CLARK, a method for fast and accurate classification of metagenomic sequences using reduced k-mers. Using the processing power of a single Titan X GPU, cuCLARK can reach classification speeds of up to 50 million reads per minute. Compared to multi-threaded CLARK executed on a 16-core Xeon CPU workstation, the corresponding speedups range from 3.2 to 6.6 for species-level classification and from 3.7 to 6.4 for genus-level classification.
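For scale, the peak throughput and the reported speedups can be read as follows, assuming (as is conventional) that the speedup S denotes the ratio of CLARK's runtime to cuCLARK's runtime on the same input:

```latex
\frac{50 \times 10^{6}\ \text{reads}}{60\ \text{s}} \approx 8.3 \times 10^{5}\ \text{reads/s},
\qquad
S = \frac{T_{\text{CLARK}}}{T_{\text{cuCLARK}}},
\qquad
\frac{1}{3.2} \approx 0.31, \quad \frac{1}{6.6} \approx 0.15
```

Under that reading, a species-level speedup between 3.2 and 6.6 means cuCLARK needs roughly 15% to 31% of the multi-threaded CPU runtime.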

Conclusion: cuCLARK performs metagenomic read classification on CUDA-enabled GPUs at speeds superior to those of multi-threaded CLARK on a multi-core CPU. It is free software licensed under the GPL and can be downloaded free of charge at https://github.com/funatiq/cuclark.

Keywords: CUDA; Exact k-mer matching; GPUs; Metagenomics; Taxonomic assignment.

MeSH terms

  • High-Throughput Nucleotide Sequencing
  • Humans
  • Internet
  • Metagenomics*
  • Sequence Analysis, DNA
  • User-Computer Interface*