Comparison of high-throughput sequencing data compression tools

Nat Methods. 2016 Dec;13(12):1005-1008. doi: 10.1038/nmeth.4037. Epub 2016 Oct 24.

Abstract

High-throughput sequencing (HTS) data are commonly stored either as raw sequencing reads in FASTQ format or as reads mapped to a reference in SAM format; both formats have large storage footprints. Worldwide growth of HTS data has prompted the development of compression methods that aim to significantly reduce HTS data size. Here we report on a benchmarking study of available compression methods on a comprehensive set of HTS data using an automated framework.
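A benchmark of this kind ultimately compares how much each method shrinks the same input. The following is a minimal sketch, not the authors' framework. It builds a small synthetic FASTQ file in memory, where each read takes four lines (header, bases, `+` separator, Phred+33 quality string). It then measures the compression ratio (original size divided by compressed size) of two general-purpose compressors from the Python standard library, `gzip` and `bz2`. These are only a baseline and are not the specialized HTS tools the study evaluates. The helper names `make_synthetic_fastq` and `compression_ratio` are hypothetical, and the reads are random toy data, not real sequencing output.

```python
import bz2
import gzip
import random


def make_synthetic_fastq(n_reads: int, read_len: int, seed: int = 0) -> bytes:
    """Hypothetical helper: build a toy FASTQ file with 4 lines per read."""
    rng = random.Random(seed)
    lines = []
    for i in range(n_reads):
        # Random bases: real genomic reads are far more redundant than this.
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        # Phred+33 quality characters drawn from a narrow score range (30-40).
        qual = "".join(chr(33 + rng.randint(30, 40)) for _ in range(read_len))
        lines += [f"@read_{i}", seq, "+", qual]
    return ("\n".join(lines) + "\n").encode("ascii")


def compression_ratio(raw: bytes, compress) -> float:
    """Original size divided by compressed size (higher means smaller output)."""
    return len(raw) / len(compress(raw))


fastq = make_synthetic_fastq(n_reads=1000, read_len=100)
for name, fn in [("gzip", gzip.compress), ("bz2", bz2.compress)]:
    print(f"{name}: {len(fastq)} bytes, ratio {compression_ratio(fastq, fn):.2f}")
```

Real benchmarks also track compression and decompression time, memory use, and whether the round trip is lossless. This sketch reports size only.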

MeSH terms

  • Animals
  • Cacao / genetics
  • Computational Biology / methods*
  • Data Compression / methods*
  • Drosophila melanogaster / genetics
  • Escherichia coli / genetics
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Pseudomonas aeruginosa / genetics