Three-stage quality control strategies for DNA re-sequencing data

Brief Bioinform. 2014 Nov;15(6):879-89. doi: 10.1093/bib/bbt069. Epub 2013 Sep 24.

Abstract

Advances in next-generation sequencing (NGS) technologies have greatly improved our ability to detect genomic variants for biomedical research. In particular, NGS technologies have recently been applied with great success to the discovery of mutations associated with the growth of various tumours and with rare Mendelian diseases. These advances have also created significant challenges in bioinformatics, one of the major ones being quality control of the sequencing data. In this review, we discuss the proper quality control procedures and parameters for Illumina technology-based human DNA re-sequencing at three different stages of the analysis: raw data, alignment and variant calling. Monitoring quality control metrics at each of the three stages provides unique and independent evaluations of data quality from differing perspectives. Properly conducting quality control protocols at all three stages and correctly interpreting the results are crucial to ensure a successful and meaningful study.

Keywords: FASTQ; alignment; quality control; sequencing; variant calling.
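
As an illustration of what stage-specific metrics can look like, the following is a minimal sketch and not the procedure described in the review. It assumes two commonly used metrics that the abstract itself does not name: mean base quality per read position at the raw data stage, and the transition/transversion (Ti/Tv) ratio at the variant calling stage. The function names, the Phred+33 quality offset and the four-line FASTQ record layout are assumptions made for the example.

```python
from typing import Iterable, List, Tuple

# Assumption: qualities are Phred+33 encoded (Sanger / Illumina 1.8+).
PHRED_OFFSET = 33


def mean_quality_per_position(fastq_lines: Iterable[str]) -> List[float]:
    """Raw data stage: mean Phred score at each read position.

    Assumes strict four-line FASTQ records (header, sequence, '+', quality),
    so every fourth line is a quality string.
    """
    totals: List[int] = []
    counts: List[int] = []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # skip header, sequence and separator lines
            continue
        for pos, ch in enumerate(line.rstrip("\r\n")):
            if pos == len(totals):  # first read that reaches this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - PHRED_OFFSET
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


# Purine<->purine and pyrimidine<->pyrimidine substitutions are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def titv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Variant calling stage: Ti/Tv ratio over (ref, alt) single-base pairs.

    Assumes every pair is a valid single-nucleotide substitution (ref != alt).
    Returns infinity when there are no transversions.
    """
    pairs = [(ref.upper(), alt.upper()) for ref, alt in snvs]
    ti = sum(1 for pair in pairs if pair in TRANSITIONS)
    tv = len(pairs) - ti
    return ti / tv if tv else float("inf")


if __name__ == "__main__":
    # Hypothetical toy data, for demonstration only.
    toy_fastq = ["@r1", "ACGT", "+", "IIII", "@r2", "ACG", "+", "5!I"]
    print(mean_quality_per_position(toy_fastq))
    print(titv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
```

In practice such values would be compared against expectations for the platform and capture design; any thresholds would have to come from the review or from the study's own protocol, not from this sketch.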

Publication types

  • Review

MeSH terms

  • Computational Biology / standards
  • DNA / genetics
  • Gene Library
  • High-Throughput Nucleotide Sequencing / standards*
  • High-Throughput Nucleotide Sequencing / statistics & numerical data
  • Humans
  • Neoplasms / genetics
  • Polymorphism, Single Nucleotide
  • Quality Control
  • Sequence Alignment / standards
  • Sequence Alignment / statistics & numerical data
  • Sequence Analysis, DNA / standards*
  • Sequence Analysis, DNA / statistics & numerical data

Substances

  • DNA