Common contaminants in next-generation sequencing that hinder discovery of low-abundance microbes

PLoS One. 2014 May 16;9(5):e97876. doi: 10.1371/journal.pone.0097876. eCollection 2014.


Unbiased high-throughput sequencing of whole metagenome shotgun DNA libraries is a promising new approach to identifying microbes in clinical specimens: unlike other techniques, it is not limited to known sequences. Unlike most sequencing applications, however, it is highly sensitive to laboratory contaminants, because contaminant reads appear to originate from the clinical specimens themselves. To assess the extent and diversity of sequence contaminants, we aligned 57 "1000 Genomes Project" sequencing runs from six centers against the four largest NCBI BLAST databases. We detected reads of diverse contaminant species in all runs, and we also identified the most common contaminant genus, Bradyrhizobium, in assembled genomes from the NCBI Genome database. Many of these microorganisms have been reported as contaminants of ultrapure water systems. Studies aiming to identify novel microbes in clinical specimens will benefit greatly not only from preventive measures, such as extensive UV irradiation of water and cross-validation using independent techniques, but also from a concerted effort to sequence the complete genomes of common contaminants so that they can be subtracted computationally.
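Computational subtraction of known contaminants, as proposed above, can be illustrated with a minimal sketch. It assumes exact k-mer matching against a set of contaminant reference genomes. This is a swapped-in technique for illustration only, not the BLAST alignment approach used in the study. The function names, the k value, the `min_shared` threshold and the toy sequences are all hypothetical.

```python
"""Hypothetical sketch: remove reads that share k-mers with contaminant genomes."""


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct length-k substrings of seq (uppercased).

    Sequences shorter than k yield an empty set.
    """
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def build_contaminant_index(genomes: list[str], k: int) -> set[str]:
    """Collect k-mers from both strands of every contaminant genome."""
    index: set[str] = set()
    for genome in genomes:
        index |= kmers(genome, k)
        index |= kmers(reverse_complement(genome), k)
    return index


def subtract_contaminants(
    reads: list[str], index: set[str], k: int, min_shared: int = 1
) -> tuple[list[str], list[str]]:
    """Split reads into (kept, removed).

    A read is removed when at least `min_shared` of its distinct k-mers
    occur in the contaminant index; reads shorter than k are always kept.
    """
    kept: list[str] = []
    removed: list[str] = []
    for read in reads:
        shared = sum(1 for km in kmers(read, k) if km in index)
        (removed if shared >= min_shared else kept).append(read)
    return kept, removed


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    index = build_contaminant_index(["ACGTACGTTAGCCGATAGGCTTAACG"], k=8)
    reads = ["TAGCCGATAGGC", "GGGGCCCCAAAATTTT", "GCCTATCGGCTA"]
    kept, removed = subtract_contaminants(reads, index, k=8)
    print(len(kept), len(removed))  # prints "1 2"
```

Indexing both strands lets a read match regardless of which strand it was sequenced from. Real contaminant screens usually also have to tolerate sequencing errors and contaminant genomes that are incomplete, which is why complete reference genomes matter for this kind of subtraction.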

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Bradyrhizobium / genetics
  • Bradyrhizobium / isolation & purification
  • DNA Contamination*
  • DNA, Bacterial / chemistry*
  • DNA, Bacterial / isolation & purification
  • High-Throughput Nucleotide Sequencing / methods*
  • High-Throughput Nucleotide Sequencing / standards
  • Microbiota
  • Molecular Sequence Data
  • Sequence Alignment
  • Sequence Analysis, DNA / methods*
  • Sequence Analysis, DNA / standards

Substances

  • DNA, Bacterial