ContEst: estimating cross-contamination of human samples in next-generation sequencing data

Bioinformatics. 2011 Sep 15;27(18):2601-2. doi: 10.1093/bioinformatics/btr446. Epub 2011 Jul 29.


Summary: Here, we present ContEst, a tool for estimating the level of cross-individual contamination in next-generation sequencing data. We demonstrate the accuracy of ContEst across a range of contamination levels, sources and read depths using sequencing data mixed in silico at known concentrations. We applied our tool to published cancer sequencing datasets and report their estimated contamination levels.
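The idea of validating an estimator on data mixed in silico at a known concentration can be illustrated with a minimal sketch. This is not ContEst's actual model, which the summary does not spell out. It assumes a deliberately simplified setting: only sites where the sample is homozygous for the reference allele, a contaminant read carrying the alternate allele with probability equal to the population allele frequency, a fixed sequencing error rate, and a uniform prior over the contamination fraction. All function and parameter names (`simulate_mixture`, `contamination_posterior`, `map_estimate`, `error_rate`, `grid_size`) are hypothetical.

```python
import math
import random


def simulate_mixture(pop_freqs, depth, contamination, error_rate=0.001, seed=0):
    """Mix reads in silico at hom-ref sites.

    Returns one (alt_count, depth, pop_freq) tuple per site. Assumption:
    a contaminant read shows the alternate allele with probability pop_freq,
    a sample read shows it only through sequencing error.
    """
    rng = random.Random(seed)
    sites = []
    for freq in pop_freqs:
        alt = 0
        for _ in range(depth):
            if rng.random() < contamination:
                alt += rng.random() < freq        # read from the contaminant
            else:
                alt += rng.random() < error_rate  # read from the sample
        sites.append((alt, depth, freq))
    return sites


def contamination_posterior(sites, error_rate=0.001, grid_size=1001):
    """Grid posterior over the contamination fraction c (uniform prior).

    Per-read alternate-allele probability under the assumed model:
        p = c * pop_freq + (1 - c) * error_rate
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_like = []
    for c in grid:
        total = 0.0
        for alt, depth, freq in sites:
            p = c * freq + (1.0 - c) * error_rate
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            total += alt * math.log(p) + (depth - alt) * math.log(1.0 - p)
        log_like.append(total)
    # Normalise in a numerically stable way (subtract the maximum first).
    peak = max(log_like)
    weights = [math.exp(v - peak) for v in log_like]
    norm = sum(weights)
    return grid, [w / norm for w in weights]


def map_estimate(grid, posterior):
    """Return the grid value with the highest posterior probability."""
    best = max(range(len(grid)), key=posterior.__getitem__)
    return grid[best]
```

Under these assumptions, one would simulate sites at a known contamination level, for example `simulate_mixture([0.3, 0.5, 0.7] * 50, depth=200, contamination=0.05)`, pass the result to `contamination_posterior`, and compare `map_estimate` against the known fraction, repeating across contamination levels and read depths.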

Availability and implementation: ContEst is a GATK module and is distributed under a BSD-style license at


Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Base Sequence
  • Bayes Theorem
  • False Positive Reactions
  • Genotype
  • Humans
  • Models, Genetic
  • Neoplasms / genetics*
  • Sequence Analysis, DNA / methods*
  • Software