Bioinformatics. 2014 Aug 15;30(16):2310-6.
doi: 10.1093/bioinformatics/btu239. Epub 2014 Apr 23.

On non-detects in qPCR data

Free PMC article


Matthew N McCall et al. Bioinformatics. 2014.

Abstract

Motivation: Quantitative real-time PCR (qPCR) is one of the most widely used methods to measure gene expression. Despite extensive research on qPCR laboratory protocols, normalization and statistical analysis, little attention has been given to qPCR non-detects, that is, reactions that fail to produce a minimum amount of signal.

Results: We show that the common methods of handling qPCR non-detects lead to biased inference. Furthermore, we show that non-detects do not represent data missing completely at random and likely represent missing data occurring not at random. We propose a model of the missing data mechanism and develop a method to directly model non-detects as missing data. Finally, we show that our approach results in a sizeable reduction in bias when estimating both absolute and differential gene expression.

Availability and implementation: The proposed algorithm is implemented in the R package, nondetects. This package also contains the raw data for the three example datasets used in this manuscript. The package is freely available at http://mnmccall.com/software and as part of the Bioconductor project.
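
The bias described in the Results can be illustrated with a small simulation. The following is a minimal sketch, not the authors' model and not the nondetects package. It assumes a hypothetical logistic relationship between a reaction's true Ct value and its probability of being a non-detect (the parameters beta0, beta1, true_mean and sd are invented for illustration). It then compares the true mean Ct with the estimate obtained by setting non-detects to 40 and, as a further assumed alternative, by discarding them.

```python
import math
import random


def simulate_bias(true_mean=33.0, sd=1.5, n=20000, seed=0,
                  beta0=-35.0, beta1=1.0, nd_value=40.0):
    """Simulate Ct values whose chance of being a non-detect rises with Ct.

    All parameters are illustrative assumptions, not estimates from the paper.
    """
    rng = random.Random(seed)
    true_vals, substituted, detected = [], [], []
    for _ in range(n):
        ct = rng.gauss(true_mean, sd)
        # Hypothetical missingness mechanism: logistic in the true Ct value,
        # so weakly expressed targets (high Ct) are more often non-detects.
        p_nd = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * ct)))
        is_nd = rng.random() < p_nd
        true_vals.append(ct)
        substituted.append(nd_value if is_nd else ct)
        if not is_nd:
            detected.append(ct)
    return {
        "true": sum(true_vals) / n,
        "set_to_40": sum(substituted) / n,
        "discarded": sum(detected) / len(detected),
    }


result = simulate_bias()
print(result)
```

Under these assumed parameters, substituting 40 overestimates the mean Ct and discarding non-detects underestimates it, which is the pattern one would expect when data are missing not at random.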


Figures

Fig. 1.
Within-replicate residuals stratified by the presence of non-detects. The average ΔΔCt (A) or ΔCt (B and C) values were calculated within each set of replicates (same gene and sample type). The residuals from this summarization, for each gene and sample, are plotted here, stratified by the presence of non-detects. In dataset 1, a non-detect could occur in the perturbation sample, the control sample or both samples. The left-most box in Panel A shows the distribution of residuals in dataset 1 when there are no non-detects. The other boxes in Panel A (from left to right) show the distribution of residuals when there are non-detects in the perturbation sample, the control sample and both samples. Similarly, the left box in Panels B and C shows the distribution of residuals when there are no non-detects, and the right box shows the distribution of residuals when there is a non-detect. Although one would expect some difference in the distribution of residuals between the detects and non-detects, the differences seen here are much larger than one would expect and likely represent bias introduced by setting non-detects equal to 40.
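
The residual calculation in this caption can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: it handles only the ΔCt case (Panels B and C), takes ΔCt as the target Ct minus the Ct of a reference gene, treats any target Ct of 40 as a non-detect, and stratifies each residual by whether that individual reaction was a non-detect. The function name and the row layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean

ND_VALUE = 40.0  # non-detects coded as 40, as in the caption


def replicate_residuals(rows):
    """Compute within-replicate ΔCt residuals, split by non-detect status.

    rows: iterable of (gene, sample_type, ct_target, ct_reference) tuples.
    """
    groups = defaultdict(list)
    for gene, sample_type, ct_target, ct_reference in rows:
        delta_ct = ct_target - ct_reference
        is_nd = ct_target >= ND_VALUE
        # Replicates share the same gene and sample type.
        groups[(gene, sample_type)].append((delta_ct, is_nd))

    residuals = {"detected": [], "non_detect": []}
    for values in groups.values():
        avg = mean(d for d, _ in values)
        for delta_ct, is_nd in values:
            key = "non_detect" if is_nd else "detected"
            residuals[key].append(delta_ct - avg)
    return residuals


example = [
    ("GeneA", "control", 30.0, 20.0),
    ("GeneA", "control", 31.0, 20.0),
    ("GeneA", "control", 40.0, 20.0),  # non-detect coded as 40
]
print(replicate_residuals(example))
```

In this toy replicate set the single non-detect pulls the replicate average upward, so its residual is large and positive while the detected reactions have negative residuals.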
Fig. 2.
Examples of the potential for spurious differential expression produced by replacing non-detects with values of 40. Panel (A) shows the response of Sema7a to the perturbation of nine genes from dataset 1. Panel (B) shows the expression of Gpr149 in each combination of normal/tumor samples and one of three treatments from dataset 2. Panel (C) shows the response of Pdlim2 to p53 and/or Ras mutation from dataset 3. ΔCt and ΔΔCt values produced by replacing a non-detect with a value of 40 are shown as asterisks. Note that in panel A, a non-detect could have also occurred in one of the control samples; however, in these data this did not occur for Sema7a—all of the non-detects happened to occur in the perturbed samples
Fig. 3.
The proportion of non-detects versus median observed gene expression within control samples (A) or within each sample condition (B and C). Logistic regression fits (dashed lines) all show a strong relationship between the proportion of non-detects and the median observed gene expression, with P-values of (A) 2.57×10⁻⁶, (B) 1.58×10⁻¹², (C) <2×10⁻¹⁶.
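
A logistic regression of the kind named in this caption can be sketched in plain Python. This is an illustrative fit by Newton's method on binomial counts, assuming one row per gene with its median Ct, its number of non-detects and its number of reactions. The function name and the toy data are hypothetical, the sketch does not compute P-values, and it is not the authors' fitting code.

```python
import math


def fit_logistic(x, k, n, iters=50, tol=1e-10):
    """Fit logit(p) = intercept + slope * x to binomial counts (k out of n).

    Uses Newton's method on the binomial log-likelihood, with x centered
    for numerical stability. Returns (intercept, slope) on the original scale.
    """
    xbar = sum(x) / len(x)
    xc = [xi - xbar for xi in x]
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ki, ni in zip(xc, k, n):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = ni * p * (1.0 - p)   # binomial variance weight
            r = ki - ni * p          # observed minus expected non-detects
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0 - b1 * xbar, b1


# Toy data (invented): median Ct per gene, non-detect counts, reactions per gene.
median_ct = [24, 26, 28, 30, 32, 34]
non_detects = [0, 1, 2, 5, 9, 14]
reactions = [20] * 6
intercept, slope = fit_logistic(median_ct, non_detects, reactions)
print(intercept, slope)
```

A positive slope means the fitted probability of a non-detect increases with median Ct, that is, with lower expression.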
Fig. 4.
The proportion of non-detects versus median sample expression within controls in dataset 1. Logistic regression fit (dashed line) shows a strong relationship between the proportion of non-detects and the median gene expression—P-value of 0.0003
Fig. 5.
The distribution of Ct values in each of the three datasets. Here, non-detects are coded as 40
Fig. 6.
Same as Figure 1, with additional boxplots showing the residuals when non-detects are replaced with 35 rather than 40. Here, Ct values >35 are also replaced by a value of 35. By replacing non-detects with a value of 35 rather than 40, the distribution of the residuals is far more similar between those in which the Ct values were observed and those containing a non-detect. However, this does not imply that one should replace non-detects with a value of 35. Such an approach makes very strong assumptions about the missing data mechanism and would require one to discard observed Ct values >35
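
The substitution described in this caption can be written in a few lines, which also makes its cost visible. This is a minimal sketch under the assumption that non-detects are represented as None. The function name is hypothetical.

```python
def censor_at(ct_values, threshold=35.0):
    """Replace non-detects (None) and any Ct above threshold with threshold."""
    return [
        threshold if ct is None or ct > threshold else ct
        for ct in ct_values
    ]


observed = [30.1, None, 36.2, 38.9, 35.0]
print(censor_at(observed))
```

The two observed values above 35 (36.2 and 38.9) become indistinguishable from the non-detect, which is the loss of observed data the caption warns about.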
Fig. 7.
Same as Figure 1, but after imputing the non-detects using the proposed EM algorithm
Fig. 8.
Same as Figure 2, but after EM imputation of non-detects
