Background: Normalization in real-time qRT-PCR is necessary to compensate for experimental variation. A popular normalization strategy employs reference gene(s), which may introduce additional variability into normalized expression levels due to innate variation (between tissues, individuals, etc). To minimize this innate variability, multiple reference genes are used. Current methods of selecting reference genes make an assumption of independence in their innate variation. This assumption is not always justified, which may lead to selecting a suboptimal set of reference genes.
Results: We propose a robust approach for selecting optimal subset(s) of reference genes with the smallest variance of the corresponding normalizing factors. The normalizing factor variance estimates are based on the estimated unstructured covariance matrix of all available candidate reference genes, adjusting for all possible correlations. Robustness is achieved through bootstrapping all candidate reference gene data and obtaining the bootstrap upper confidence limits for the variances of the log-transformed normalizing factors. The selection of the reference gene subset is optimized with respect to one of the following criteria: (A) to minimize the variability of the normalizing factor; (B) to minimize the number of reference genes with acceptable upper limit on variability of the normalizing factor, (C) to minimize the average rank of the variance of the normalizing factor. The proposed approach evaluates all gene subsets of various sizes rather than ranking individual reference genes by their stability, as in the previous work. In two publicly available data sets and one new data set, our approach identified subset(s) of reference genes with smaller empirical variance of the normalizing factor than in subsets identified using previously published methods. A small simulation study indicated an advantage of the proposed approach in terms of sensitivity to identify the true optimal reference subset in the presence of even modest, especially negative correlation among the candidate reference genes.
Conclusions: The proposed approach performs comprehensive and robust evaluation of the variability of normalizing factors based on all possible subsets of candidate reference genes. The results of this evaluation provide flexibility to choose from important criteria for selecting the optimal subset(s) of reference genes, unless one subset meets all the criteria. This approach identifies gene subset(s) with smaller variability of normalizing factors than current standard approaches, particularly if there is some nontrivial innate correlation among the candidate genes.