Background: Variance in microarray studies has been widely discussed as critical to the identification of differentially expressed genes; however, few studies have addressed how the estimation of variance influences that identification.
Methodology/principal findings: To decompose intra- and inter-individual variance in clinical studies into three levels (technical, anatomic, and individual), we designed experiments and algorithms to investigate each form of variance. As a case study, a group of "inter-individual variable genes" was identified to exemplify how underestimated variance affects both the statistical and the biological aspects of identifying differentially expressed genes. Our results showed that inadequate estimation of variance inevitably led to the inclusion of non-statistically significant genes among those listed as significant, thereby interfering with the correct prediction of biological functions. Applying a higher fold-change cutoff in the selection of significant genes reduces or eliminates the effects of underestimated variance.
Conclusions/significance: Our data demonstrated that correct variance evaluation is critical in selecting significant genes. If the degree of variance is underestimated, "noisy" genes are falsely identified as differentially expressed genes. These genes add noise to the biological interpretation and reduce the biological significance of the gene set. Our results also indicate that applying a higher fold-change threshold as the selection criterion reduces or eliminates the differences between distinct estimations of variance.
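The effect described above can be illustrated with a minimal sketch. It is not the algorithm used in this study: the gene names and numbers are hypothetical toy values, the three variance components are assumed to add, and a simple z-type statistic on the log2 fold change stands in for whatever test a real analysis would use. The function `select_genes` and its parameters are likewise illustrative names.

```python
import math

# Hypothetical toy genes (not data from the study):
# (name, log2 fold change, technical var, anatomic var, individual var)
GENES = [
    ("gene_a", 2.5, 0.04, 0.02, 0.03),
    ("gene_b", 0.6, 0.02, 0.02, 0.58),  # large inter-individual variance
    ("gene_c", 0.7, 0.03, 0.05, 0.75),  # large inter-individual variance
    ("gene_d", 1.8, 0.05, 0.03, 0.07),
    ("gene_e", 0.1, 0.02, 0.01, 0.02),
]

def select_genes(genes, n_per_group, levels, z_cutoff=1.96, min_abs_log2fc=0.0):
    """Return names of genes called significant.

    levels: which variance components enter the estimate, a subset of
            {"technical", "anatomic", "individual"}.
    Assumes additive components and a two-group comparison with
    n_per_group samples per group (both are simplifying assumptions).
    """
    selected = []
    for name, log2fc, v_tech, v_anat, v_ind in genes:
        components = {"technical": v_tech, "anatomic": v_anat, "individual": v_ind}
        variance = sum(components[level] for level in levels)
        # Standard error of a difference between two group means.
        std_err = math.sqrt(2.0 * variance / n_per_group)
        z = log2fc / std_err
        if abs(z) >= z_cutoff and abs(log2fc) >= min_abs_log2fc:
            selected.append(name)
    return selected

ALL_LEVELS = ("technical", "anatomic", "individual")

# Variance underestimated: only the technical component is counted.
underestimated = select_genes(GENES, 3, ("technical",))
# Variance estimated from all three levels.
full = select_genes(GENES, 3, ALL_LEVELS)

# Same comparison with a 2-fold cutoff (|log2 fold change| >= 1).
underestimated_fc = select_genes(GENES, 3, ("technical",), min_abs_log2fc=1.0)
full_fc = select_genes(GENES, 3, ALL_LEVELS, min_abs_log2fc=1.0)

print(underestimated, full)
print(underestimated_fc, full_fc)
```

In this toy setting, counting only technical variance adds the two inter-individual variable genes to the significant list, whereas the 2-fold cutoff yields the same list under both variance estimates.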