Overcoming confounded controls in the analysis of gene expression data from microarray experiments

Appl Bioinformatics. 2003;2(4):197-208.


Data from microarray experiments can be limited by the use of improper control samples. In cancer research, comparing tumour expression profiles to those from normal samples is challenging because of tissue heterogeneity (mixed cell populations). A specific example exists in a published colon cancer dataset, in which tissue heterogeneity was reported among the normal samples. In this paper, we show how to overcome or avoid the problem of using normal samples that do not derive from the same tissue of origin as the tumour. We advocate an exploratory unsupervised bootstrap analysis that can reveal unexpected and undesired, but strongly supported, clusters of samples that reflect tissue differences instead of tumour versus normal differences. All of the algorithms used in the analysis, including the maximum difference subset algorithm, unsupervised bootstrap analysis, the pooled variance t-test for finding differentially expressed genes and the jackknife to reduce false positives, are incorporated into our online Gene Expression Data Analyzer (http://bioinformatics.upmc.edu/GE2/GEDA.html).
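
The abstract names a pooled variance t-test and a jackknife for reducing false positives but does not describe how they are combined. The following is a minimal Python sketch of one plausible reading, not the GEDA implementation: a gene is kept only if its pooled-variance t statistic stays above a cutoff when each sample is left out in turn. The function names, the leave-one-out scheme and the cutoff value are all illustrative assumptions.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic using a pooled (equal-variance) estimate."""
    na, nb = len(a), len(b)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def jackknife_min_abs_t(a, b):
    """Smallest |t| observed when each sample is left out in turn.

    Assumes at least three samples per group (so a variance can still be
    computed after one is removed) and non-constant expression values.
    """
    stats = [abs(pooled_t(a[:i] + a[i + 1:], b)) for i in range(len(a))]
    stats += [abs(pooled_t(a, b[:j] + b[j + 1:])) for j in range(len(b))]
    return min(stats)


def is_robustly_different(a, b, threshold):
    """Hypothetical filter: call the gene only if every leave-one-out
    t statistic exceeds the (arbitrary, illustrative) threshold."""
    return jackknife_min_abs_t(a, b) > threshold


# Made-up expression values for two genes, four samples per group.
normal = [3.0, 3.2, 2.9, 3.1]
consistent_tumour = [5.1, 5.3, 4.9, 5.2]   # shifted in every sample
one_sample_tumour = [3.0, 3.1, 2.9, 9.0]   # shift driven by one sample

print(is_robustly_different(consistent_tumour, normal, threshold=2.0))
print(is_robustly_different(one_sample_tumour, normal, threshold=2.0))
```

Under this reading, a gene whose apparent difference depends on a single sample fails the filter, because removing that sample collapses the statistic. A real analysis would choose the cutoff from the t distribution with the appropriate degrees of freedom rather than the fixed value used here.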

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms*
  • Artifacts*
  • Bias
  • Biomarkers, Tumor / classification
  • Biomarkers, Tumor / genetics*
  • Cluster Analysis
  • Colonic Neoplasms / classification
  • Colonic Neoplasms / epidemiology
  • Colonic Neoplasms / genetics*
  • Confounding Factors, Epidemiologic
  • Gene Expression Profiling / methods*
  • Gene Expression Profiling / standards
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods*
  • Oligonucleotide Array Sequence Analysis / standards
  • Pattern Recognition, Automated
  • Quality Control
  • Reproducibility of Results
  • Sample Size
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Sequence Analysis, DNA / standards
  • Software


Substances

  • Biomarkers, Tumor