Microarray reality checks in the context of a complex disease

Nat Biotechnol. 2004 May;22(5):615-21. doi: 10.1038/nbt965.


A problem in analyzing microarray-based gene expression data is separating genes that are causally involved in a disease from innocent bystander genes, whose expression levels have been secondarily altered by primary changes elsewhere. To investigate this issue systematically in the context of a class of complex human diseases, we have compared microarray-based gene expression data with non-microarray-based clinical and biological data about the schizophrenias to ask whether these two approaches prioritize the same genes. We find that genes whose expression changes are deemed important from microarrays are rarely those classified as important from clinical, in situ, molecular, single-nucleotide polymorphism (SNP) association, knockout and drug perturbation data. This disparity is not limited to the schizophrenias but also characterizes other human disease data sets. It extends as well to biological validation of microarray data in model organisms, in which genome-wide phenotypic data have been systematically compared with microarray data. In addition, different bioinformatic protocols applied to the same microarray data yield quite different gene sets and thus make clinical decisions less straightforward. We discuss how progress in the clinical area might be improved by assigning high-quality phenotypic values to each member of a microarray-assigned gene set.
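The comparison at the heart of the abstract, asking whether two approaches prioritize the same genes, can be illustrated with a simple set-overlap calculation. The following is a minimal sketch under stated assumptions, not the authors' method: the function name `gene_set_overlap`, the choice of the Jaccard index as the overlap measure, and the placeholder gene symbols are all hypothetical and introduced only for illustration.

```python
from typing import Dict, Iterable


def gene_set_overlap(set_a: Iterable[str], set_b: Iterable[str]) -> Dict[str, object]:
    """Summarize how much two prioritized gene lists agree.

    Hypothetical helper: the abstract does not say how overlap was
    quantified, so the Jaccard index is an assumption made here.
    """
    # Normalize symbols to upper case so "gene_a" and "GENE_A" match.
    a = {g.upper() for g in set_a}
    b = {g.upper() for g in set_b}
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "n_shared": len(shared),
        # Jaccard index: shared genes divided by all genes in either list.
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Fraction of the first list that the second list also prioritizes.
        "frac_a_in_b": len(shared) / len(a) if a else 0.0,
    }


# Placeholder symbols only; these are not genes reported in the paper.
microarray_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
clinical_genes = ["GENE_D", "GENE_E", "GENE_F"]

result = gene_set_overlap(microarray_genes, clinical_genes)
print(result["n_shared"], round(result["jaccard"], 3))
```

A low Jaccard value between a microarray-derived list and a list built from independent clinical or biological evidence would correspond to the kind of disparity the abstract describes. The same function could be applied to gene sets produced by different bioinformatic protocols from the same microarray data.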

MeSH terms

  • Computational Biology
  • Genetic Heterogeneity
  • Genetic Predisposition to Disease*
  • Humans
  • Oligonucleotide Array Sequence Analysis*
  • Schizophrenia / genetics