PLoS One. 2012;7(12):e50938.
doi: 10.1371/journal.pone.0050938. Epub 2012 Dec 7.

Analyzing Illumina Gene Expression Microarray Data From Different Tissues: Methodological Aspects of Data Analysis in the Metaxpress Consortium

Free PMC article


Claudia Schurmann et al. PLoS One. 2012.


Microarray profiling of gene expression is widely applied in molecular biology and functional genomics. Experimental and technical variations make meta-analysis of different studies challenging. In a total of 3358 samples, all from German population-based cohorts, we investigated the effect of data preprocessing and the variability due to sample processing in whole blood cell and blood monocyte gene expression data, measured on the Illumina HumanHT-12 v3 BeadChip array.

Gene expression signal intensities were similar after applying the log2 or the variance-stabilizing transformation. In all cohorts, the first principal component (PC) explained more than 95% of the total variation. Technical factors substantially influenced signal intensity values, especially the Illumina chip assignment (33-48% of the variance), the RNA amplification batch (12-24%), the RNA isolation batch (16%), and the sample storage time, in particular the time between blood donation and RNA isolation for the whole blood cell samples (2-3%), and the time between RNA isolation and amplification for the monocyte samples (2%). White blood cell composition parameters were the strongest biological factors influencing the expression signal intensities in the whole blood cell samples (3%), followed by sex (1-2%) in both sample types. Known single nucleotide polymorphisms (SNPs) were located in 38% of the analyzed probe sequences and 4% of them included common SNPs (minor allele frequency >5%). Out of the tested SNPs, 1.4% significantly modified the probe-specific expression signals (Bonferroni corrected p-value <0.05), but in almost half of these events the signal intensities were even increased despite the occurrence of the mismatch. Thus, the vast majority of SNPs within probes had no significant effect on hybridization efficiency.

In summary, adjustment for a few selected technical factors greatly improved reliability of gene expression analyses. Such adjustments are particularly required for meta-analyses.
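
To make the "percentage of the variance" figures above more concrete, here is a minimal sketch of one common way to attribute variance to a categorical technical factor: the ratio of between-group to total sum of squares (the R² of a one-way ANOVA). The abstract does not state how the authors computed their percentages, so this is an illustration only. The function name `variance_explained_by_factor`, the chip layout, and all data are hypothetical and simulated.

```python
import numpy as np

def variance_explained_by_factor(values, labels):
    """Fraction of total variance attributable to a categorical factor
    (between-group sum of squares / total sum of squares)."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    grand_mean = values.mean()
    ss_total = np.sum((values - grand_mean) ** 2)
    ss_between = 0.0
    for level in np.unique(labels):
        group = values[labels == level]
        ss_between += group.size * (group.mean() - grand_mean) ** 2
    return ss_between / ss_total

# Simulated data (NOT from the study): 8 hypothetical chips, 12 samples each.
rng = np.random.default_rng(0)
chip = np.repeat(np.arange(8), 12)
chip_shift = rng.normal(0.0, 1.0, size=8)          # per-chip offset
signal = 8.0 + chip_shift[chip] + rng.normal(0.0, 1.0, size=chip.size)

r2_chip = variance_explained_by_factor(signal, chip)
print(f"variance explained by chip assignment: {r2_chip:.1%}")
```

In practice the same calculation would be repeated per probe and per factor, and factors that are nested or correlated (for example chip and amplification batch) would need a joint model rather than this one-factor-at-a-time view.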

Conflict of interest statement

Competing Interests: The authors have read the journal's policy and have the following interest: They received funding from a commercial source (Siemens Healthcare, Erlangen, Germany, InterSystems GmbH, Boehringer Ingelheim, PHILIPS Medical Systems). There are no patents, products in development or marketed products to declare. This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials. Co-authors Tanja Zeller and Christian Herder are PLOS ONE Editorial Board members. This does not alter their adherence to all the PLOS ONE policies on sharing data and materials.


Figure 1. Log2 transformation (L2T) versus variance-stabilizing transformation (VST).
The panels show the association results for the random phenotype (A–C) and for body mass index (BMI) (D–F) on each mRNA probe adjusted for sex, age, RNA amplification batch, RNA integrity number (RIN) and the sample storage time based on L2T expression values (x-axis) and on VST values (y-axis) in the SHIP-TREND cohort. The upper panels (A, D) show the betas, the middle panels (B, E) show the standard errors (SEs) and the lower panels (C, F) show the negative log10 association p-values. The corresponding squared Pearson product-moment correlation coefficient between the plotted values is given in the upper right corner of each plot. Each spot represents a probe and is colored according to its mean L2T expression value from all samples. The color code is given in the legend located in the lower right corner of each plot. Although betas and SEs differ between both transformations, the association p-values are highly correlated.
Figure 2. Unexplained variance after adjustment for principal components (PCs).
The panels show the percentage of adjusted unexplained variance (y-axis) of the regression on the log2 transformed (L2T) gene expression levels and body mass index (BMI) (A) or the random phenotype (B) over the first 100 PCs (x-axis). With both phenotypes the unexplained variance decreases continuously with the addition of further PCs to the regression model. Results are given separately for the SHIP-TREND, KORA F4 and GHS cohorts.
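
The quantity plotted in Figure 2 can be approximated as follows. This is a sketch under stated assumptions, not the authors' pipeline: it assumes that "adjusted unexplained variance" means 1 minus the adjusted R² of an ordinary least-squares fit of one probe on the phenotype plus the first k PCs of the expression matrix. The helper names `adjusted_unexplained_variance` and `top_pcs` are hypothetical, and the data are simulated.

```python
import numpy as np

def adjusted_unexplained_variance(y, covariates):
    """1 - adjusted R^2 of an OLS fit of y on covariates plus an intercept."""
    y = np.asarray(y, dtype=float)
    n = y.size
    X = np.column_stack([np.ones(n), covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ coef) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    p = X.shape[1] - 1                      # predictors, excluding intercept
    return (rss / (n - p - 1)) / (tss / (n - 1))

def top_pcs(expr, k):
    """Scores of the first k principal components (samples x probes input)."""
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]

# Simulated data (NOT from the study): 3 hidden technical factors drive expression.
rng = np.random.default_rng(1)
n_samples, n_probes = 200, 50
hidden = rng.normal(size=(n_samples, 3))
loadings = rng.normal(size=(3, n_probes))
loadings[:, 0] = 1.0                        # example probe clearly depends on the factors
expr = hidden @ loadings + rng.normal(scale=0.5, size=(n_samples, n_probes))
phenotype = rng.normal(size=n_samples)      # plays the role of the random phenotype

probe = expr[:, 0]
pc_counts = (0, 1, 3, 10)
curve = []
for k in pc_counts:
    covs = np.column_stack([phenotype, top_pcs(expr, k)])
    curve.append(adjusted_unexplained_variance(probe, covs))

for k, value in zip(pc_counts, curve):
    print(f"{k:>2} PCs: {value:.1%} unexplained")
```

Averaging this value over all probes for k = 0 to 100 would give a curve of the kind the caption describes.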
Figure 3. Effects of SNPs within probes on signal intensities.
The effects on measured log2 transformed (L2T) gene expression levels per mismatch allele of SNPs located within probes (y-axis) are plotted against the mean L2T expression level of the samples for each probe (x-axis). Each spot represents a SNP-probe combination; associations with significant p-values after Bonferroni correction (p<2.3×10⁻⁵) are colored in red and p-values below 0.05 are colored in orange. To increase legibility the y-axis was limited from −3 to 3 excluding 176 non-significant results out of 1237 successful association results (minimum and maximum effect sizes were −174.1 and 188.7, respectively). Surprisingly, in almost 45% of the associations a positive effect per mismatch allele on expression signal intensity was observed.
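
The two-tier coloring described in this caption follows directly from the Bonferroni rule, which divides the family-wise significance level by the number of tests. A minimal sketch follows. The number of tests behind the caption's cutoff is not stated here, so `n_tests` below is a made-up value, and the function names and the "grey" label are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

def color_for(p_value, alpha, n_tests):
    """Mimic the caption's color scheme: red = significant after Bonferroni
    correction, orange = nominally significant (p < alpha), grey = neither."""
    if p_value < bonferroni_threshold(alpha, n_tests):
        return "red"
    if p_value < alpha:
        return "orange"
    return "grey"

# Hypothetical example: 2000 tests at a family-wise alpha of 0.05.
alpha, n_tests = 0.05, 2000
for p in (1e-6, 0.01, 0.5):
    print(p, color_for(p, alpha, n_tests))
```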
Figure 4. Workflow – from blood sampling to measured mRNA intensities.
From left to right: Whole blood was collected and stored in PAXgene tubes until isolation of RNA from whole blood cells in both SHIP-TREND and KORA F4. In GHS, monocytes were separated from whole blood and RNA was isolated from monocytes within 24 hours after blood sampling, subsequently storing the isolated RNA until amplification. The sample storage time refers to the duration the whole blood (SHIP-TREND and KORA F4) or isolated RNA (GHS) was stored before further processing, shown as mean ± standard deviation in days. The samples were processed in 96-well plates both after isolation and amplification of the RNA. The corresponding plate layouts were called RNA isolation batch and RNA amplification batch, respectively. Finally, the RNA was hybridized and the arrays were scanned, quality controlled and analyzed.





Grant support

SHIP is part of the Community Medicine Research net of the University of Greifswald, Germany, which is funded by the BMBF (German Ministry of Education and Research), the Ministry of Cultural Affairs as well as the Social Ministry of the Federal State of Mecklenburg-West Pomerania. Analyses were supported by the “Greifswald Approach to Individualized Medicine (GANI_MED)” consortium funded by the BMBF (grant 03IS2061A). Genome-wide genotyping and expression data have been supported by the BMBF (grant no. 03ZIK012) and a joint grant from Siemens Healthcare, Erlangen, Germany, and the Federal State of Mecklenburg-West Pomerania. The University of Greifswald is a member of the ‘Center of Knowledge Interchange’ program of the Siemens AG and the Caché Campus program of the InterSystems GmbH. The KORA research platform and the KORA Augsburg studies are financed by the Helmholtz Zentrum München, German Research Center for Environmental Health, which is funded by the BMBF and by the State of Bavaria. The German Diabetes Center is funded by the German Federal Ministry of Health and the Ministry of School, Science and Research of the State of North-Rhine-Westphalia. The Diabetes Cohort Study was funded by a German Research Foundation project grant to W.R. (DFG; RA 459/2-1). This study was supported in part by a grant from the BMBF to the German Center for Diabetes Research (DZD e.V.). This work was supported by the BMBF funded Systems Biology of Metabotypes grant (SysMBo#0315494A). Additional support was obtained from the BMBF (National Genome Research Network NGFNplus Atherogenomics, 01GS0834) and the Leibniz Association (WGL Pakt für Forschung und Innovation).
The Gutenberg Health Study is funded through the government of Rheinland-Pfalz (“Stiftung Rheinland Pfalz für Innovation”, contract AZ 961–386261/733), the research programs “Wissen schafft Zukunft” and “Schwerpunkt Vaskuläre Prävention” of the Johannes Gutenberg-University of Mainz, and its contract with Boehringer Ingelheim and PHILIPS Medical Systems, including an unrestricted grant for the Gutenberg Health Study. Specifically, the research reported in this article was supported by the National Genome Network “NGFNplus” (contract 01GS0833 and 01GS0831) by the BMBF, and a joint funding grant from the BMBF and the Agence Nationale de la Recherche, France (contract BMBF 01KU0908A and ANR 09 GENO 106 01). This work was supported in part by the European Union (HEALTH-2011-278913), the BMBF (grants 01KU0908A, 01KU0908B, 0315536F), and supported by the DZHK (Deutsches Zentrum für Herz-Kreislauf-Forschung – German Centre for Cardiovascular Research), and by the BMBF. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.