Purpose: More than 20 million archival tissue samples are stored annually in the United States as formalin-fixed, paraffin-embedded (FFPE) blocks, but RNA degradation during fixation and storage has prevented their use for transcriptional profiling. New and highly sensitive assays for whole-transcriptome microarray analysis of FFPE tissues are now available, but resulting data include noise and variability for which previous expression array methods are inadequate.
Experimental design: We present the two largest whole-genome expression studies from FFPE tissues to date, comprising 1,003 colorectal cancer (CRC) and 168 breast cancer samples, combined with a meta-analysis of 14 new and published FFPE microarray datasets. We develop and validate quality control (QC) methods through technical replication, independent samples, comparison to results from fresh-frozen tissue, and recovery of expected associations between gene expression and protein abundance.
Results: Archival tissues from large, multicenter studies showed a much wider range of transcriptional data quality relative to smaller or frozen tissue studies and required stringent QC for subsequent analysis. We developed novel methods for such QC of archival tissue expression profiles based on sample dynamic range and per-study median profile. This enabled validated identification of gene signatures of microsatellite instability and additional features of CRC, and improved recovery of associations between gene expression and protein abundance of MLH1, FASN, CDX2, MGMT, and SIRT1 in CRC tumors.
Conclusions: These methods for large-scale QC of FFPE expression profiles enable study of the cancer transcriptome in relation to extensive clinicopathological information, tumor molecular biomarkers, and long-term lifestyle and outcome data.