Impact of Tumor Purity on Immune Gene Expression and Clustering Analyses across Multiple Cancer Types

Cancer Immunol Res. 2018 Jan;6(1):87-97. doi: 10.1158/2326-6066.CIR-17-0201. Epub 2017 Nov 15.


Archived surgical tumor specimens are often impure. The presence of RNA transcripts from nontumor cells, such as immune and stromal cells, can impede analyses of cancer expression profiles. To systematically analyze the impact of tumor purity, we obtained gene expression profiles and tumor purity values for 7,794 tumor specimens across 21 tumor types (available from The Cancer Genome Atlas consortium). First, we observed that genes with roles in immunity were significantly inversely correlated with tumor purity, whereas genes with roles in oxidative phosphorylation were significantly positively correlated with it. The expression of genes implicated in immunotherapy and of specific immune cell genes, along with the abundance of immune cell infiltrates, was substantially inversely correlated with tumor purity. This relationship may explain the correlation between immune gene expression and mutation burden, highlighting the need to account for tumor purity when evaluating expression markers obtained from bulk tumor transcriptome data. Second, examining the cluster membership of gene pairs, with or without controlling for tumor purity, revealed that tumor purity may have a substantial impact on gene clustering across tumor types. Third, feature genes for molecular taxonomy were analyzed for correlation with tumor purity; for some tumor types, feature genes representing the mesenchymal subtype were inversely correlated with tumor purity, whereas those representing the classical subtype were positively correlated with it. Our findings indicate that tumor purity is an important confounder in evaluating the correlation between gene expression and clinicopathologic features such as mutation burden, as well as in gene clustering and molecular taxonomy. Cancer Immunol Res; 6(1); 87-97. ©2017 AACR.
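
The confounding described in the abstract can be illustrated with a minimal sketch on synthetic data. This is not the authors' pipeline: the abstract does not say which correlation statistic or adjustment method was used, so Pearson correlation and linear residualization on purity are assumptions made here for illustration, and all names (`pearson`, `residualize`, `partial_corr`, `gene_a`, `gene_b`) are hypothetical. Two simulated genes are expressed only by nontumor cells, so both fall as purity rises. They appear strongly co-expressed in the raw data, but the association largely disappears once purity is controlled for.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


def residualize(values, covariate):
    """Remove the linear effect of `covariate` from `values` (least squares)."""
    values = np.asarray(values, float)
    covariate = np.asarray(covariate, float)
    design = np.column_stack([np.ones_like(covariate), covariate])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef


def partial_corr(x, y, covariate):
    """Correlation of x and y after regressing `covariate` out of both."""
    return pearson(residualize(x, covariate), residualize(y, covariate))


# Synthetic cohort (assumption: illustrative values only, not TCGA data).
rng = np.random.default_rng(0)
n_samples = 500
purity = rng.uniform(0.2, 1.0, n_samples)  # tumor cell fraction per sample

# Both genes come from the nontumor compartment, with independent noise,
# so their only shared driver is (1 - purity).
gene_a = 5.0 * (1.0 - purity) + rng.normal(0.0, 0.5, n_samples)
gene_b = 4.0 * (1.0 - purity) + rng.normal(0.0, 0.5, n_samples)

purity_r = pearson(gene_a, purity)            # gene vs. purity
raw_r = pearson(gene_a, gene_b)               # gene vs. gene, unadjusted
adj_r = partial_corr(gene_a, gene_b, purity)  # gene vs. gene, purity-adjusted

print(f"gene_a vs purity:           {purity_r:+.2f}")
print(f"gene_a vs gene_b (raw):     {raw_r:+.2f}")
print(f"gene_a vs gene_b (partial): {adj_r:+.2f}")
```

In this toy setting the raw gene-gene correlation is driven entirely by purity, which is why clustering such genes without adjustment could group them for reasons unrelated to tumor biology. With real data, the same comparison would be run per tumor type, and a rank-based statistic may be preferable if the relationships are not linear.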

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles
  • Biomarkers, Tumor*
  • Cluster Analysis
  • Computational Biology / methods
  • Databases, Genetic
  • Gene Expression Profiling
  • Gene Expression Regulation, Neoplastic*
  • Genetic Variation
  • Humans
  • Immunity / genetics*
  • Kaplan-Meier Estimate
  • Mutation
  • Neoplasms / genetics*
  • Neoplasms / immunology*
  • Neoplasms / mortality
  • Neoplasms / pathology
  • Organ Specificity / genetics
  • Prognosis
  • Transcriptome


Substances

  • Biomarkers, Tumor