Nat Med. 2015 Aug;21(8):938-945.
doi: 10.1038/nm.3909. Epub 2015 Jul 20.

The Prognostic Landscape of Genes and Infiltrating Immune Cells Across Human Cancers

Free PMC article

Andrew J Gentles et al. Nat Med. 2015.


Molecular profiles of tumors and tumor-associated cells hold great promise as biomarkers of clinical outcomes. However, existing data sets are fragmented and difficult to analyze systematically. Here we present a pan-cancer resource and meta-analysis of expression signatures from ∼18,000 human tumors with overall survival outcomes across 39 malignancies. By using this resource, we identified a forkhead box M1 (FOXM1) regulatory network as a major predictor of adverse outcomes, and we found that expression of favorably prognostic genes, including KLRB1 (encoding CD161), largely reflects tumor-associated leukocytes. By applying CIBERSORT, a computational approach for inferring leukocyte representation in bulk tumor transcriptomes, we identified complex associations between 22 distinct leukocyte subsets and cancer survival. For example, tumor-associated neutrophil and plasma cell signatures emerged as significant but opposite predictors of survival for diverse solid tumors, including breast and lung adenocarcinomas. This resource and associated analytical tools may help delineate prognostic genes and leukocyte subsets within and across cancers, shed light on the impact of tumor heterogeneity on cancer outcomes, and facilitate the discovery of biomarkers and therapeutic targets.


Figure 1. Prognostic landscape of gene expression across human cancers
(a) Schematic depicting PRECOG data pre-processing and analysis steps. (b) Number of patient samples with survival data included in PRECOG, organized by cancer type. Thirty-nine distinct histologies (e.g. adenocarcinoma and squamous cell carcinoma in lung cancer, different types of blood cancer) have been grouped into 18 clusters for concise display. (c) Left: Approximately 2/3 of prognostic genes (filtered for |meta-z| > 3.09, or nominal one-sided P < 0.001) are prognostic in more than one of the 39 distinct cancer histologies for which meta-z scores were computed, while the remaining 1/3 are prognostic in only a single histology; the latter are cancer-specific. Right: Same analysis shown in the left panel but applied to randomly shuffled gene labels for each cancer in PRECOG. Based on 100,000 trials, the empirical P value for the observed enrichment of shared genes is P < 10⁻⁵. (d) Left: Heat map showing genes (rows) clustered by association between expression levels and survival outcomes across 166 individual cancer studies (columns). Z-scores represent the statistical significance of each gene's association with survival, with poor prognosis genes colored red, and favorable prognosis genes colored green. All identified clusters were ranked by compound scores that integrate cluster size with the prognostic significance of genes within each cluster; the top five ranking clusters are shown (left; Methods). Right: Representative functional enrichments for each of the five clusters, determined by analyzing annotated gene sets with a Bonferroni-corrected hypergeometric test. All clusters, including associated datasets and compound scores, are provided in Supplementary Table 3.
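The threshold in panel (c) pairs |meta-z| > 3.09 with a nominal one-sided P < 0.001. The minimal sketch below shows how a z-score maps to a one-sided P value under a standard normal, and how per-study z-scores could be combined into a meta-z. The combination rule is an assumption: it uses Stouffer's unweighted method, which may differ from the weighting described in the paper's Methods. All function names are hypothetical.

```python
from statistics import NormalDist

def one_sided_p(z):
    """Upper-tail P value for a standard normal z-score."""
    return 1.0 - NormalDist().cdf(z)

def z_from_one_sided_p(p):
    """z-score whose upper-tail probability equals p."""
    return NormalDist().inv_cdf(1.0 - p)

def stouffer_meta_z(z_scores):
    """Unweighted Stouffer combination (assumed here): sum(z) / sqrt(k)."""
    k = len(z_scores)
    return sum(z_scores) / k ** 0.5

# A gene with a modest z-score of 2.0 in each of four studies
# combines to a meta-z of 4.0, which passes the 3.09 filter.
meta_z = stouffer_meta_z([2.0, 2.0, 2.0, 2.0])
passes_filter = abs(meta_z) > 3.09
```

The example illustrates why pooling studies helps: no single study reaches the threshold, but consistent evidence across studies does.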
Figure 2. Genes globally associated with adverse and favorable survival
(a) Analysis of the number of cancer types used to identify pan-cancer prognostic genes versus the significance of these genes in validation datasets. Left: The top ten adverse and favorable pan-cancer prognostic genes were identified in training sets (comprising t cancer types) and assessed by mean meta-z scores in validation sets (remaining 39 – t cancers) (Methods). For each value of t, from 1 to 31, histologies were randomly drawn from PRECOG 100 times, and the results are presented as means ± 95% CI. Right: The 10 most frequent cancer-wide adverse and favorable prognostic genes are shown for t = 31 (above this threshold, performance gains were marginal). Of note, global meta-z scores (bottom x-axis) reflect all cancers in PRECOG (Supplementary Table 1). (b) Comparison of global meta-z scores between PRECOG (n = 17,808 tumors) and TCGA RNA-seq data (n = 6,663 tumors), with FOXM1 and KLRB1 indicated. Points lying between parallel gray lines represent insignificant genes in PRECOG, TCGA, or both (nominal two-sided P > 0.05). (c) Kaplan-Meier curves showing differences in overall survival for patients in validation sets stratified by a FOXM1 and KLRB1 expression score (Methods). For each cancer, a median split was used and curve separation was assessed by a log-rank test. Survival units from different studies were standardized to months. Lung cancers were primarily stage I (~2/3), and the melanoma data consisted primarily of metastatic samples (Methods). 95% confidence intervals are presented in brackets. HR, hazard ratio. (d) Top: Genes ranked by mean meta-z scores across all datasets in PRECOG (n = 23,288 genes). Center: Protein-protein association (PPA) networks for the top 100 genes determined by mean pan-cancer meta-z scores. Edges are colored to denote experimentally confirmed interactions and/or associations in curated databases (blue edges), and other sources of evidence (gray edges) (Methods and Supplementary Table 5). Functional annotation P values were determined using a Benjamini-Hochberg-corrected hypergeometric test. Genes in the pan-cancer prognostic networks are colored according to the number of cancer-specific PPA networks in which they are also found. 0* indicates genes only found in PPA networks derived from all cancers. Bottom: Two metrics of network connectivity are compared among PPA networks for the top 100 prognostic genes derived from all cancers (red diamonds) versus individual cancers and studies in PRECOG (gray circles): x-axis = node degree, the average number of edges e (i.e., PPAs) per node n (i.e., protein); y-axis = algebraic connectivity, a graph theoretic measure of overall network connectedness (Methods).
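The two connectivity metrics in panel (d) can be illustrated with a short sketch. It assumes an undirected, unweighted network given as a symmetric adjacency matrix and uses NumPy; the function names and the toy network are hypothetical, and the paper's Methods may define or normalize these quantities differently. Mean degree is computed here as 2e/n (each edge counted at both endpoints), and algebraic connectivity is taken as the second-smallest eigenvalue of the graph Laplacian L = D - A.

```python
import numpy as np

def mean_degree(adj):
    """Average node degree of an undirected graph (equals 2e / n)."""
    adj = np.asarray(adj, dtype=float)
    return adj.sum(axis=1).mean()

def algebraic_connectivity(adj):
    """Second-smallest eigenvalue of the graph Laplacian L = D - A."""
    adj = np.asarray(adj, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order
    eigenvalues = np.linalg.eigvalsh(laplacian)
    return eigenvalues[1]

# Toy network: three proteins in a chain, A - B - C
path3 = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
degree = mean_degree(path3)
connectivity = algebraic_connectivity(path3)
```

An algebraic connectivity of zero indicates a disconnected network, so larger values correspond to the more tightly connected networks described in the caption.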
Figure 3. Inferred leukocyte frequencies and prognostic associations in 25 human cancers
(a) Relative leukocyte fractions enumerated in solid tumors by CIBERSORT versus immunohistochemistry (IHC) or flow cytometry (FACS) on independent samples. CRC, colorectal cancer; LUAD, lung adenocarcinoma. To approximate ground truth proportions in CRC biopsies, levels were inferred by averaging previously reported leukocyte counts from the tumor center and invasive margin of 107 patients. Baseline leukocyte fractions in LUAD biopsies were enumerated by FACS (n = 13 tumors; data represented as medians; details in Methods). CIBERSORT results are represented as mean leukocyte fractions for the corresponding histologies (Supplementary Table 6). (b) Estimated mRNA fractions of 22 leukocyte subsets across 25 cancers (Affymetrix platforms only; see Methods), pooled into 11 immune populations here for clarity (for full details, see Supplementary Table 6). (c) Global prognostic associations for 22 leukocyte types across 25 cancers (n = 5,782 tumors; left) and 14 solid non-brain tumors (n = 3,238 tumors; right), ranked by unweighted meta-z score, with a false discovery rate (FDR) threshold of 25% indicated for each plot. Additional FDR thresholds are provided in the supplement (Supplementary Fig. 6d). For individual cancers, see Supplementary Fig. 6a. (d) Concordance and differences in tumor-associated leukocyte (TAL) prognostic associations between breast cancers and lung adenocarcinoma (for FDRs, see Supplementary Fig. 6c). Resting and activated subsets in c,d are indicated by − and +, respectively. All leukocyte subset abbreviations are defined in Supplementary Table 6. Red and blue bars in c,d indicate adverse and favorable prognostic associations, respectively.
Figure 4. Ratio of infiltrating PMNs to plasma cells is prognostic in diverse solid tumors
(a) Prognostic associations between inferred neutrophil (PMN) and plasma cell (PC) frequencies are significantly inversely correlated across the cancer landscape (Pearson R = −0.46, P = 0.02). Each point represents an individual cancer: triangles, blood cancers; squares, brain cancers; circles, remaining cancers. (b) Meta-z scores depict the prognostic significance of combining PMN and PC levels into a ratiometric index, for diverse solid tumors (source data are provided in Supplementary Table 6). (c) Comparison between CIBERSORT and tissue microarray (TMA) analysis for PC, B-cell, and PMN frequencies in lung adenocarcinoma, using IGKC, CD20, and MPO, respectively, as surrogate markers for TMA (n = 187 specimens). Lung adenocarcinoma arrays from publicly available datasets (GSE7670 and GSE10072) were analyzed with CIBERSORT (n = 85 tumors). (d,e) Kaplan-Meier plots depict patients stratified by (d) the median level of PMN to PC fractions inferred in lung adenocarcinoma microarray studies (P = 0.0005, log-rank test; n = 453 high and 453 low patients; Supplementary Table 6) and (e) the median level of MPO/IGKC stained positive in lung adenocarcinoma tissue sections (P = 0.028, log-rank test; n = 94 high and 93 low patients). Hazard ratios were 1.5 (1.2–1.9, 95% CI) for d and 1.7 (1.1–2.6, 95% CI) for e. Inferred PMN to PC levels were also significantly prognostic in continuous models assessed by univariate Cox regression in d (P = 0.003, Z = 2.98) and e (P = 0.0005, Z = 3.46). Data in c are presented as means ± s.e.m. All patients were right censored after 5 years in d and e.
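The stratification described for panels (d,e) combines three steps: right-censoring at 5 years, a median split on the PMN-to-PC ratio, and a log-rank test. The sketch below is one plain-Python illustration of those steps under stated assumptions: survival times are in months, the "high" group is strictly above the median, and the P value uses the normal approximation to the standard two-group log-rank statistic. Function names and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import NormalDist, median

def censor_at(times, events, horizon):
    """Right-censor follow-up at `horizon` (e.g. 60 months for 5 years)."""
    new_times = [min(t, horizon) for t in times]
    new_events = [bool(e) and t <= horizon for t, e in zip(times, events)]
    return new_times, new_events

def median_split(scores):
    """Flag samples strictly above the median as the 'high' group."""
    m = median(scores)
    return [s > m for s in scores]

def logrank(times, events, high):
    """Two-group log-rank test; returns (z, two-sided P).

    Assumes both groups are non-empty and at least one event occurs
    while both groups are still at risk; z > 0 means more deaths than
    expected in the 'high' group.
    """
    observed = expected = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)  # at risk overall
        n1 = sum(1 for ti, h in zip(times, high) if ti >= t and h)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, h in zip(times, events, high)
                 if ti == t and e and h)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    z = (observed - expected) / variance ** 0.5
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

# Toy cohort: a higher PMN/PC ratio goes with earlier death.
ratio = [8, 7, 6, 5, 4, 3, 2, 1]
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
times, events = censor_at(times, events, 60)
z, p = logrank(times, events, median_split(ratio))
```

A median split is simple to interpret on a Kaplan-Meier plot but discards information, which is presumably why the caption also reports continuous Cox models alongside the dichotomized result.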
