Brief Bioinform. 2021 Nov 5;22(6):bbab337.
doi: 10.1093/bib/bbab337.

Comparison of approaches to transcriptomic analysis in multi-sampled tumors

Anson T Ku et al. Brief Bioinform.

Abstract

Intratumoral heterogeneity is a well-documented feature of human cancers and is associated with outcome and treatment resistance. However, a heterogeneous tumor transcriptome adds an unknown level of variability to analyses of differentially expressed genes (DEGs) that may underlie phenotypes of interest, including treatment response. Although current clinical practice and the vast majority of research studies use a single sample from each patient, decreasing costs of sequencing technologies and computing power have made repeated-measures analyses increasingly economical. Repeatedly sampling the same tumor increases the statistical power of DEG analysis, which is indispensable for downstream analysis, and also improves understanding of within-tumor variance, which may affect conclusions. Here, we compared five different methods for analyzing gene expression profiles derived from repeated sampling of human prostate tumors in two separate cohorts of patients. We also benchmarked the sensitivity of generalized linear models against that of linear mixed models (LMMs) for identifying DEGs contributing to relevant prostate cancer pathways based on a ground-truth model.

Keywords: RNA-seq; linear mixed model; multiple sampling; prostate cancer; transcriptomics; variance.

Figures

Figure 1
Schematic illustration of the workflow for cohort 1. Biopsies were obtained from one or more lesions per patient. For each lesion, one or more foci were isolated from the biopsied material by laser capture microdissection (LCM). RNA sequencing was performed on each focus, and the associated ERG and PTEN protein status was determined by immunohistochemistry of adjacent slides, while PTEN and chromosome 18q status were obtained from exome sequencing data. The gene expression data were analyzed by five different methods: counts averaging, RNA weighting, FASTQ concatenation, counts summation and LMM.
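Two of the five methods named in this caption, counts summation and counts averaging, collapse the repeated foci into a single profile before differential expression testing. The following is a minimal sketch of that collapsing step, not the authors' pipeline: the function name, the data layout, the toy counts and the choice to collapse per patient (rather than per lesion) are all assumptions made for illustration.

```python
from collections import defaultdict


def collapse_counts(focus_counts, focus_to_patient, how="sum"):
    """Collapse per-focus gene counts into one profile per patient.

    focus_counts:     {focus_id: {gene: raw_count}}   (hypothetical layout)
    focus_to_patient: {focus_id: patient_id}
    how:              "sum" for counts summation, "mean" for counts averaging

    Assumption: a gene missing from a focus contributes zero to that focus.
    """
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")

    totals = defaultdict(lambda: defaultdict(float))
    n_foci = defaultdict(int)
    for focus_id, counts in focus_counts.items():
        patient = focus_to_patient[focus_id]
        n_foci[patient] += 1
        for gene, count in counts.items():
            totals[patient][gene] += count

    collapsed = {}
    for patient, gene_totals in totals.items():
        divisor = n_foci[patient] if how == "mean" else 1
        collapsed[patient] = {g: t / divisor for g, t in gene_totals.items()}
    return collapsed


# Toy data (invented): two foci from patient P1, one focus from patient P2.
focus_counts = {
    "P1_F1": {"ERG": 120, "PTEN": 30},
    "P1_F2": {"ERG": 80, "PTEN": 50},
    "P2_F1": {"ERG": 5, "PTEN": 60},
}
focus_to_patient = {"P1_F1": "P1", "P1_F2": "P1", "P2_F1": "P2"}

summed = collapse_counts(focus_counts, focus_to_patient, how="sum")
averaged = collapse_counts(focus_counts, focus_to_patient, how="mean")
```

Both collapsed profiles discard within-patient variance, which is the information an LMM retains by modeling each focus separately with a patient-level random effect.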
Figure 2
Summary of the gene expression profiles of cohorts 1 and 2. (A) Unsupervised clustering of focus-level gene expression data from cohort 1 with TMPRSS2-ERG, PTEN and chr18q loss annotation. The results revealed a strong tendency for foci to group based on TMPRSS2-ERG and PTEN status (Fisher’s exact test, adjusted P = 4.29 × 10^−24 and 5.39 × 10^−11, respectively) but not chr18q loss (P = 1). The corresponding patient and lesion identifiers are mapped below the dendrogram. Black lines connect foci or lesions without TMPRSS2-ERG fusion to their lesions and patients, while red lines connect foci or lesions with TMPRSS2-ERG fusion. (B) Assessment of TMPRSS2-ERG, PTEN and chr18q inter- and intra-lesion heterogeneity. PTEN loss remains the most variable feature within and between lesions in cohort 1. (C) Unsupervised clustering of focus-level gene expression data from cohort 2 with TMPRSS2-ERG, PTEN and chr18q status annotation. The results revealed a strong tendency for foci to group based on TMPRSS2-ERG and PTEN status (Fisher’s exact test, adjusted P = 2.10−5 and 0.041, respectively) but not chr18q loss (P = 1). (D) Assessment of TMPRSS2-ERG, PTEN and chr18q intra-lesion heterogeneity. PTEN loss remains the most variable feature within lesions in cohort 2. Clustering was performed on the basis of total gene expression.
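The association between cluster membership and genomic status reported in this caption is tested with Fisher's exact test. As a rough illustration of how such a P-value arises from a 2 × 2 table (for example, cluster A/B against fusion present/absent), here is a standard-library sketch of the two-sided test. The table values are invented, and the multiple-testing adjustment mentioned in the caption is not shown.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The P-value sums the probabilities of every table that
    is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical table: rows are clusters, columns are fusion-positive/negative.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
p_balanced = fisher_exact_two_sided(5, 5, 5, 5)
```

A perfectly balanced table such as the second call gives a P-value of 1, mirroring the chr18q result above, where cluster membership carries no information about status.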
Figure 3
Set analysis of genes related to TMPRSS2-ERG fusion, PTEN loss, TMPRSS2-ERG fusion/PTEN intact, TMPRSS2-ERG fusion/PTEN loss and chr18q loss in cohorts 1 and 2. (A) Venn diagram depicting the DEGs shared between analysis methods in cohort 1. The numbers of DEGs common to all five methods are 467, 7, 24, 616 and 0 for TMPRSS2-ERG fusion, PTEN loss, TMPRSS2-ERG fusion/PTEN intact, TMPRSS2-ERG fusion/PTEN loss and chr18q loss, respectively. (B) Venn diagram depicting the DEGs shared between analysis methods in cohort 2. The numbers of DEGs common to all four methods are 331, 182, 316, 482 and 0 for TMPRSS2-ERG fusion, PTEN loss, TMPRSS2-ERG fusion/PTEN intact, TMPRSS2-ERG fusion/PTEN loss and chr18q loss, respectively.
Figure 4
Sensitivity analysis of DEGs to alpha. For each genomic alteration analysis, the unique DEGs from each method were pooled. The filtering threshold was set to |log2 fold-change| > 1, and alpha was then increased to determine the percentage of significant DEGs. Increasing alpha from 0.05 to 0.20 did not lead to a convergence of >75% of unique DEGs across genotypes in cohort 1. Sensitivity curves were grouped most closely in the three TMPRSS2-ERG fusion genotypes. Averaging, concatenation and summation are consistently grouped together as alpha increases from 0.05 to 0.20 in cohort 2. LMM approached convergence with the other methods in cohort 1 in only two genotypes (TMPRSS2-ERG fusion and TMPRSS2-ERG fusion/PTEN loss).
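The sensitivity curve described in this caption can be sketched as follows: hold the fold-change filter fixed at |log2 fold-change| > 1, sweep alpha, and record the fraction of the pooled genes that one method calls significant at each alpha. This is an illustrative reading of the caption, not the authors' code; the function name, the input layout and the toy values are assumptions.

```python
def fraction_significant(results, alphas, lfc_cutoff=1.0):
    """Fraction of pooled genes called significant at each alpha.

    results:    list of (log2_fold_change, adjusted_p) pairs, one per gene
                in the pooled set, as estimated by a single method
                (hypothetical layout)
    alphas:     iterable of significance thresholds to sweep
    lfc_cutoff: genes must satisfy |log2 fold-change| > lfc_cutoff
    """
    total = len(results)
    if total == 0:
        raise ValueError("results must contain at least one gene")

    fractions = {}
    for alpha in alphas:
        hits = sum(1 for lfc, padj in results
                   if abs(lfc) > lfc_cutoff and padj < alpha)
        fractions[alpha] = hits / total
    return fractions


# Toy results for one method (invented values).
results = [(2.0, 0.01), (1.5, 0.08), (-1.2, 0.15), (0.5, 0.001), (-3.0, 0.30)]
curve = fraction_significant(results, alphas=[0.05, 0.10, 0.20])
```

Computing one such curve per method and plotting them together shows whether the methods converge on the same DEGs as alpha is relaxed.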
Figure 5
The relatedness of the methods was assessed by hierarchical clustering of the log2 fold-changes from the five methods in cohort 1 (A) and the four methods in cohort 2 (B). The type of genomic comparator is indicated above each tree.
Figure 6
Decomposition of gene expression variation into patient, genotypic feature and residuals for cohort 1 (left) and cohort 2 (right). For cohort 1, patient, genomic features and residuals explain a median of 35%, 0% and 62% for TMPRSS2-ERG fusion; 37%, 0% and 62% for PTEN loss; 31%, 0% and 64% for TMPRSS2-ERG fusion/PTEN intact; 33%, 0% and 62% for TMPRSS2-ERG fusion/PTEN loss; and 38%, 0% and 62% for 18q loss, respectively. For cohort 2, patient, genomic features, purity and residuals explain a median of 35%, 7%, 0% and 48% for TMPRSS2-ERG; 37%, 7%, 0% and 48% for PTEN loss; 30%, 9%, 0% and 49% for TMPRSS2-ERG fusion/PTEN intact; 36%, 7%, 0% and 44% for TMPRSS2-ERG fusion/PTEN loss; and 38%, 7%, 0% and 47% for 18q loss, respectively.
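To make the idea of attributing expression variance to patient versus residual concrete, the sketch below splits the total sum of squares for a single gene into a between-patient part and a within-patient (residual) part. This is a simplified fixed-effects decomposition swapped in for illustration. It is not the mixed-model variance partition summarized in the caption, it omits the genotypic feature and purity terms, and the function name and toy values are assumptions.

```python
def variance_fractions(values, patients):
    """Split one gene's expression variance into patient and residual parts.

    values:   expression values for one gene, one per focus
    patients: patient label for each focus, in the same order

    Uses the identity SS_total = SS_between_patients + SS_within_patients.
    """
    if len(values) != len(patients) or not values:
        raise ValueError("values and patients must be non-empty and aligned")

    grand_mean = sum(values) / len(values)
    groups = {}
    for value, patient in zip(values, patients):
        groups.setdefault(patient, []).append(value)

    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("expression is constant; nothing to decompose")

    # Between-patient sum of squares: group size times squared mean offset.
    ss_patient = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    frac_patient = ss_patient / ss_total
    return {"patient": frac_patient, "residual": 1.0 - frac_patient}


# Toy example (invented): two foci from each of two patients.
fractions = variance_fractions([1.0, 3.0, 5.0, 7.0], ["A", "A", "B", "B"])
```

Repeating this per gene and taking the median across genes gives summary numbers of the same kind as those quoted in the caption, although a mixed model would estimate the components differently.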
Figure 7
Bubble plot illustrating the normalized enrichment score (NES), colored from low (blue) to high (red), and the −log10 adjusted P-value (size of the dot) derived from gene set enrichment analysis (GSEA) of TMPRSS2-ERG fusion, PTEN loss, TMPRSS2-ERG fusion/PTEN intact, TMPRSS2-ERG fusion/PTEN loss and chr18q loss, using differential expression results from counts averaging, RNA weighting, concatenation, summation and LMM.
