Background: Breast cancer is a heterogeneous disease comprising several biologically different types that exhibit diverse responses to treatment. In recent years, gene expression profiling has led to the definition of several "intrinsic subtypes" of breast cancer (basal-like, HER2-enriched, luminal-A, luminal-B and normal-like), and microarray-based predictors such as PAM50 have been developed. Despite their advantage over traditional histopathological classification, precise identification of breast cancer subtypes, especially within the largest and highly variable luminal-A class, remains a challenge. In this study, we revisited the molecular classification of breast tumors using both expression and methylation data obtained from The Cancer Genome Atlas (TCGA).
Methods: Unsupervised clustering was applied to 1148 and 679 breast cancer samples using RNA-Seq and DNA methylation data, respectively. Clusters were evaluated using clinical information and by comparison to PAM50 subtypes. Differentially expressed genes and differentially methylated CpGs were tested for enrichment using various annotation sets. Survival analysis was conducted on the identified clusters using the log-rank test and the Cox proportional hazards model.
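For readers unfamiliar with the survival comparison named above, the following is a minimal sketch of a two-group log-rank test in plain Python. It is an illustration only and not the authors' code: the function name `logrank_test`, its argument layout, and the toy follow-up times in the usage note are assumptions introduced here, and a real analysis would normally rely on an established statistics package.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test (hypothetical helper, for illustration).

    times_*  : follow-up time of each sample
    events_* : 1 if the event (e.g. recurrence) was observed, 0 if censored
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    # Pool both groups, tagging each sample with its group (0 = a, 1 = b).
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]

    # Only time points at which at least one event occurred contribute.
    event_times = sorted({t for t, e, _ in samples if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Numbers at risk just before time t in each group.
        n_a = sum(1 for s, _, g in samples if s >= t and g == 0)
        n_b = sum(1 for s, _, g in samples if s >= t and g == 1)
        # Events occurring exactly at time t in each group.
        d_a = sum(1 for s, e, g in samples if s == t and e and g == 0)
        d_b = sum(1 for s, e, g in samples if s == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b

        observed_a += d_a
        expected_a += d * n_a / n  # expected events in group a under H0
        if n > 1:
            # Hypergeometric variance of the event count in group a.
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))

    if variance == 0:
        return 0.0, 1.0

    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With toy data, a call such as `logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)` compares a group whose events all occur early against a group whose events all occur late. The Cox model used for the multivariate adjustment is not sketched here.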
Results: The clusters in both the expression and methylation datasets agreed only moderately with PAM50 calls, while our partitioning of the luminal samples had better five-year prognostic value than the luminal-A/luminal-B assignment called by PAM50. Our analysis partitioned the expression profiles of the luminal-A samples into two biologically distinct subgroups exhibiting differential expression of immune-related genes, with one subgroup carrying a significantly higher risk of five-year recurrence. Analysis of the luminal-A samples using methylation data identified a cluster of patients with poorer survival, characterized by distinct hypermethylation of developmental genes. Cox multivariate survival analysis confirmed the prognostic significance of the two partitions after adjustment for commonly used factors such as age and pathological stage.
Conclusions: Modern genomic datasets reveal substantial heterogeneity among luminal breast tumors. Our analysis of these data provides two prognostic gene sets that dissect and explain tumor variability within the luminal-A subgroup, thereby contributing to the advancement of subtype-specific diagnosis and treatment.
Keywords: Breast cancer subtypes; Clustering; DNA methylation; Luminal-A; RNA-Seq; Unsupervised analysis.