A three-gene model to robustly identify breast cancer molecular subtypes

J Natl Cancer Inst. 2012 Feb 22;104(4):311-25. doi: 10.1093/jnci/djr545. Epub 2012 Jan 18.

Abstract

Background: Single sample predictors (SSPs) and subtype classification models (SCMs) are gene expression-based classifiers used to identify the four primary molecular subtypes of breast cancer (basal-like, HER2-enriched, luminal A, and luminal B). SSPs use hierarchical clustering, followed by nearest centroid classification, based on large sets of tumor-intrinsic genes. SCMs use a mixture of Gaussian distributions based on sets of genes whose expression is specifically correlated with three key breast cancer genes (estrogen receptor [ER], HER2, and aurora kinase A [AURKA]). The aim of this study was to compare the robustness, classification concordance, and prognostic value of these classifiers with those of a simplified three-gene SCM in a large compendium of microarray datasets.
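As an illustration of the mixture-of-Gaussians idea described above, the following minimal sketch assigns a tumor to the component with the highest posterior probability given its ER, HER2, and AURKA expression. It is not the published SCM or SCMGENE model: the four-component diagonal-covariance structure, the mixing weights, the means, and the standard deviations are all invented for illustration, and the names `COMPONENTS`, `posteriors`, and `classify` are hypothetical.

```python
import math

# Hypothetical mixture components over (ER, HER2, AURKA) expression on a
# standardized scale. All weights, means, and standard deviations below are
# invented for illustration; they are NOT fitted values from the study.
COMPONENTS = {
    "basal-like":    {"weight": 0.20, "mean": (-1.5, -0.5,  1.0), "sd": (0.6, 0.6, 0.6)},
    "HER2-enriched": {"weight": 0.15, "mean": (-0.5,  2.0,  0.5), "sd": (0.8, 0.7, 0.7)},
    "luminal A":     {"weight": 0.35, "mean": ( 1.0, -0.5, -1.0), "sd": (0.6, 0.6, 0.6)},
    "luminal B":     {"weight": 0.30, "mean": ( 1.0, -0.5,  1.0), "sd": (0.6, 0.6, 0.6)},
}


def log_gaussian(x, mean, sd):
    """Log density of an axis-aligned (diagonal covariance) Gaussian."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * s * s) - (xi - m) ** 2 / (2.0 * s * s)
        for xi, m, s in zip(x, mean, sd)
    )


def posteriors(x):
    """Posterior probability of each component given expression vector x."""
    log_joint = {
        name: math.log(c["weight"]) + log_gaussian(x, c["mean"], c["sd"])
        for name, c in COMPONENTS.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_joint.values())
    unnormalized = {name: math.exp(v - top) for name, v in log_joint.items()}
    total = sum(unnormalized.values())
    return {name: v / total for name, v in unnormalized.items()}


def classify(x):
    """Return the component with the highest posterior probability."""
    post = posteriors(x)
    return max(post, key=post.get)


if __name__ == "__main__":
    sample = (0.9, -0.4, -0.8)  # high ER, low HER2, low AURKA (made-up values)
    print(classify(sample), posteriors(sample))
```

In a real analysis the component parameters would be estimated from a training dataset rather than written by hand, which is where the choice of fitting dataset can affect the resulting classifications.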

Methods: Thirty-six publicly available breast cancer datasets (n = 5715) were subjected to molecular subtyping using five published classifiers (three SSPs and two SCMs) and SCMGENE, the new three-gene (ER, HER2, and AURKA) SCM. We used the prediction strength statistic to estimate the robustness of the classification models, defined as the capacity of a classifier to assign the same tumors to the same subtypes independently of the dataset used to fit it. We used Cohen's κ coefficient to assess concordance between the subtype classifiers and Cramér's V coefficient to assess their association with clinical variables. We used Kaplan-Meier survival curves and cross-validated partial likelihood to compare the prognostic value of the resulting classifications. All statistical tests were two-sided.
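For readers unfamiliar with the two agreement statistics named above, the following minimal sketch computes Cohen's κ and Cramér's V from two lists of categorical labels using their standard textbook definitions. It is not the authors' analysis code; the function names `cohen_kappa` and `cramers_v` and the toy labels are assumptions made for illustration.

```python
import math
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two labelings."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance if the two labelings were independent.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelings use one identical category
    return (observed - expected) / (1.0 - expected)


def cramers_v(labels_a, labels_b):
    """Cramér's V: strength of association between two categorical variables."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    table = Counter(zip(labels_a, labels_b))
    rows, cols = Counter(labels_a), Counter(labels_b)
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n
            chi2 += (table.get((r, c), 0) - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    if k == 0:
        return 0.0  # association is undefined with a single category
    return math.sqrt(chi2 / (n * k))


if __name__ == "__main__":
    # Toy labels from two hypothetical classifiers on four tumors.
    first = ["luminal A", "luminal A", "basal-like", "basal-like"]
    second = ["luminal A", "basal-like", "basal-like", "basal-like"]
    print(cohen_kappa(first, second), cramers_v(first, second))
```

κ equals 1 for perfect agreement and 0 for agreement no better than chance, whereas V ranges from 0 (no association) to 1 (complete association) and does not require the two variables to share the same categories.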

Results: SCMs were statistically significantly more robust than SSPs, with SCMGENE being the most robust because of its simplicity. SCMGENE was statistically significantly concordant with published SCMs (κ = 0.65-0.70) and SSPs (κ = 0.34-0.59), was statistically significantly associated with ER status (V = 0.64), HER2 status (V = 0.52), and histological grade (V = 0.55), and yielded similarly strong prognostic value.

Conclusion: Our results suggest that adequate classification of the major and clinically relevant molecular subtypes of breast cancer can be robustly achieved with quantitative measurements of three key genes.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aurora Kinase A
  • Aurora Kinases
  • Breast Neoplasms / classification*
  • Breast Neoplasms / diagnosis*
  • Breast Neoplasms / genetics
  • Breast Neoplasms / metabolism
  • Breast Neoplasms / mortality
  • Breast Neoplasms / pathology
  • Cluster Analysis
  • Confounding Factors, Epidemiologic
  • Databases, Genetic
  • Female
  • Gene Expression Profiling
  • Gene Expression Regulation, Neoplastic
  • Humans
  • Kaplan-Meier Estimate
  • Microarray Analysis*
  • Polymorphism, Single-Stranded Conformational
  • Predictive Value of Tests
  • Prognosis
  • Protein-Serine-Threonine Kinases / genetics*
  • Protein-Serine-Threonine Kinases / metabolism
  • Receptor, ErbB-2 / genetics*
  • Receptor, ErbB-2 / metabolism
  • Receptors, Estrogen / genetics*
  • Receptors, Estrogen / metabolism
  • Research Design
  • Retrospective Studies

Substances

  • Receptors, Estrogen
  • Receptor, ErbB-2
  • AURKA protein, human
  • Aurora Kinase A
  • Aurora Kinases
  • Protein-Serine-Threonine Kinases