Prognostic meta-signature of breast cancer developed by two-stage mixture modeling of microarray data

BMC Genomics. 2004 Dec 14;5(1):94. doi: 10.1186/1471-2164-5-94.

Abstract

Background: An increasing number of studies have profiled tumor specimens using distinct microarray platforms and analysis techniques. With the accumulating amount of microarray data, one of the most intriguing yet challenging tasks is to develop robust statistical models to integrate the findings.

Results: By applying a two-stage Bayesian mixture modeling strategy, we assimilated and analyzed four independent microarray studies to derive an inter-study validated "meta-signature" associated with breast cancer prognosis. Combining multiple studies (n = 305 samples) on a common probability scale, we developed a 90-gene meta-signature that was strongly associated with survival in breast cancer patients. The independent studies used different microarray platforms, including spotted cDNAs, Affymetrix GeneChip, and inkjet oligonucleotides, and the individually identified classifiers yielded gene sets predictive of survival in each study cohort. The study-specific gene signatures, however, had minimal overlap with each other and performed poorly in pairwise cross-validation. The meta-signature, on the other hand, accommodated such heterogeneity and achieved comparable or better prognostic performance than the individual signatures. Furthermore, compared with a global standardization method, the mixture model-based data transformation demonstrated superior properties for data integration and provided a solid basis for building classifiers at the second stage. Functional annotation revealed that genes involved in cell cycle and signal transduction activities were over-represented in the meta-signature.

Conclusion: The mixture modeling approach unifies disparate gene expression data on a common probability scale, allowing robust, inter-study validated prognostic signatures to be obtained. With the emerging utility of microarrays for cancer prognosis, it will be important to establish paradigms to meta-analyze disparate gene expression data for prognostic signatures of potential clinical use.
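
The idea of placing expression values from different platforms on a common probability scale can be illustrated with a minimal sketch. It is not the authors' Bayesian model: as a simplified stand-in, it fits a two-component Gaussian mixture to one gene's values within one study by maximum-likelihood EM, then replaces each value with the posterior probability of belonging to the higher-expression component. The function names, the two-component form, the initialization, and the simulated data are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_high(values, weight_high, mean_low, sd_low, mean_high, sd_high):
    """Posterior probability that each value came from the 'high' component."""
    probs = []
    for v in values:
        p_high = weight_high * normal_pdf(v, mean_high, sd_high)
        p_low = (1.0 - weight_high) * normal_pdf(v, mean_low, sd_low)
        total = p_high + p_low
        # If both densities underflow to zero, fall back to an uninformative 0.5.
        probs.append(p_high / total if total > 0.0 else 0.5)
    return probs


def fit_two_component_mixture(values, n_iter=100, min_sd=1e-3):
    """Fit a two-component Gaussian mixture to one gene in one study by EM.

    Hypothetical helper for illustration; requires at least two values.
    Returns (weight_high, mean_low, sd_low, mean_high, sd_high).
    """
    n = len(values)
    ordered = sorted(values)
    half = n // 2
    # Initialise the component means from the lower and upper halves of the data.
    mean_low = sum(ordered[:half]) / half
    mean_high = sum(ordered[half:]) / (n - half)
    grand_mean = sum(values) / n
    sd = max(math.sqrt(sum((v - grand_mean) ** 2 for v in values) / n), min_sd)
    sd_low = sd_high = sd
    weight_high = 0.5
    for _ in range(n_iter):
        # E-step: responsibilities of the 'high' component.
        resp = posterior_high(values, weight_high, mean_low, sd_low, mean_high, sd_high)
        n_high = sum(resp)
        n_low = n - n_high
        if n_high < 1e-8 or n_low < 1e-8:
            break  # one component has collapsed; keep the last usable estimates
        # M-step: responsibility-weighted means, standard deviations, and weight.
        mean_high = sum(r * v for r, v in zip(resp, values)) / n_high
        mean_low = sum((1.0 - r) * v for r, v in zip(resp, values)) / n_low
        var_high = sum(r * (v - mean_high) ** 2 for r, v in zip(resp, values)) / n_high
        var_low = sum((1.0 - r) * (v - mean_low) ** 2 for r, v in zip(resp, values)) / n_low
        sd_high = max(math.sqrt(var_high), min_sd)
        sd_low = max(math.sqrt(var_low), min_sd)
        weight_high = n_high / n
    return weight_high, mean_low, sd_low, mean_high, sd_high


def to_probability_scale(values):
    """Map one gene's raw values in one study to probabilities in [0, 1]."""
    params = fit_two_component_mixture(values)
    return posterior_high(values, *params)


# Simulated example: the same gene measured on two platforms with different units.
rng = random.Random(1)
study_a = [rng.gauss(0.0, 0.3) for _ in range(15)] + [rng.gauss(2.0, 0.3) for _ in range(15)]
study_b = [rng.gauss(8.0, 0.4) for _ in range(15)] + [rng.gauss(11.0, 0.4) for _ in range(15)]

for name, study in (("A", study_a), ("B", study_b)):
    probs = to_probability_scale(study)
    print(name, [round(p, 2) for p in probs])
```

Because the transformation is fitted separately within each study, the outputs share a scale of 0 to 1 even when the raw measurements do not, which is what makes pooling samples for a second-stage classifier plausible. A global standardization such as per-gene z-scoring would instead preserve each study's distributional shape, so this sketch only hints at why the two approaches can behave differently.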

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Bayes Theorem
  • Breast Neoplasms / diagnosis
  • Breast Neoplasms / pathology*
  • Cell Cycle
  • Cluster Analysis
  • DNA, Complementary / metabolism
  • Databases, Genetic
  • Disease-Free Survival
  • Gene Expression
  • Gene Expression Profiling*
  • Gene Expression Regulation, Neoplastic*
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods*
  • Oligonucleotides / chemistry
  • Pattern Recognition, Automated
  • Prognosis
  • Recurrence
  • Signal Transduction

Substances

  • DNA, Complementary
  • Oligonucleotides