Identifying common prognostic factors in genomic cancer studies: a novel index for censored outcomes

BMC Bioinformatics. 2010 Mar 24;11:150. doi: 10.1186/1471-2105-11-150.


Background: With the growing number of public repositories for high-throughput genomic data, it is of great interest to combine the results produced by independent research groups. Such a combination allows the identification of common genomic factors across multiple cancer types and provides new insights into the disease process. In the framework of the proportional hazards model, classical procedures, which consist of ranking genes according to the estimated hazard ratio or the p-value obtained from a test statistic of no association between survival and gene expression level, are not suitable for gene selection across multiple genomic datasets with different sample sizes. We propose a novel index for identifying genes with a common effect across heterogeneous genomic studies designed to remain stable whatever the sample size and which has a straightforward interpretation in terms of the percentage of separability between patients according to their survival times and gene expression measurements.

Results: The simulations results show that the proposed index is not substantially affected by the sample size of the study and the censoring. They also show that its separability performance is higher than indices of predictive accuracy relying on the likelihood function. A simulated example illustrates the good operating characteristics of our index. In addition, we demonstrate that it is linked to the score statistic and possesses a biologically relevant interpretation.The practical use of the index is illustrated for identifying genes with common effects across eight independent genomic cancer studies of different sample sizes. The meta-selection allows the identification of four genes (ESPL1, KIF4A, HJURP, LRIG1) that are biologically relevant to the carcinogenesis process and have a prognostic impact on survival outcome across various solid tumors.

Conclusion: The proposed index is a promising tool for identifying factors having a prognostic impact across a collection of heterogeneous genomic datasets of various sizes.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • Databases, Genetic
  • Gene Expression Profiling / methods
  • Genome, Human*
  • Humans
  • Likelihood Functions
  • Neoplasms / diagnosis
  • Neoplasms / genetics*
  • Oligonucleotide Array Sequence Analysis
  • Prognosis
  • Sample Size