Background: Lung cancer is the leading cause of cancer-related death in the United States. Nearly 50% of patients with stage I and II non-small cell lung cancer (NSCLC) will die from recurrent disease despite surgical resection. No reliable clinical or molecular predictors are currently available to identify patients at high risk of recurrence. Consequently, it is not possible to select high-risk patients for more aggressive therapy while assigning less aggressive treatment to patients at low risk of recurrence.
Methods and findings: In this study, we performed a meta-analysis of datasets from seven different microarray studies on NSCLC to identify differentially expressed genes associated with survival time (under 2 y versus over 5 y). A consensus set of 4,905 genes was selected from these studies, and systematic biases between the datasets were adjusted using distance-weighted discrimination (DWD). We identified a 64-gene expression signature that is highly predictive of which stage I lung cancer patients may benefit from more aggressive therapy. Kaplan-Meier analysis of overall survival in stage I NSCLC patients stratified by the 64-gene signature showed that the high- and low-risk groups differ significantly in overall survival. Of the 64 genes, 11 are related to cancer metastasis (APC, CDH8, IL8RB, LY6D, PCDHGA12, DSP, NID, ENPP2, CCR2, CASP8, and CASP10) and eight are involved in apoptosis (CASP8, CASP10, PIK3R1, BCL2, SON, INHA, PSEN1, and BIK).
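The survival comparison described in the methods can be illustrated with a minimal sketch of the Kaplan-Meier product-limit estimator plus a two-group log-rank test, written in plain Python. This is an assumption-laden illustration, not the authors' code: the abstract does not state which significance test was used to compare the high- and low-risk groups, so the log-rank test is assumed here as a common choice. The function names `kaplan_meier` and `log_rank` and the toy follow-up data are hypothetical.

```python
import math
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier product-limit estimate of the survival function.

    times:  follow-up time per patient (e.g. years)
    events: 1 if death was observed at that time, 0 if censored
    Returns (time, estimated survival) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


def log_rank(times_a, events_a, times_b, events_b) -> Tuple[float, float]:
    """Two-group log-rank test (assumed test). Returns (chi-square, p-value), 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
             [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in pooled if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in death_times:
        n_a = sum(1 for ti, _, g in pooled if g == 0 and ti >= t)  # group A at risk
        n_b = sum(1 for ti, _, g in pooled if g == 1 and ti >= t)  # group B at risk
        d_a = sum(1 for ti, e, g in pooled if g == 0 and ti == t and e)
        d = sum(1 for ti, e, _ in pooled if ti == t and e)  # total deaths at t
        n = n_a + n_b
        obs_minus_exp += d_a - d * n_a / n  # observed minus expected deaths in A
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Hypothetical toy follow-up data in years; NOT data from the study.
    high_t, high_e = [0.5, 1.1, 1.4, 1.9, 2.5, 3.0], [1, 1, 1, 1, 1, 0]
    low_t, low_e = [2.2, 4.8, 5.5, 6.0, 6.5, 7.0], [1, 0, 0, 1, 0, 0]
    print("high-risk KM:", kaplan_meier(high_t, high_e))
    print("low-risk KM:", kaplan_meier(low_t, low_e))
    chi2, p = log_rank(high_t, high_e, low_t, low_e)
    print(f"log-rank chi2={chi2:.2f}, p={p:.4f}")
```

In practice, a maintained survival-analysis library would normally be preferred over a hand-rolled estimator; the sketch only shows what "significantly different overall survival" between two signature-defined groups means computationally.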
Conclusions: Our results indicate that gene expression signatures from several datasets can be reconciled. The resulting signature is useful for predicting survival in stage I NSCLC and may help inform treatment decisions.