In cancer research, high-throughput profiling studies have been extensively conducted, searching for genes/single nucleotide polymorphisms (SNPs) associated with prognosis. Despite seemingly significant differences, different subtypes of the same cancer (or different types of cancers) may share common susceptibility genes. In this study, we analyze prognosis data on multiple subtypes of the same cancer but note that the proposed approach is directly applicable to the analysis of data on multiple types of cancers. We describe the genetic basis of multiple subtypes using the heterogeneity model that allows overlapping but different sets of susceptibility genes/SNPs for different subtypes. An accelerated failure time (AFT) model is adopted to describe prognosis. We develop a regularized gradient descent approach that conducts gene-level analysis and identifies genes that contain important SNPs associated with prognosis. The proposed approach belongs to the family of gradient descent approaches, is intuitively reasonable, and has affordable computational cost. Simulation study shows that when prognosis-associated SNPs are clustered in a small number of genes, the proposed approach outperforms alternatives with significantly more true positives and fewer false positives. We analyze an NHL (non-Hodgkin lymphoma) prognosis study with SNP measurements and identify genes associated with the three major subtypes of NHL, namely, DLBCL, FL, and CLL/SLL. The proposed approach identifies genes different from using alternative approaches and has the best prediction performance.
Keywords: NHL; SNP; cancer prognosis; gradient descent; integrative analysis.
© 2012 Wiley Periodicals, Inc.