Prediction of tumour pathological subtype from genomic profile using sparse logistic regression with random effects

J Appl Stat. 2020 Mar 11;48(4):605-622. doi: 10.1080/02664763.2020.1738358. eCollection 2021.


This study highlights the application of sparse logistic regression models to predicting tumour pathological subtypes from lung cancer patients' genomic information. We consider sparse logistic regression models to handle the high dimensionality of the data and the correlation between genomic regions. In the hierarchical likelihood (HL) method, the random effects are assumed to follow a normal distribution whose variance is in turn assumed to follow a gamma distribution. This formulation includes the ridge and lasso penalties as special cases. We extend the HL penalty to include a ridge penalty (called 'HLnet'), following the same principle by which the elastic net penalty is constructed from the lasso penalty. The results indicate that the HL penalty produces sparser estimates than the lasso penalty with comparable prediction performance, while the HLnet and elastic net penalties have the best prediction performance in real data. We illustrate the methods in a lung cancer study.
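To make the comparison penalties concrete, the following is a minimal sketch of logistic regression with ridge, lasso and elastic net penalties, fitted by proximal gradient descent on synthetic data. It does not implement the HL or HLnet penalties, whose exact form is not given in this abstract. The function names, the objective parameterisation (`lam`, `alpha`), the step size and the simulated data are all assumptions made for illustration, not the authors' implementation or data.

```python
import math
import random


def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrink z toward zero, exactly zero if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def fit_penalized_logistic(X, y, lam, alpha, step=0.5, n_iter=200):
    """Proximal gradient descent for (assumed parameterisation)
    mean logistic loss + lam * (alpha * ||b||_1 + 0.5 * (1 - alpha) * ||b||_2^2).
    alpha = 1 gives lasso, alpha = 0 gives ridge, 0 < alpha < 1 gives elastic net.
    The intercept b0 is left unpenalised."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - yi
            g0 += r / n
            for j in range(p):
                grad[j] += r * xi[j] / n
        b0 -= step * g0
        for j in range(p):
            # Gradient step on the smooth part (loss + ridge term) ...
            z = beta[j] - step * (grad[j] + lam * (1.0 - alpha) * beta[j])
            # ... followed by soft-thresholding for the L1 term.
            beta[j] = soft_threshold(z, step * lam * alpha)
    return b0, beta


def count_nonzero(beta):
    return sum(1 for b in beta if b != 0.0)


# Synthetic stand-in for genomic features (assumption): 150 samples,
# 15 features, only the first three carry signal.
random.seed(0)
n_samples, n_features = 150, 15
true_beta = [2.0, -2.0, 1.5] + [0.0] * (n_features - 3)
X = [[random.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [1 if random.random() < sigmoid(sum(b * x for b, x in zip(true_beta, xi))) else 0
     for xi in X]

_, beta_ridge = fit_penalized_logistic(X, y, lam=0.1, alpha=0.0)
_, beta_lasso = fit_penalized_logistic(X, y, lam=0.1, alpha=1.0)
_, beta_enet = fit_penalized_logistic(X, y, lam=0.1, alpha=0.5)

print("non-zero coefficients (ridge):      ", count_nonzero(beta_ridge))
print("non-zero coefficients (lasso):      ", count_nonzero(beta_lasso))
print("non-zero coefficients (elastic net):", count_nonzero(beta_enet))
```

In this sketch the soft-thresholding step is what sets coefficients exactly to zero, so the ridge fit keeps every feature while the lasso and elastic net fits can discard some. In practice the penalty strength would be chosen by cross-validation rather than fixed in advance.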

Keywords: Tumour; hierarchical likelihood; logistic regression; lung cancer; pathological subtype; sparse solution.

Grants and funding

The first author (ÖK) is supported by The Scientific and Technological Research Council of Turkey (TUBITAK) [grant number: 1059B191601904], as part of the 2219-International Postdoctoral Research Scholarship Programme.