Elastic Net Constrained Stereotype Logit Model for Ordered Categorical Data

Biom Biostat Int J. 2015;2(7):00049. doi: 10.15406/bbij.2015.02.00049. Epub 2015 Oct 20.

Abstract

Gene expression studies are of growing importance in medicine; indeed, subtypes of the same disease have been shown to have differing gene expression profiles. Researchers are often interested in differentiating a disease by a categorical classification indicative of disease progression, for example, to identify genes associated with progression and to accurately predict the stage of progression from gene expression data. One challenge in modeling microarray gene expression data is that there are more genes (variables) than observations. In addition, the genes usually exhibit a complex variance-covariance structure. Modeling a categorical variable that reflects disease progression from gene expression data therefore requires methods that can handle an ordinal outcome in the presence of a high-dimensional covariate space. We present a method that combines the stereotype regression model with an elastic net penalty to model an ordinal outcome in high-throughput genomic data sets. We report results from applying the proposed method to gene expression data and discuss its effectiveness.
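To make the combination concrete, the following is a minimal sketch of an elastic-net-penalized stereotype logit objective. It is not the authors' implementation. It assumes one common stereotype parameterization, log P(Y=k|x)/P(Y=0|x) = alpha_k + phi_k * (x'beta), with category 0 as the baseline and the ordinal constraint 0 = phi_0 <= phi_1 <= ... <= phi_{K-1} = 1. It also assumes the glmnet-style penalty lam * (mix * ||beta||_1 + (1 - mix)/2 * ||beta||_2^2). The paper's exact constraints and penalty scaling may differ. The function names `stereotype_probs` and `penalized_nll` and the parameter names are hypothetical.

```python
import numpy as np

def stereotype_probs(X, alpha, phi, beta):
    """Category probabilities under an assumed stereotype logit parameterization.

    log P(Y=k|x) / P(Y=0|x) = alpha[k] + phi[k] * (x @ beta), with category 0 as
    baseline (alpha[0] = phi[0] = 0 by convention) and the ordinal constraint
    0 = phi[0] <= phi[1] <= ... <= phi[K-1] = 1.
    """
    if not (phi[0] == 0 and phi[-1] == 1 and np.all(np.diff(phi) >= 0)):
        raise ValueError("phi must be non-decreasing from 0 to 1")
    # One shared score x @ beta per subject, scaled by a category-specific phi[k].
    eta = alpha[None, :] + np.outer(X @ beta, phi)      # shape (n, K)
    eta = eta - eta.max(axis=1, keepdims=True)           # stabilize the softmax
    expo = np.exp(eta)
    return expo / expo.sum(axis=1, keepdims=True)

def penalized_nll(X, y, alpha, phi, beta, lam, mix):
    """Negative log-likelihood plus an elastic net penalty on beta (hypothetical helper).

    lam >= 0 sets the overall penalty strength; mix in [0, 1] blends the
    L1 (lasso, mix = 1) and squared L2 (ridge, mix = 0) terms.
    """
    probs = stereotype_probs(X, alpha, phi, beta)
    nll = -np.log(probs[np.arange(len(y)), y]).sum()
    penalty = lam * (mix * np.abs(beta).sum() + 0.5 * (1.0 - mix) * np.sum(beta ** 2))
    return nll + penalty
```

Minimizing `penalized_nll` over alpha, phi, and beta (subject to the phi ordering) is what makes the model usable when genes outnumber samples. The L1 part drives many gene coefficients to exactly zero, and the L2 part helps when correlated genes would otherwise make the lasso choose among them arbitrarily. A single beta vector shared across categories, rescaled by phi, is what distinguishes the stereotype model from an unconstrained multinomial logit with K-1 separate coefficient vectors.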

Keywords: Affymetrix; Elastic net; High dimensional; Stereotype logit.