Detecting Genetic Risk Factors for Alzheimer's Disease in Whole Genome Sequence Data via Lasso Screening

Proc IEEE Int Symp Biomed Imaging. 2015 Apr:2015:985-989. doi: 10.1109/ISBI.2015.7164036.

Abstract

Genetic factors play a key role in Alzheimer's disease (AD). The Alzheimer's Disease Neuroimaging Initiative (ADNI) whole genome sequence (WGS) data offers new power to investigate the mechanisms of AD by combining entire genome sequences with neuroimaging and clinical data. Here we explore the ADNI WGS single nucleotide polymorphism (SNP) data in depth and extract approximately six million valid SNP features. We investigate imaging genetics associations using Lasso regression, a widely used sparse learning technique. To solve the large-scale Lasso problem efficiently, we employ a screening rule for Lasso, called dual polytope projections (DPP), to remove irrelevant features from the optimization problem before solving it. Experiments demonstrate that DPP effectively identifies irrelevant features and yields a 400× speedup. This allows us, for the first time, to run the compute-intensive model selection procedure called stability selection to rank SNPs that may affect the brain and AD risk.
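
To make the screening idea concrete, the following is a minimal sketch of a one-shot DPP-style test. It rests on several assumptions that the abstract does not state: the Lasso objective is taken as (1/2)||y - Xβ||² + λ||β||₁, the rule is the basic (non-sequential) form in which feature i is discarded when |xᵢᵀy|/λ_max < 1 - ||xᵢ|| ||y|| (1/λ - 1/λ_max), and the data are synthetic. The function name `dpp_screen` and all variable names are hypothetical, and the screening variant used in the experiments may differ.

```python
import numpy as np


def dpp_screen(X, y, lam):
    """Return a boolean mask of features that survive a basic DPP-style test.

    Assumes the Lasso objective 0.5 * ||y - X b||^2 + lam * ||b||_1.
    Features marked False are guaranteed (under that assumption) to have
    zero coefficients at this lam, so they can be dropped before solving.
    """
    corr = np.abs(X.T @ y)      # |x_i^T y| for every feature
    lam_max = corr.max()        # smallest lam at which the solution is all zeros
    if lam >= lam_max:
        # The Lasso solution is identically zero, so nothing needs to be kept.
        return np.zeros(X.shape[1], dtype=bool)

    col_norms = np.linalg.norm(X, axis=0)
    # Bound on how far the dual optimum can move between lam_max and lam.
    radius = np.linalg.norm(y) * (1.0 / lam - 1.0 / lam_max)
    discard = corr / lam_max < 1.0 - col_norms * radius
    return ~discard


# Synthetic stand-in for a wide SNP matrix (hypothetical sizes and values).
rng = np.random.default_rng(0)
n_samples, n_features = 50, 2000
X = rng.standard_normal((n_samples, n_features))
beta_true = np.zeros(n_features)
beta_true[:5] = 3.0
y = X @ beta_true + 0.1 * rng.standard_normal(n_samples)

lam_max = np.abs(X.T @ y).max()
keep_hi = dpp_screen(X, y, 0.95 * lam_max)  # lam close to lam_max
keep_lo = dpp_screen(X, y, 0.50 * lam_max)  # smaller lam, weaker screening
print("kept at 0.95 * lam_max:", int(keep_hi.sum()), "of", n_features)
print("kept at 0.50 * lam_max:", int(keep_lo.sum()), "of", n_features)
```

A solver would then be run only on `X[:, keep]`. Because the test costs one matrix-vector product, it is cheap relative to the optimization itself, which is what makes repeated fits, such as those required by stability selection, tractable in principle.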

Keywords: Alzheimer's Disease; Lasso; Lasso Screening; Whole Genome Sequence.