Partition-based ultrahigh-dimensional variable screening

Biometrika. 2017 Nov;104(4):785-800. doi: 10.1093/biomet/asx052. Epub 2017 Oct 9.

Abstract

Traditional variable selection methods are compromised because they overlook useful information on covariates with similar functionality or spatial proximity and treat each covariate independently. Leveraging prior grouping information on covariates, we propose partition-based screening methods for ultrahigh-dimensional variables in the framework of generalized linear models. We show that partition-based screening exhibits the sure screening property with a vanishing false selection rate. For settings in which prior knowledge of covariate grouping is unavailable or unreliable, we propose a data-driven partition screening framework and investigate its theoretical properties. We consider two special cases: correlation-guided partitioning and spatial location-guided partitioning. In the absence of a single partition, we propose a theoretically justified strategy for combining statistics from various partitioning methods. The utility of the proposed methods is demonstrated via simulation and an analysis of functional neuroimaging data.
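The abstract does not spell out the screening statistics, so the following is only a rough illustration of the general idea, not the authors' procedure. It makes several assumptions. It uses a Gaussian linear model in place of a general GLM. It builds a correlation-guided partition with a simple greedy threshold rule. It scores each group by the R^2 of a joint least-squares fit on that block, and keeps the covariates in the top-scoring groups. The function names `correlation_partition`, `block_r2` and `partition_screen`, the `threshold` rule, and the toy data are all hypothetical.

```python
import numpy as np


def correlation_partition(X, threshold=0.5):
    """Hypothetical correlation-guided partition: each still-unassigned
    covariate opens a new group that absorbs every unassigned covariate whose
    absolute sample correlation with it exceeds `threshold`."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    labels = np.full(X.shape[1], -1)
    group = 0
    for j in range(X.shape[1]):
        if labels[j] >= 0:
            continue  # already absorbed into an earlier group
        mask = (labels < 0) & (corr[j] > threshold)
        mask[j] = True  # covariate j always belongs to the group it opens
        labels[mask] = group
        group += 1
    return labels


def block_r2(Xb, y):
    """R^2 of a joint least-squares fit of y on one block of covariates
    (an assumed Gaussian-linear stand-in for a block-wise GLM fit)."""
    Xc = Xb - Xb.mean(axis=0)
    yc = y - y.mean()
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    rss = np.sum((yc - Xc @ beta) ** 2)
    return 1.0 - rss / np.sum(yc ** 2)


def partition_screen(X, y, labels, n_keep):
    """Score each group jointly and return the indices of the covariates
    belonging to the `n_keep` highest-scoring groups."""
    groups = np.unique(labels)
    scores = np.array([block_r2(X[:, labels == g], y) for g in groups])
    top = groups[np.argsort(scores)[::-1][:n_keep]]
    return np.flatnonzero(np.isin(labels, top))


# Toy data: covariates 0-4 share a latent factor; y depends on 0 and 1 only.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
z = rng.standard_normal(n)
X[:, :5] = z[:, None] + 0.3 * rng.standard_normal((n, 5))
y = X[:, 0] + X[:, 1] + 0.5 * rng.standard_normal(n)

labels = correlation_partition(X, threshold=0.5)
selected = partition_screen(X, y, labels, n_keep=1)
print(selected)
```

Because the score is computed per group rather than per covariate, correlated neighbours of a signal covariate enter or leave together. Raw R^2 also grows with block size, so a practical version would need some calibration across groups of different sizes. The combination of statistics across several partitions mentioned in the abstract is not sketched here.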

Keywords: Correlation-based variable screening; Partition; Spatial variable screening; Ultrahigh-dimensional variable screening.