Group Feature Screening via the F Statistic

Commun Stat Simul Comput. 2022;51(4):1921-1931. doi: 10.1080/03610918.2019.1691223. Epub 2019 Nov 26.

Abstract

Feature screening is crucial in the analysis of ultrahigh dimensional data, where the number of variables (features) grows exponentially with the number of observations. In many ultrahigh dimensional data sets, variables are naturally grouped, which motivates a screening method that uses the joint effect of multiple variables. In this article, we propose a group screening procedure based on the F-test statistic. The proposed method is a direct extension of the original sure independence screening procedure to the case where the group information is known, for example, from prior knowledge. Under certain regularity conditions, we prove that the proposed group screening procedure possesses the sure screening property: it selects all effective groups with probability approaching one at an exponential rate. We use simulations to demonstrate the advantages of the proposed method and show its application in a genome-wide association study. We conclude that grouping is very useful in the analysis of ultrahigh dimensional data, as the optimal F-test can detect true signals with the desired properties.

Keywords: Feature screening; Multiple regression; Sure screening property; Ultrahigh dimension.
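
The screening idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: each group is scored by the ordinary F statistic from regressing the response on that group's columns alone (with an intercept), groups are ranked by this score, and a fixed number of top-ranked groups is retained. The function names `group_f_statistics` and `screen_groups`, the `groups` mapping, the `n_keep` cutoff, and the simulated data are all hypothetical choices made for illustration; the abstract does not specify a thresholding rule.

```python
import numpy as np


def group_f_statistics(X, y, groups):
    """Marginal F statistic for each group of columns in X.

    groups: hypothetical mapping {group_label: list of column indices}.
    Each group is regressed on its own, with an intercept.
    """
    n = X.shape[0]
    y_centered = y - y.mean()
    tss = float(y_centered @ y_centered)  # total sum of squares
    stats = {}
    for label, cols in groups.items():
        Xg = X[:, cols]
        p_g = Xg.shape[1]
        design = np.column_stack([np.ones(n), Xg])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        rss = float(resid @ resid)  # residual sum of squares
        # F = (explained SS / p_g) / (residual SS / (n - p_g - 1))
        stats[label] = ((tss - rss) / p_g) / (rss / (n - p_g - 1))
    return stats


def screen_groups(X, y, groups, n_keep):
    """Keep the n_keep groups with the largest F statistics (assumed rule)."""
    stats = group_f_statistics(X, y, groups)
    ranked = sorted(stats, key=stats.get, reverse=True)
    return ranked[:n_keep]


# Simulated example (assumed setup): 50 groups of 4 variables each,
# with only groups 3 and 17 affecting the response.
rng = np.random.default_rng(0)
n = 200
groups = {g: list(range(4 * g, 4 * g + 4)) for g in range(50)}
X = rng.standard_normal((n, 200))
signal = X[:, groups[3]].sum(axis=1) + X[:, groups[17]].sum(axis=1)
y = 2.0 * signal + rng.standard_normal(n)

selected = screen_groups(X, y, groups, n_keep=2)
print(sorted(selected))
```

Scoring a whole group with one F statistic lets several weak but jointly informative variables contribute to the same score, which is the rationale the abstract gives for screening at the group level. In practice the cutoff would be chosen by a data-driven rule rather than fixed in advance as it is here.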