Nearest-neighbor Projected-Distance Regression (NPDR) is a feature selection technique that uses nearest neighbors in high-dimensional data to detect complex multivariate effects, including epistasis. NPDR uses a regression formalism that allows statistical significance testing and efficient control for multiple testing. The regression formalism also provides a mechanism for NPDR to adjust for population structure, which we apply to a genome-wide association study (GWAS) of systemic lupus erythematosus (SLE). We also test NPDR on benchmark simulated genetic variant data with epistatic effects, main effects, imbalanced case-control designs, and continuous outcomes. NPDR identifies potential interactions in an epistasis network that influences SLE.
Keywords: GWAS; epistasis; feature selection; machine learning; nearest-neighbors.
Copyright © 2020 Arabnejad, Montgomery, Gaffney and McKinney.