Nearest-Neighbor Projected Distance Regression for Epistasis Detection in GWAS With Population Structure Correction

Front Genet. 2020 Jul 22;11:784. doi: 10.3389/fgene.2020.00784. eCollection 2020.

Abstract

Nearest-neighbor Projected-Distance Regression (NPDR) is a feature selection technique that uses nearest neighbors in high-dimensional data to detect complex multivariate effects, including epistasis. NPDR uses a regression formalism that allows statistical significance testing and efficient control for multiple testing. The regression formalism also provides a mechanism for NPDR to adjust for population structure, which we apply to a GWAS of systemic lupus erythematosus (SLE). We also test NPDR on benchmark simulated genetic variant data with epistatic effects, main effects, imbalanced case-control designs, and continuous outcomes. In the SLE GWAS, NPDR identifies potential interactions in an epistasis network that influences the disorder.

Keywords: GWAS; epistasis; feature selection; machine learning; nearest-neighbors.
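
The regression formalism described in the abstract can be illustrated with a minimal sketch for a continuous outcome. The abstract does not give the algorithm's details, so everything below is an assumption made for illustration: the Manhattan metric, the fixed neighborhood size `k`, the use of ordinary least squares on one variant at a time, and the hypothetical function names `knn_pairs`, `slope_t_stat`, and `npdr_like_scores`. The idea shown is to regress the outcome difference between neighboring samples on their difference projected onto a single variant, and to read the slope's t statistic as that variant's importance score. It omits the covariates that a population-structure adjustment would add to the regression, and it does not cover the case-control setting.

```python
import itertools
import math


def knn_pairs(X, k):
    """Return (i, j) pairs where j is one of the k nearest neighbors of i.

    Distances are Manhattan distances over all features (an assumption).
    Ties are broken by sample index.
    """
    n = len(X)
    pairs = []
    for i in range(n):
        dists = sorted(
            (sum(abs(a - b) for a, b in zip(X[i], X[j])), j)
            for j in range(n)
            if j != i
        )
        pairs.extend((i, j) for _, j in dists[:k])
    return pairs


def slope_t_stat(x, y):
    """Ordinary least squares of y on x; return (slope, t statistic of slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0:
        # The projected distance never varies, so no slope can be estimated.
        return 0.0, 0.0
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        # Perfect fit: the t statistic is unbounded in the slope's direction.
        return slope, math.copysign(float("inf"), slope)
    return slope, slope / se


def npdr_like_scores(X, y, k=5):
    """Score each feature by regressing neighbor outcome differences on the
    neighbor differences projected onto that feature."""
    pairs = knn_pairs(X, k)
    dy = [abs(y[i] - y[j]) for i, j in pairs]
    scores = []
    for a in range(len(X[0])):
        dx = [abs(X[i][a] - X[j][a]) for i, j in pairs]
        scores.append(slope_t_stat(dx, dy))
    return scores


# Toy data: a full grid of three variants coded 0/1/2.
# The outcome depends only on variant 0.
X = [list(g) for g in itertools.product((0, 1, 2), repeat=3)]
y = [2.0 * row[0] for row in X]
scores = npdr_like_scores(X, y, k=3)
```

In this toy example only the first variant should receive a positive score, because neighbors that differ in that variant also differ in outcome. A case-control analysis would replace the outcome difference with a same-class or different-class indicator and the least-squares fit with a logistic regression, which this sketch does not attempt.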