Improving mass-univariate analysis of neuroimaging data by modelling important unknown covariates: Application to Epigenome-Wide Association Studies

Neuroimage. 2018 Jun;173:57-71. doi: 10.1016/j.neuroimage.2018.01.073. Epub 2018 Feb 12.


Statistical inference on neuroimaging data is often conducted using a mass-univariate model, equivalent to fitting a linear model at every voxel with a known set of covariates. Due to the large number of linear models, it is challenging to check if the selection of covariates is appropriate and to modify this selection adequately. The use of standard diagnostics, such as residual plotting, is clearly not practical for neuroimaging data. However, the selection of covariates is crucial for linear regression to ensure valid statistical inference. In particular, the mean model of regression needs to be reasonably well specified. Unfortunately, this issue is often overlooked in the field of neuroimaging. This study aims to adopt the existing Confounder Adjusted Testing and Estimation (CATE) approach and to extend it for use with neuroimaging data. We propose a modification of CATE that can yield valid statistical inferences using Principal Component Analysis (PCA) estimators instead of Maximum Likelihood (ML) estimators. We then propose a non-parametric hypothesis testing procedure that can improve upon parametric testing. Monte Carlo simulations show that the modification of CATE allows for more accurate modelling of neuroimaging data and can in turn yield a better control of False Positive Rate (FPR) and Family-Wise Error Rate (FWER). We demonstrate its application to an Epigenome-Wide Association Study (EWAS) on neonatal brain imaging and umbilical cord DNA methylation data obtained as part of a longitudinal cohort study. Software for this CATE study is freely available at

Keywords: Epigenetics; Neonatal brain; Non-parametric testing; Univariate analysis; Unknown covariates.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • Data Interpretation, Statistical*
  • Genome-Wide Association Study / methods
  • Humans
  • Linear Models
  • Longitudinal Studies
  • Models, Statistical*
  • Neuroimaging / methods*