Background: DNA methylation levels are known to vary over time, and modelling these trajectories is crucial for understanding the biological relevance of such changes. However, because fitting multilevel models across the epigenome is computationally costly, most trajectory modelling efforts to date have focused on a subset of CpG sites identified through epigenome-wide association studies (EWAS) at individual time-points.
Methods: We propose using linear regression across the repeated measures, with cluster-robust standard errors estimated by a sandwich estimator, as a less computationally intensive strategy than multilevel modelling. We compared these two longitudinal approaches, together with three EWAS-based approaches (association at baseline, at any time-point and at all time-points), for identifying exposure-related epigenetic change over time. The comparison used both simulations and blood DNA methylation profiles from the Accessible Resource for Integrated Epigenomics Studies (ARIES).
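As an illustration of the proposed strategy, the following is a minimal sketch of ordinary least squares across repeated measures with cluster-robust (sandwich) standard errors, clustering on the individual. It is not the implementation used in the study. The function name `ols_cluster_robust`, the model (exposure, time and their interaction), the simulated data and the choice of the uncorrected CR0 form of the estimator are all assumptions made for the example.

```python
import numpy as np

def ols_cluster_robust(X, y, groups):
    """OLS coefficients with cluster-robust sandwich (CR0) standard errors.

    Hypothetical helper for illustration; no small-sample correction is applied.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)

    bread = np.linalg.inv(X.T @ X)        # (X'X)^-1
    beta = bread @ X.T @ y                # OLS estimates
    resid = y - X @ beta

    # "Meat": sum over clusters of the outer product of the cluster score X_g' e_g
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        idx = groups == g
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)

    cov = bread @ meat @ bread            # sandwich covariance
    return beta, np.sqrt(np.diag(cov))

# Simulated repeated measures for one CpG site (all values are made up).
rng = np.random.default_rng(0)
n_ind, n_time = 200, 3
ids = np.repeat(np.arange(n_ind), n_time)                  # individual = cluster
time = np.tile(np.arange(n_time, dtype=float), n_ind)
exposure = np.repeat(rng.integers(0, 2, n_ind), n_time).astype(float)
rand_int = np.repeat(rng.normal(0.0, 0.05, n_ind), n_time)  # within-person correlation
meth = (0.5 + 0.02 * exposure + 0.01 * time + 0.03 * exposure * time
        + rand_int + rng.normal(0.0, 0.02, ids.size))

# Design matrix: intercept, exposure, time, exposure-by-time interaction
X = np.column_stack([np.ones_like(time), exposure, time, exposure * time])
beta, se = ols_cluster_robust(X, meth, ids)
print("estimates:", beta)
print("cluster-robust SEs:", se)
```

In this sketch the exposure-by-time coefficient (`beta[3]`) represents the exposure-related change over time. Because only a single least-squares fit and a per-cluster sum are needed for each CpG site, the cost per site is low compared with iteratively fitting a multilevel model.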
Results: Restricting association testing to EWAS at baseline identified a less complete set of associations than performing EWAS at each time-point or applying the longitudinal modelling approaches to the full dataset. Linear regression models with cluster-robust standard errors identified similar sets of associations to the multilevel models, with almost identical effect estimates, while being 74 times more computationally efficient. In ARIES, both longitudinal modelling approaches identified comparable sets of CpG sites associated with prenatal exposure to smoking (>70% agreement).
Conclusions: Linear regression with cluster-robust standard errors is an appropriate and efficient approach for longitudinal analysis of DNA methylation data.