Bridging Big Data: Procedures for Combining Non-equivalent Cognitive Measures from the ENIGMA Consortium

bioRxiv [Preprint]. 2023 Apr 7:2023.01.16.524331. doi: 10.1101/2023.01.16.524331.


Investigators in neuroscience have turned to Big Data to address replication and reliability problems by increasing sample size, statistical power, and the representativeness of data. These efforts raise new questions about how to integrate data collected from distinct sources and with different instruments. We focus on the most frequently assessed cognitive domain, memory, and demonstrate a process for reliable data harmonization across three common memory measures. We aggregated raw data from 53 studies worldwide, totaling N = 10,505 individuals. A mega-analysis was conducted using empirical Bayes harmonization to remove site effects, followed by linear models adjusting for common covariates. A continuous item response theory (IRT) model estimated each individual's latent verbal learning ability while accounting for item difficulties. Harmonization significantly reduced inter-site variance while preserving covariate effects, and our conversion tool is freely available online. These results demonstrate that large-scale data sharing and harmonization initiatives can address reproducibility and integration challenges across the behavioral sciences.
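To illustrate what "empirical Bayes harmonization to remove site effects" involves, the sketch below implements a simplified version of the parametric ComBat-style procedure (Johnson, Li and Rabinovic, 2007) that this kind of harmonization is commonly based on. It is not the authors' pipeline and not the API of any ComBat package: the function name `combat_sketch` is hypothetical, and, unlike the study's analysis, it models no covariates. Each feature is standardized with a pooled mean and within-site SD; per-site shifts and scales are then shrunk toward priors pooled across features before being removed. It assumes at least two features, at least two subjects per site, and per-site variances that are not all identical.

```python
from statistics import mean, variance


def combat_sketch(data, sites, max_iter=100, tol=1e-8):
    """Simplified parametric ComBat-style harmonization (hypothetical helper).

    data  -- list of subjects, each a list of G feature values (G >= 2)
    sites -- site label for each subject (each site needs >= 2 subjects)
    No covariates are modeled here, unlike a full ComBat analysis.
    """
    n_total, n_feat = len(data), len(data[0])
    labels = sorted(set(sites))
    idx = {s: [k for k, lab in enumerate(sites) if lab == s] for s in labels}

    # Pooled location (grand mean) and scale (within-site SD) per feature.
    alpha = [mean(row[g] for row in data) for g in range(n_feat)]
    site_mu = {s: [mean(data[k][g] for k in idx[s]) for g in range(n_feat)]
               for s in labels}
    sigma = [
        (sum((data[k][g] - site_mu[s][g]) ** 2 for s in labels for k in idx[s])
         / n_total) ** 0.5
        for g in range(n_feat)
    ]
    z = [[(row[g] - alpha[g]) / sigma[g] for g in range(n_feat)] for row in data]

    adjusted = [list(row) for row in data]
    for s in labels:
        rows, n = idx[s], len(idx[s])
        cols = [[z[k][g] for k in rows] for g in range(n_feat)]
        gamma_hat = [mean(c) for c in cols]      # site shift per feature
        delta_hat = [variance(c) for c in cols]  # site scale per feature

        # Hyperparameters pooled across features: normal prior on shifts,
        # inverse-gamma prior (method of moments) on variances.
        gamma_bar, tau2 = mean(gamma_hat), variance(gamma_hat)
        m, s2 = mean(delta_hat), variance(delta_hat)
        a = (2 * s2 + m ** 2) / s2
        b = (m * s2 + m ** 3) / s2

        # Iterate the posterior (shrunken) estimates until they stop changing.
        g_star, d_star = gamma_hat, delta_hat
        for _ in range(max_iter):
            g_new = [(n * tau2 * gh + d * gamma_bar) / (n * tau2 + d)
                     for gh, d in zip(gamma_hat, d_star)]
            d_new = [(b + 0.5 * sum((x - g) ** 2 for x in c)) / (n / 2 + a - 1)
                     for c, g in zip(cols, g_new)]
            change = max(abs(u - v) for u, v in zip(g_new + d_new, g_star + d_star))
            g_star, d_star = g_new, d_new
            if change < tol:
                break

        # Remove the shrunken site effects, then return to the original units.
        for k in rows:
            for g in range(n_feat):
                z_adj = (z[k][g] - g_star[g]) / d_star[g] ** 0.5
                adjusted[k][g] = z_adj * sigma[g] + alpha[g]
    return adjusted
```

The "empirical Bayes" part is the shrinkage step: each site's shift and scale estimates borrow strength from the other features measured at that site, which stabilizes the correction for small sites. A full analysis would also keep covariate effects (for example, age or sex) in the model so that harmonization does not remove them, which is what the study reports checking.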

Keywords: Harmonization; Mega analysis; Tools; Verbal learning.
