Mitigating site effects in covariance for machine learning in neuroimaging data
- PMID: 34904312
- PMCID: PMC8837590
- DOI: 10.1002/hbm.25688
Abstract
To acquire larger samples for answering complex questions in neuroscience, researchers have increasingly turned to multi-site neuroimaging studies. However, these studies are hindered by differences between images acquired at different sites. These site effects have been shown to bias comparisons between sites, mask biologically meaningful associations, and even introduce spurious associations. To address this, the field has focused on harmonizing data by removing site-related effects in the mean and variance of measurements. Alongside the growing popularity of multi-center imaging, the use of machine learning (ML) in neuroimaging has also become commonplace. ML approaches have been shown to provide improved sensitivity, specificity, and power because they model the joint relationship across measurements in the brain. In this work, we demonstrate that methods for removing site effects in mean and variance may not be sufficient for ML, because such methods fail to address how correlations between measurements can vary across sites. Using data from the Alzheimer's Disease Neuroimaging Initiative, we show that considerable differences in covariance exist across sites and that popular harmonization techniques do not address this issue. We then propose a novel harmonization method called Correcting Covariance Batch Effects (CovBat) that removes site effects in mean, variance, and covariance. We apply CovBat and show that within-site correlation matrices are successfully harmonized. Furthermore, we find that ML methods are unable to distinguish scanner manufacturer after our proposed harmonization is applied, and that the CovBat-harmonized data retain accurate prediction of disease group.
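The abstract's central claim, that removing site effects in mean and variance can leave site differences in covariance, can be illustrated with a small simulation. The sketch below is not CovBat or ComBat: it uses simple per-site standardization as a stand-in for a mean and variance adjustment, and the two sites, their correlations, shifts, and scales are all hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # subjects per simulated site (hypothetical)


def simulate_site(rho, shift, scale):
    """Draw two correlated features, then apply a site-specific shift and scale."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    x = rng.multivariate_normal(np.zeros(2), cov, size=n)
    return x * scale + shift


# Two hypothetical sites that differ in mean, variance, AND correlation.
site_a = simulate_site(rho=0.8, shift=0.0, scale=1.0)
site_b = simulate_site(rho=-0.2, shift=1.5, scale=2.0)


def standardize(x):
    """Remove the per-feature mean and variance within one site.

    A simplified stand-in for mean/variance harmonization, not ComBat itself.
    """
    return (x - x.mean(axis=0)) / x.std(axis=0)


z_a, z_b = standardize(site_a), standardize(site_b)

# After standardization the sites agree in mean and variance...
mean_gap = np.abs(z_a.mean(axis=0) - z_b.mean(axis=0)).max()
var_gap = np.abs(z_a.var(axis=0) - z_b.var(axis=0)).max()

# ...but the between-feature correlation still differs by site.
corr_a = np.corrcoef(z_a, rowvar=False)[0, 1]
corr_b = np.corrcoef(z_b, rowvar=False)[0, 1]
corr_gap = abs(corr_a - corr_b)

print(f"mean gap: {mean_gap:.2e}, variance gap: {var_gap:.2e}")
print(f"correlation, site A: {corr_a:.2f}, site B: {corr_b:.2f}")
```

In this toy setting, any model that uses the joint distribution of the two features could still tell the sites apart after standardization, which is the gap the abstract says a covariance-level harmonization is meant to close.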
Keywords: ComBat; cortical thickness; covariance; harmonization; multi-site analysis; site effect.
© 2021 The Authors. Human Brain Mapping published by Wiley Periodicals LLC.
Conflict of interest statement
The authors declare no potential conflict of interest.
