. 2017 Nov 1;161:149-170.
Epub 2017 Aug 18.
Harmonization of Multi-Site Diffusion Tensor Imaging Data
Diffusion tensor imaging (DTI) is a well-established magnetic resonance imaging (MRI) technique used for studying microstructural changes in the white matter. As with many other imaging modalities, DTI images suffer from technical between-scanner variation that hinders comparisons of images across imaging sites, scanners and over time. Using fractional anisotropy (FA) and mean diffusivity (MD) maps of 205 healthy participants acquired on two different scanners, we show that the DTI measurements are highly site-specific, highlighting the need to correct for site effects before performing downstream statistical analyses. We first show evidence that combining DTI data from multiple sites without harmonization may be counter-productive and negatively impact inference. Then, we propose and compare several harmonization approaches for DTI data, and show that ComBat, a popular batch-effect correction tool used in genomics, performs best at modeling and removing the unwanted inter-site variability in FA and MD maps. Using age as a biological phenotype of interest, we show that ComBat both preserves biological variability and removes the unwanted variation introduced by site. Finally, we assess the different harmonization methods in the presence of different levels of confounding between site and age, and test their robustness in small-sample studies.
ComBat; DTI; Diffusion; Harmonization; Inter-scanner; Multi-site.
Copyright © 2017 Elsevier Inc. All rights reserved.
Conflict of interest statement
The authors declare that they have no competing interests.
Figure A.1. CAT plots for confounded and matched subsamples
For each confounding scenario, the solid lines represent the CAT curves for subsamples of the full dataset that are matched for age across the two sites (no confounding). The sample sizes of the matched subsamples are similar to the sample sizes of the confounded subsamples. The CAT curves for the confounded subsamples are represented by the dotted lines, and correspond to the CAT curves described in the Results section of the manuscript. For the first row, the validation dataset was the Independent dataset 1 (similar age range). For the second row, the validation dataset was the Independent dataset 2, with a slightly older age range.
Figure A.2. Stability analysis of the ComBat harmonization
(a) The dotted lines represent the site effects estimated by ComBat for each site and each subsample size, averaged across the B subsamples. The shaded areas depict 95% confidence intervals. The solid lines represent the site effects estimated by ComBat on the full dataset (m=210). (b) The root mean square error (RMSE) between (1) the ComBat-corrected FA values using site effects estimated on subsamples of size m and (2) the ComBat-corrected FA values using site effects estimated on the full sample (m=210).
Figure B.1. RAVEL harmonization
(a) Relationship between the average FA measure in white matter (WM) and cerebrospinal fluid (CSF). The FA measurements vary by site in both WM and CSF. (b) Voxel-specific RAVEL coefficient ψ̂_v in template space for FA maps. (c) Relationship between the average MD measure in white matter (WM) and cerebrospinal fluid (CSF). The MD measurements vary by site in WM, but do not seem to vary in CSF. (d) Voxel-specific RAVEL coefficient ψ̂_v in template space for MD maps.
Figure B.2. Discovery-validation scheme for the estimation of replicability
To estimate the performance of a harmonization procedure at improving the replicability of the voxels associated with age, we use the harmonized dataset as a discovery cohort, and an independent dataset (different participants) as a validation cohort. For each cohort separately, we perform a mass-univariate analysis for age to obtain a t-statistic at each voxel. This yields two vectors of t-statistics, t_dis and t_val, for the discovery and validation cohorts respectively. We calculate the agreement between t_dis and t_val using the concordance at the top (CAT) curve, described in the Methods section. A harmonization method that performs better will yield a vector t_dis more similar to t_val, that is, a CAT curve closer to 1.
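The concordance described in this caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `cat_curve` is hypothetical, and ranking voxels by the absolute value of the t-statistic is an assumption made here for the example.

```python
import numpy as np

def cat_curve(t_dis, t_val):
    """Concordance at the top (CAT) between two vectors of t-statistics.

    For each k, returns the fraction of voxels that appear in both the
    top-k discovery list and the top-k validation list.
    Assumption: voxels are ranked by |t| (strongest association first).
    """
    t_dis = np.asarray(t_dis, dtype=float)
    t_val = np.asarray(t_val, dtype=float)
    n = t_dis.size

    # Order voxels from strongest to weakest association in each cohort.
    order_dis = np.argsort(-np.abs(t_dis), kind="stable")
    order_val = np.argsort(-np.abs(t_val), kind="stable")

    # pos_x[v] is the 0-based rank position of voxel v in cohort x.
    pos_dis = np.empty(n, dtype=int)
    pos_val = np.empty(n, dtype=int)
    pos_dis[order_dis] = np.arange(n)
    pos_val[order_val] = np.arange(n)

    # Voxel v is in both top-k lists exactly when max(pos_dis, pos_val) < k.
    worst = np.maximum(pos_dis, pos_val)
    overlap = np.cumsum(np.bincount(worst, minlength=n))
    return overlap / np.arange(1, n + 1)
```

Two identical rankings give a curve equal to 1 everywhere. The last point of the curve is always 1, because both top-n lists contain every voxel.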
Figure B.3. Confounding scenarios for FA maps
Same as Figure 5, but for the per-scan median FA value in the White Matter (WM).
Figure B.4. MA-plots for site differences in MD maps
Same as Figure 3, but for MD maps.
Figure B.5. MA-plots for site differences in AD maps
Same as Figure 3, but for AD maps.
Figure B.6. MA-plots for site differences in RD maps
Same as Figure 3, but for RD maps.
Figure B.7. Number of ROIs associated with site and age
Same as Figure 4, but for the 156 regions of interest (ROIs). All p-values were adjusted for multiple comparisons in a conservative manner using Bonferroni correction.
(a) In the absence of harmonization (raw data), all 156 ROIs are associated with site in the FA maps, and 140 ROIs are associated with site in the MD maps. With both SVA and ComBat, no ROI remains associated with site. (b) ComBat performs well at increasing the number of ROIs associated with age (92 ROIs for FA and 92 ROIs for MD), as opposed to 8 ROIs and 72 ROIs in the raw data, for the FA and MD maps respectively.
Figure B.8. Percentage of voxels associated with site and age for AD and RD maps
Same as Figure 4, but for the AD and RD maps.
Figure B.9. Effect of ComBat harmonization on t-statistics (MD maps)
Same as Figure 8, but for the MD maps.
Figure B.10. Distribution of the effect sizes for the silver-standards
Figure B.11. Estimated effect sizes Δ̂_ageMD for different confounding scenarios
Same as Figure 7, but for MD.
Figure 1. ComBat site effect parameters for FA
(a) The voxel-wise estimates of the location parameter γ_iv for site 1 (dotted grey line) and site 2 (dotted red line) for the FA maps. The solid lines represent the prior distributions (normal distributions with mean γ̄_1 and γ̄_2 respectively) estimated in the ComBat procedure using empirical Bayes. (b) The voxel-wise estimates of the scale parameter δ_iv for site 1 (dotted grey line) and site 2 (dotted red line). The solid lines represent the EB-based prior distributions (inverse gamma distributions) estimated in the ComBat procedure. (c) Final EB-estimates for the site effect parameters for site 1 (first and third row) and site 2 (second and fourth row) in template space.
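To make the roles of the location and scale parameters concrete, the sketch below applies a plain per-voxel location and scale adjustment. It is deliberately simplified and is not ComBat itself: it omits the empirical Bayes shrinkage toward the priors shown in the figure and does not model covariates such as age. The function name `location_scale_harmonize` and the choice of a pooled within-site standard deviation are assumptions made for illustration.

```python
import numpy as np

def location_scale_harmonize(y, site):
    """Simplified per-voxel location/scale site adjustment (NOT full ComBat).

    y    : array (participants x voxels), e.g. FA values
    site : array (participants,), site label of each scan
    Assumptions: no covariates, no empirical Bayes shrinkage, at least two
    participants per site, and non-constant values at every voxel.
    """
    y = np.asarray(y, dtype=float)
    site = np.asarray(site)
    labels = np.unique(site)

    alpha = y.mean(axis=0)  # overall voxel-wise mean

    # Pooled within-site standard deviation at each voxel.
    resid = y.copy()
    for s in labels:
        idx = site == s
        resid[idx] -= y[idx].mean(axis=0)
    sigma = np.sqrt((resid ** 2).mean(axis=0))

    z = (y - alpha) / sigma  # standardized data
    out = np.empty_like(y)
    for s in labels:
        idx = site == s
        gamma = z[idx].mean(axis=0)         # location site effect (gamma_iv)
        delta = z[idx].std(axis=0, ddof=1)  # scale site effect (delta_iv)
        out[idx] = sigma * (z[idx] - gamma) / delta + alpha
    return out
```

After this adjustment every site shares the same voxel-wise mean and standard deviation. In the actual procedure the raw estimates are additionally shrunk toward site-level priors, which is what stabilizes the estimates when samples are small.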
Figure 2. FA and MD maps are affected by site
(a) Density of the FA values for WM voxels for each participant, colored by site. (b) MA-plot for site differences in FA. The y-axis represents the differences in FA between Site 1 and Site 2, while the x-axis shows the average FA across sites. FA maps that would be free of site effects would result in an MA-plot centered around 0. The upper-left part of the scatterplot shows that several voxels appear to be differently affected by site in comparison to the rest of the voxels. (c) Boxplot of FA values for voxels located in two regions of interest (Cuneus left and Putamen left), depicted per site (FA values were averaged by site at each voxel separately). This shows that the magnitude of the difference in means between the two sites is region-specific. (d–f) Same as (a–c), but for the MD maps.
Figure 3. MA-plots for site differences in FA maps
Mean-difference (MA) plot for the FA maps for the different harmonization methods. At each voxel in the WM, the y-axis represents the difference between the average FA value at site 1 and the average FA value at site 2, and the x-axis represents the average FA value across all participants from both sites. A dataset free of site effects will result in MA data points near y = 0 for all values of x.
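The two axes of the MA-plot follow directly from the definition in this caption. The snippet below is a small sketch under that definition; the function name `ma_values` and the array layout (participants in rows, WM voxels in columns) are assumptions.

```python
import numpy as np

def ma_values(site1, site2):
    """Return the MA-plot coordinates for two sites.

    site1, site2 : arrays (participants x voxels) of FA values
    m : per-voxel difference between the site 1 and site 2 averages (y-axis)
    a : per-voxel average across all participants from both sites (x-axis)
    """
    site1 = np.asarray(site1, dtype=float)
    site2 = np.asarray(site2, dtype=float)
    m = site1.mean(axis=0) - site2.mean(axis=0)
    a = np.concatenate([site1, site2], axis=0).mean(axis=0)
    return m, a
```

Plotting `m` against `a` as a scatterplot gives the MA-plot. For data free of site effects, `m` stays close to 0 across the whole range of `a`.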
Figure 4. Percentage of voxels associated with site and age
(a) For each harmonization method, we calculated the number of voxels in the white matter (WM) that are significantly associated with imaging site for both FA and MD. A voxel is significant if the p-value calculated from a two-sample t-test is less than 0.05, after adjusting for multiple comparisons using Bonferroni correction. Lower numbers are desirable. (b) Number of voxels in the WM that are significantly associated with age using simple linear regression (p < 0.05) for both FA and MD. Higher numbers are desirable. From a total of 69,693 voxels in the WM, 69,475 and 40,056 voxels are associated with site in the raw data, for the FA and MD maps respectively. Both SVA and ComBat successfully remove the association with site for all voxels. ComBat performs the best at increasing the number of voxels associated with age (5,658 voxels for FA and 32,203 voxels for MD).
Figure 5. Confounding scenarios for FA maps
In all four panels, each data point represents the FA value versus the age of the participant for a fixed voxel in the right thalamus. Full dots and circles are used to distinguish the two sites of the participant scans (Dataset 1 and Dataset 2). The solid black line in all panels represents the estimated linear relationship between FA and age when all data points are included (absence of confounding). In panel (a), the grey lines represent the estimated relationship between FA and age for each site. In panels (b–d), the selected participants are colored (blue, red and green respectively), and the colored solid lines represent the estimated linear relationship between FA and age for the selected participants only.
Figure 6. Replicability of the voxels associated with age in the FA maps
For each confounding scenario and for each harmonization method, we calculated a concordance at the top (CAT) curve for the voxels associated with age. The concordances were calculated between the harmonized dataset (2 sites combined) and an independent dataset. In (a), 292 unrelated participants within the same age range were selected as an independent cohort. In (b), 105 unrelated and older participants were selected as an independent cohort. A good harmonization will result in a CAT curve closer to 1. Overlaps by chance will result in a CAT curve along the diagonal.
Figure 7. Estimated effect sizes Δ̂_ageFA for different confounding scenarios
(a) Boxplots of the estimated effect sizes Δ̂_ageFA for the set of signal voxels described in Section 3.8, for different confounding scenarios: positive confounding (pos), no confounding (no), negative confounding (neg) and quantitative confounding (rev). The dotted line represents the median true effect size (around 0.004). (b) Boxplots of the estimated effect sizes Δ̂_ageFA for the set of null voxels described in Section 3.8. The median true effect size is around 0. The distributions of the estimated effect sizes for the ComBat-harmonized datasets approximate very well the distribution of the true effect sizes shown in the last column in each panel. Results for MD values are presented in Figure B.11.
Figure 8. ComBat improves statistical power
We present voxel-wise t-statistics in the WM, testing for association between FA values and age, for four combinations of the data: Dataset 1 and Dataset 2 analyzed separately, Dataset 1 and Dataset 2 combined without any harmonization, and Dataset 1 and Dataset 2 combined and harmonized with ComBat.
(a) Distribution of the t-statistics for all WM voxels, for each analyzed dataset. The combined datasets harmonized with ComBat show higher t-statistics. (b) T-statistics in template space for the combined dataset, with no harmonization (top row) and with ComBat (bottom row). (c) Distribution of the t-statistics for a subset of voxels highly associated with age (signal silver-standard described in Section 2.5). (d) Distribution of the t-statistics for a set of voxels not associated with age (null silver-standard described in Section 2.5). ComBat increases the magnitude of the t-statistics for the signal voxels while maintaining the t-statistics around 0 for the null voxels. (e) Number of voxels significantly associated with age. Bonferroni correction was applied to correct for multiple comparisons.
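The voxel-wise t-statistics for age can be computed for all voxels at once. The sketch below assumes a simple linear regression of the image value on age at each voxel, with no additional covariates; the function name `age_t_statistics` and the array layout are hypothetical.

```python
import numpy as np

def age_t_statistics(y, age):
    """Voxel-wise t-statistics for the slope of a simple linear regression.

    y   : array (participants x voxels), e.g. FA values in the WM
    age : array (participants,)
    Assumption: one predictor (age) plus an intercept, fitted per voxel.
    """
    y = np.asarray(y, dtype=float)
    age = np.asarray(age, dtype=float)
    n = age.size

    xc = age - age.mean()     # centered predictor
    yc = y - y.mean(axis=0)   # centered responses, one column per voxel
    sxx = (xc ** 2).sum()

    beta = xc @ yc / sxx                      # slope at each voxel
    resid = yc - np.outer(xc, beta)           # residuals of each fit
    s2 = (resid ** 2).sum(axis=0) / (n - 2)   # residual variance
    return beta / np.sqrt(s2 / sxx)           # slope divided by its standard error
```

Running this function on each of the four data combinations gives the distributions compared in panel (a).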
Figure 9. ComBat is robust to small sample size studies
We created B = 100 random subsets of size 20, selecting at random 10 participants from each site, and applied each harmonization method on every subset separately. For each harmonized subset, we computed a t-statistic at each voxel in the WM, testing for the association of FA and MD with age. We created a silver-standard list of t-statistics by creating B = 100 random subsets of size 20 within site. (a) Average concordance at the top (CAT) curve for each harmonization method for the FA maps. The silver-standard CAT curve is depicted in dark blue. A higher curve represents better replicability of the voxels associated with age. (b) Densities of the t-statistics for the set of signal voxels described in Section 3.8, for the FA maps. Higher values of the t-statistics are desirable. (c) Densities of the t-statistics for the set of null voxels described in Section 3.8, for the FA maps. T-statistics closer to 0 are desirable. For each plot, the results obtained for the ComBat-harmonized datasets approximate very well the results obtained from the within-site silver-standard (dark blue). (d) Same as (a), but for the MD maps. (e) Same as (b), but for the MD maps. Lower values of the t-statistics are desirable. (f) Same as (c), but for the MD maps. RAVEL performs substantially worse than other methods.
Research Support, N.I.H., Extramural
Autism Spectrum Disorder / diagnostic imaging*
Diffusion Tensor Imaging / methods*
Diffusion Tensor Imaging / standards
Image Processing, Computer-Assisted / methods*
Image Processing, Computer-Assisted / standards
Multicenter Studies as Topic / methods*
Multicenter Studies as Topic / standards
White Matter / diagnostic imaging*