Large-scale longitudinal studies of regional brain volume require reliable quantification from automated segmentation and labeling. However, repeated MR scanning of the same subject does not yield identical images, even with the same scanner and acquisition parameters, because of small changes in image orientation, changes in prescan parameters, and magnetic field instability. These differences can produce appreciable changes in the volume estimates for individual structures. This study examined the scan-rescan reliability of automated segmentation algorithms for measuring several subcortical regions, using both within-day and across-day scan sessions in a group of 23 normal participants. Reliability, assessed by percent volume difference, percent volume overlap (Dice's coefficient), and the intraclass correlation coefficient (ICC), varied substantially across brain regions. Some structures, such as the amygdala, showed low reliability (ICC = 0.6), whereas others, such as the thalamus and caudate, showed higher reliability (ICC = 0.9). Patterns of reliability across regions were similar for automated segmentation with FSL/FIRST and FreeSurfer (longitudinal stream). Reliability was associated with structure volume, the structure's ratio of volume to surface area, the length of the interscan interval, and the segmentation method. Sample size estimates for detecting changes in brain volume across a range of likely effect sizes also differed by region. Thus, longitudinal research requires careful sample size analysis and choice of segmentation method, taking into account the brain structure(s) of interest and the magnitude of the anticipated effects.
© 2010 Wiley-Liss, Inc.
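The three reliability measures named in the abstract can be computed along the following lines. This is a minimal sketch, not the study's own code, and the function names are hypothetical. It assumes that structure volumes have already been extracted, that the two label masks being compared are flattened 0/1 voxel arrays in the same space, and that percent volume difference is normalized by the mean of the two volumes. It also assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement) as the ICC form; the abstract does not state which ICC variant was used.

```python
from typing import Sequence


def percent_volume_difference(v1: float, v2: float) -> float:
    """Absolute scan-rescan volume difference as a percentage of the mean volume.

    Assumption: normalization by the mean of the two volumes.
    """
    mean_volume = (v1 + v2) / 2.0
    return abs(v1 - v2) / mean_volume * 100.0


def dice_coefficient(mask1: Sequence[int], mask2: Sequence[int]) -> float:
    """Dice overlap 2|A & B| / (|A| + |B|) for two flattened binary label masks."""
    if len(mask1) != len(mask2):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for m in mask1 if m)
    size_b = sum(1 for m in mask2 if m)
    if size_a + size_b == 0:
        raise ValueError("both masks are empty")
    both = sum(1 for m1, m2 in zip(mask1, mask2) if m1 and m2)
    return 2.0 * both / (size_a + size_b)


def icc_2_1(data: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    data[i][j] is the volume of subject i measured in scan session j.
    """
    n = len(data)     # number of subjects
    k = len(data[0])  # number of sessions per subject
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), sessions (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

For example, `percent_volume_difference(100.0, 110.0)` is about 9.52, and two four-voxel masks that each label two voxels and share one give a Dice coefficient of 0.5. When every subject's volume is identical across sessions, `icc_2_1` returns 1.0.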