Purpose: To compare available methods for whole-brain and gray matter (GM) atrophy estimation in multiple sclerosis (MS) in terms of repeatability (same magnetic resonance [MR] imaging unit) and reproducibility (different system/field strength), with a view to their potential clinical application.

Materials and Methods: The software packages ANTs-v1.9, CIVET-v2.1, FSL-SIENAX/SIENA-5.0.1, Icometrix-MSmetrix-1.7, and SPM-v12 were compared. This retrospective study, performed between March 2015 and March 2017, collected data from (a) eight simulated MR images and longitudinal data (2 weeks) from 10 healthy control subjects to assess the cross-sectional and longitudinal accuracy of atrophy measures, (b) test-retest MR images in 29 patients with MS, acquired within the same day on imaging units of different field strengths and manufacturers, to evaluate precision, and (c) longitudinal data (1 year) in 24 patients with MS to assess agreement between methods. Tissue segmentation, image registration, and white matter (WM) lesion filling were also evaluated. Multiple paired t tests were used for comparisons.

Results: High values of accuracy (0.87-0.97) for whole-brain and GM volumes were found, with the lowest values for MSmetrix. ANTs showed the lowest mean error (0.02%) for whole-brain atrophy in healthy control subjects, with a coefficient of variation of 0.5%. SPM showed the smallest mean error (0.07%) and coefficient of variation (0.08%) for GM atrophy. Overall, good repeatability (P > .05) but poor reproducibility (P < .05) were found for all methods. The WM lesion filling technique mainly affected the ANTs, MSmetrix, and SPM results (P < .05).

Conclusion: On the basis of this comparison, it would be possible to select a software package for atrophy measurement according to the requirements of the application (research center, clinical trial) and its goal (accuracy and repeatability or reproducibility). Improved reproducibility is required for clinical application.
© RSNA, 2018

Online supplemental material is available for this article.
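
The summary statistics named above (mean error of an atrophy measure, coefficient of variation, and a paired t test on test-retest volumes) can be illustrated with a minimal Python sketch. The abstract does not give the study's exact definitions, so the formulas here are assumptions: atrophy is taken as percentage volume change, the expected change in healthy control subjects over 2 weeks is taken as zero, and the coefficient of variation is computed on repeated volume measurements. All function names and numbers are hypothetical and are not taken from the study or from any of the compared packages.

```python
import math
import statistics


def percent_volume_change(baseline_ml, followup_ml):
    """Percentage volume change between two time points (assumed atrophy measure)."""
    return 100.0 * (followup_ml - baseline_ml) / baseline_ml


def mean_abs_error(changes, expected=0.0):
    """Mean absolute deviation of measured changes from an expected value.

    Assumption: for healthy control subjects rescanned after 2 weeks,
    the expected change is zero.
    """
    return statistics.mean(abs(c - expected) for c in changes)


def coefficient_of_variation(values):
    """Coefficient of variation in percent (sample SD divided by mean)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def paired_t_statistic(first, second):
    """t statistic of a paired t test on two matched series of measurements.

    Only the statistic is returned; a P value would additionally require
    the t distribution with len(first) - 1 degrees of freedom.
    """
    diffs = [a - b for a, b in zip(first, second)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


# Made-up whole-brain volumes (mL) for four subjects, scanned twice.
scan_1 = [1500.0, 1450.0, 1600.0, 1550.0]
scan_2 = [1497.0, 1451.5, 1596.0, 1551.0]

changes = [percent_volume_change(a, b) for a, b in zip(scan_1, scan_2)]
error = mean_abs_error(changes)
t_value = paired_t_statistic(scan_1, scan_2)
```

Under these assumptions, a small `t_value` with a nonsignificant P value for rescans on the same imaging unit would correspond to what the abstract calls good repeatability, and a significant difference between units would correspond to poor reproducibility.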