Automated gray matter segmentation of magnetic resonance imaging data is essential for morphometric analyses of the brain, particularly when large sample sizes are investigated. However, although detection of small structural brain differences may fundamentally depend on the method used, both accuracy and reliability of different automated segmentation algorithms have rarely been compared. Here, performance of the segmentation algorithms provided by SPM8, VBM8, FSL and FreeSurfer was quantified on simulated and real magnetic resonance imaging data. First, accuracy was assessed by comparing segmentations of twenty simulated and 18 real T1 images with corresponding ground truth images. Second, reliability was determined in ten T1 images from the same subject and in ten T1 images of different subjects scanned twice. Third, the impact of preprocessing steps on segmentation accuracy was investigated. VBM8 showed a very high accuracy and a very high reliability. FSL achieved the highest accuracy but demonstrated poor reliability and FreeSurfer showed the lowest accuracy, but high reliability. An universally valid recommendation on how to implement morphometric analyses is not warranted due to the vast number of scanning and analysis parameters. However, our analysis suggests that researchers can optimize their individual processing procedures with respect to final segmentation quality and exemplifies adequate performance criteria.