Large databases of high-resolution structural MR images are being assembled to quantitatively examine the relationships among brain anatomy, disease progression, treatment regimens, and genetic influences on brain structure. Quantifying brain structures in such large databases cannot practically be accomplished by expert neuroanatomists using hand tracing. Rather, this research will depend on automated methods that reliably and accurately segment and quantify dozens of brain regions. At present, little guidance is available to help clinical research groups choose such tools. Our goal was therefore to compare the performance of two popular, fully automated tools, FSL/FIRST and FreeSurfer, against expert hand tracing in the measurement of the hippocampus and amygdala. Volumes derived from each automated method were compared with hand tracing in terms of percent volume overlap, percent volume difference, across-sample correlation, and 3-D group-level shape analysis. In addition, sample size estimates for conducting between-group studies were computed for a range of effect sizes. Relative to hand tracing, hippocampal measurements with FreeSurfer exhibited greater volume overlap, smaller volume difference, and higher correlation than those with FIRST, and sample size estimates with FreeSurfer were closer to those obtained with hand tracing. Amygdala measurements with FreeSurfer were also more highly correlated with hand tracing than those with FIRST, but exhibited a greater volume difference. For the amygdala, the two techniques had comparable volume overlap and similar sample size estimates. A 3-D shape analysis of the hippocampus, using hand tracing as the reference, showed that FreeSurfer was more accurate than FIRST, particularly in the head and tail. However, FIRST represented the amygdala shape more accurately than FreeSurfer, which inflated its anterior and posterior surfaces.
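As an illustration of the three volumetric agreement measures named above, the following is a minimal sketch rather than the authors' implementation. The abstract does not give formulas, so the definitions here are assumptions: percent volume overlap is taken as a Dice-style ratio (twice the shared voxels over the sum of both volumes), percent volume difference as the absolute difference relative to the mean of the two volumes, and across-sample correlation as Pearson's r. All function names are hypothetical, and masks are represented as flat sequences of 0/1 voxel labels.

```python
from math import sqrt


def volume_overlap_pct(mask_a, mask_b):
    """Dice-style percent overlap of two binary masks (assumed definition)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    shared = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        raise ValueError("both masks are empty")
    return 200.0 * shared / total


def volume_difference_pct(vol_a, vol_b):
    """Absolute volume difference as a percentage of the mean volume
    (assumed definition)."""
    mean_vol = (vol_a + vol_b) / 2.0
    if mean_vol == 0:
        raise ValueError("both volumes are zero")
    return 100.0 * abs(vol_a - vol_b) / mean_vol


def pearson_r(xs, ys):
    """Pearson correlation of per-subject volumes from two methods."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(ss_x * ss_y)


# Toy example: an automated mask versus a hand-traced mask on four voxels.
automated = [1, 1, 1, 0]
hand_traced = [0, 1, 1, 1]
overlap = volume_overlap_pct(automated, hand_traced)  # 2 shared of 3 + 3 voxels
difference = volume_difference_pct(90, 110)  # 20.0
```

In practice the masks would come from each tool's label volume resampled to a common voxel grid, and the correlation would be computed across subjects separately for each structure and hemisphere.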