Purpose: To compare the accuracy and repeatability of emerging deep learning-based automatic segmentation algorithms with those of well-established semi-automatic (interactive) methods for determining liver volume in living liver transplant donors on computed tomography (CT) images.
Methods: A total of 12 methods (six semi-automatic and six fully automatic) were evaluated. The semi-automatic segmentation algorithms are based on traditional iterative models, including watershed, fast marching, region growing and active contours, as well as on modern techniques, including the robust statistical segmenter and super-pixels. These methods require some form of user interaction, such as placing initialization seeds on images or setting a parameter range. The automatic methods are based on deep learning and build on three framework templates (DeepMedic, NiftyNet and U-Net); the first two were applied with default parameter sets, and the last two involved adapted, novel model designs. For 20 living donors (8 training and 12 test datasets), a group of imaging scientists and radiologists created ground truths by manually segmenting contrast material-enhanced CT images. Each segmentation was evaluated using five metrics: volume overlap error, relative volume error, and average, root-mean-square (RMS) and maximum symmetric surface distances. The metric values were mapped to a scoring system, and a final grade was calculated as the average of the resulting scores. Accuracy and repeatability were evaluated using slice-by-slice comparisons and volumetric analysis. Diversity and complementarity were examined through heatmaps. Majority voting and the Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm were used to fuse the individual results.
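The five metrics and the score mapping are named but not defined above. The following Python sketch assumes the common SLIVER07-style definitions (overlap and volume errors in percent, symmetric surface distances in mm computed from border voxels) and a linear score in which a perfect result gets 100 and an error equal to a human reference value gets 75. These formulas and any `human_ref` values passed in are assumptions for illustration, not the study's exact implementation. It relies on NumPy and SciPy's `ndimage` module.

```python
import numpy as np
from scipy import ndimage


def volume_ml(mask, spacing_mm):
    """Mask volume in millilitres; spacing_mm is the voxel size (mm) per axis."""
    voxel_mm3 = float(np.prod(spacing_mm))
    return np.count_nonzero(mask) * voxel_mm3 / 1000.0  # 1 mL = 1000 mm^3


def volume_overlap_error(seg, ref):
    """VOE in percent: 100 * (1 - |A ∩ B| / |A ∪ B|)."""
    seg, ref = seg.astype(bool), ref.astype(bool)
    inter = np.logical_and(seg, ref).sum()
    union = np.logical_or(seg, ref).sum()
    return 100.0 * (1.0 - inter / union)


def relative_volume_error(seg, ref):
    """Signed relative volume error in percent: 100 * (|A| - |B|) / |B|."""
    seg, ref = seg.astype(bool), ref.astype(bool)
    return 100.0 * (seg.sum() - ref.sum()) / ref.sum()


def _border(mask):
    # Border voxels: foreground voxels that a one-voxel erosion removes.
    return mask & ~ndimage.binary_erosion(mask)


def surface_distances(seg, ref, spacing_mm):
    """Average, RMS and maximum symmetric surface distances (mm)."""
    b_seg, b_ref = _border(seg.astype(bool)), _border(ref.astype(bool))
    # Distance of every voxel to the nearest border voxel of each mask.
    to_ref = ndimage.distance_transform_edt(~b_ref, sampling=spacing_mm)
    to_seg = ndimage.distance_transform_edt(~b_seg, sampling=spacing_mm)
    # Pool the distances in both directions so the measures are symmetric.
    d = np.concatenate([to_ref[b_seg], to_seg[b_ref]])
    return d.mean(), np.sqrt(np.mean(d ** 2)), d.max()


def metric_score(value, human_ref):
    """Assumed linear mapping: 100 if perfect, 75 at the human reference error, floored at 0."""
    return max(0.0, 100.0 - 25.0 * abs(value) / human_ref)


def final_grade(values, human_refs):
    """Final grade as the mean of the per-metric scores."""
    return float(np.mean([metric_score(v, h) for v, h in zip(values, human_refs)]))
```

In use, the five metric values of one segmentation would be passed to `final_grade` together with five reference errors; the reference errors used in the study are not given in this abstract.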
Results: The four top-ranked methods were all deep learning-based automatic models, with scores of 79.63, 79.46, 77.15 and 74.50. The intra-user score was 95.14. Overall, deep automatic segmentation outperformed the interactive techniques on all metrics. The mean ground-truth liver volume was 1409.93 mL ± 271.28 mL, compared with 1342.21 mL ± 231.24 mL for the automatic methods and 1201.26 mL ± 258.13 mL for the interactive methods, indicating higher accuracy and less variation for the automatic methods. Qualitative analysis of the segmentation results revealed considerable diversity and complementarity among the methods, suggesting that ensembles could yield superior results. Fusing the automatic methods achieved scores of 83.87 with majority voting and 86.20 with STAPLE, only slightly below the fusion of all methods, which achieved 86.70 (majority voting) and 88.74 (STAPLE).
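STAPLE estimates a probabilistic consensus segmentation together with a sensitivity and specificity for each input method, using expectation maximization. Below is a minimal Python sketch of the binary variant, assuming flattened 0/1 masks, a fixed global foreground prior and no spatial regularization. Toolkit implementations (for example, ITK's STAPLE image filter) offer more options, and this is not necessarily the variant used in the study.

```python
import numpy as np
from scipy.special import expit


def staple_binary(decisions, max_iter=100, tol=1e-8):
    """Minimal binary STAPLE via expectation maximization.

    decisions: (n_voxels, n_raters) array of 0/1 labels, one column per method.
    Returns W (posterior foreground probability per voxel) and the estimated
    sensitivity p and specificity q of each rater.
    """
    D = np.asarray(decisions, dtype=float)
    n_raters = D.shape[1]
    eps = 1e-10
    # Assumption: global foreground prior fixed to the mean label; no spatial prior.
    prior = np.clip(D.mean(), eps, 1 - eps)
    p = np.full(n_raters, 0.99)  # initial sensitivities
    q = np.full(n_raters, 0.99)  # initial specificities
    for _ in range(max_iter):
        # E-step: log-likelihood of the observed labels if truly foreground / background.
        log_fg = np.log(prior) + D @ np.log(p + eps) + (1 - D) @ np.log(1 - p + eps)
        log_bg = np.log(1 - prior) + (1 - D) @ np.log(q + eps) + D @ np.log(1 - q + eps)
        W = expit(log_fg - log_bg)  # = e^fg / (e^fg + e^bg), computed stably
        # M-step: update each rater's sensitivity and specificity.
        p_new = (W @ D) / (W.sum() + eps)
        q_new = ((1 - W) @ (1 - D)) / ((1 - W).sum() + eps)
        shift = max(np.abs(p_new - p).max(), np.abs(q_new - q).max())
        p, q = p_new, q_new
        if shift < tol:
            break
    return W, p, q
```

For 3D masks, each method's `mask.ravel()` would form one column of `decisions`, and `(W > 0.5).reshape(mask.shape)` gives the fused segmentation.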
Conclusion: The new deep learning-based automatic segmentation algorithms substantially increase the accuracy and repeatability of liver segmentation and volumetric measurement. Ensemble-based fusion of the automatic methods yields the best results among the automatic approaches at almost no additional time cost, because the individual models can be executed in parallel.
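The time-cost argument rests on the ensemble members being independent, so they can run concurrently before a cheap voxel-wise vote. A minimal Python sketch, in which `models` are hypothetical callables standing in for trained networks and ties (possible with an even number of models) resolve to background as an assumed rule:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def majority_vote(masks):
    """Voxel-wise majority voting: foreground where more than half the masks agree.

    Assumption: ties resolve to background.
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    return stack.sum(axis=0) * 2 > stack.shape[0]


def ensemble_segment(models, volume, max_workers=None):
    """Run independent models concurrently, then fuse their masks by majority vote.

    `models` is a list of hypothetical callables mapping a CT volume to a binary mask.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        masks = list(pool.map(lambda model: model(volume), models))
    return majority_vote(masks)
```

Threads only overlap work when inference releases Python's global interpreter lock, as GPU-bound or NumPy-heavy code typically does; otherwise a process pool or one device per model would be needed to realize the parallel speed-up.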