Purpose: To compare manual corrections of liver masks produced by a fully automatic segmentation method based on convolutional neural networks (CNN) with manual routine segmentations in MR images in terms of inter-observer variability and interaction time.
Methods: For testing, precise reference segmentations that fulfill the quality requirements for liver surgery were manually created for each patient. One radiologist and two radiology residents were asked to provide manual routine segmentations. We used our automatic segmentation method, Liver-Net, to produce liver masks for the test cases and asked a radiology assistant and one further resident to correct the automatic results. All observers were asked to measure their interaction time. Both the manual routine and the corrected segmentations were compared with the reference annotations.
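The comparison with the reference annotations is reported in terms of Dice index and RVE. The following is a minimal sketch of how these two measures are commonly computed, not the study's actual evaluation code. It assumes binary masks flattened to sequences of 0/1 values, reads RVE as a relative volume error with respect to the reference, and returns a signed value whose magnitude would be taken for reporting; the function names are hypothetical.

```python
from typing import Sequence


def dice_index(mask: Sequence[int], reference: Sequence[int]) -> float:
    """Dice = 2 * |A and B| / (|A| + |B|) for two flattened binary masks."""
    if len(mask) != len(reference):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(mask, reference) if a and b)
    total = sum(1 for a in mask if a) + sum(1 for b in reference if b)
    # Convention (assumption): two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * overlap / total


def relative_volume_error(mask: Sequence[int], reference: Sequence[int]) -> float:
    """Signed RVE in percent: 100 * (V_mask - V_ref) / V_ref."""
    v_mask = sum(1 for a in mask if a)
    v_ref = sum(1 for b in reference if b)
    if v_ref == 0:
        raise ValueError("reference mask is empty")
    return 100.0 * (v_mask - v_ref) / v_ref


# Toy example (illustrative values only, not study data).
reference = [0, 1, 1, 1, 1, 0, 0, 0]
segmentation = [0, 1, 1, 1, 1, 1, 0, 0]
dice = dice_index(segmentation, reference)
rve = relative_volume_error(segmentation, reference)
```

In the toy example the segmentation contains one voxel more than the reference, so the overlap is 4 voxels out of 4 + 5, and the volume is overestimated by a quarter. In practice the masks would be 3D arrays, and voxel counts are proportional to volume as long as both masks share the same voxel spacing.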
Results: The manual routine segmentations achieved a mean Dice index of 0.95 and a mean relative volume error (RVE) of 4.7%. The liver masks produced by Liver-Net achieved on average a Dice index of 0.95 and an RVE of 4.5%. Manual corrections of the automatically generated segmentations led to a significantly lower inter-observer variability (mean per-case absolute RVE difference across observers: 0.69%) than manual routine segmentations (2.75%). The mean interaction time was 2 min for manual corrections and 10 min for manual routine segmentations.
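The inter-observer variability measure can be illustrated with a short sketch. The exact aggregation is an assumption here: for each case, the absolute RVE differences of all observer pairs are averaged, and the result is then averaged over cases. With two observers this reduces to a single absolute difference per case. The function names and the numbers in the example are hypothetical and are not study data.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def per_case_rve_spread(rves: Sequence[float]) -> float:
    """Mean absolute pairwise RVE difference among observers for one case."""
    pairs = list(combinations(rves, 2))
    if not pairs:
        raise ValueError("need at least two observers per case")
    return mean(abs(a - b) for a, b in pairs)


def inter_observer_variability(rves_per_case: Sequence[Sequence[float]]) -> float:
    """Average the per-case spread over all cases."""
    return mean(per_case_rve_spread(rves) for rves in rves_per_case)


# Illustrative RVE values in percent: one inner list per case,
# one entry per observer (made-up numbers).
corrected = [[4.0, 5.0], [2.0, 2.5]]
routine = [[1.0, 2.0, 4.0], [3.0, 3.0, 6.0]]
variability_corrected = inter_observer_variability(corrected)
variability_routine = inter_observer_variability(routine)
```

A lower value means that the observers' liver volume estimates for the same case lie closer together, which is the sense in which the corrected segmentations are described as more consistent.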
Conclusions: The quality of automatic liver segmentations is on par with that of manual routine segmentations. Using automatic liver masks in the clinical workflow could reduce segmentation time and yield more consistent liver volume estimates across observers.