Objective: Today's growing medical image databases call for novel processing tools to structure the bulk of data and extract clinically relevant information. Unsupervised hierarchical clustering may reveal clusters within anatomical shape data of patient populations, as required for modern precision medicine strategies. Few studies have applied hierarchical clustering techniques to three-dimensional patient shape data, and results depend heavily on the chosen clustering distance metrics and linkage functions. In this study, we sought to assess the classification performance of various distance/linkage combinations and of different types of input data in obtaining clinically meaningful shape clusters.
Methods: We present a processing pipeline combining automatic segmentation, statistical shape modeling, and agglomerative hierarchical clustering to automatically subdivide a set of 60 aortic arch anatomical models into healthy controls, two groups affected by congenital heart disease, and their respective subgroups as defined by clinical diagnosis. Results were compared with traditional morphometrics and principal component analysis of shape features.
Results: Our pipeline automatically divided the input shape data according to primary clinical diagnosis with a high F-score (0.902 ± 0.042) and Matthews correlation coefficient (0.851 ± 0.064) using the correlation distance combined with weighted linkage. Meaningful subgroups within the three patient groups were obtained, and benchmark scores for automatic segmentation and classification performance are reported.
Conclusion: Clustering results vary depending on the distance/linkage combination used to divide the data. Yet, clinically relevant shape clusters and subgroups could be found with high specificity and low misclassification rates.
Significance: Detecting disease-specific clusters within medical image data could improve image-based risk assessment, treatment planning, and medical device development in complex disease.
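
Illustrative sketch: The clustering and scoring steps named in the Methods and Results can be illustrated with a minimal, standard-library-only Python sketch. It is not the study's implementation. It assumes that "correlation" distance means one minus the Pearson correlation between feature vectors and that "weighted" linkage means WPGMA (the distance from a merged cluster to any other cluster is the plain average of the two pre-merge distances). The function names, the toy feature vectors, and the one-vs-rest scoring of a single group are all hypothetical choices made for illustration.

```python
import math


def correlation_distance(u, v):
    """Return 1 - Pearson correlation (assumed meaning of 'correlation' distance).

    Assumes neither vector is constant; a constant vector would give a zero
    denominator.
    """
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return 1.0 - num / den


def weighted_linkage_clusters(vectors, n_clusters):
    """Agglomerative clustering with WPGMA linkage (assumed meaning of 'weighted').

    Returns one integer cluster label per input vector.
    """
    clusters = {i: [i] for i in range(len(vectors))}
    dist = {}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            dist[(i, j)] = correlation_distance(vectors[i], vectors[j])

    def d(a, b):
        return dist[(a, b) if a < b else (b, a)]

    next_id = len(vectors)
    while len(clusters) > n_clusters:
        ids = sorted(clusters)
        pairs = [(p, q) for idx, p in enumerate(ids) for q in ids[idx + 1:]]
        a, b = min(pairs, key=lambda pq: d(pq[0], pq[1]))
        # WPGMA update: average the two pre-merge distances, ignoring cluster sizes.
        for k in ids:
            if k != a and k != b:
                dist[(k, next_id)] = 0.5 * (d(a, k) + d(b, k))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1

    labels = [0] * len(vectors)
    for label, members in enumerate(clusters.values()):
        for m in members:
            labels[m] = label
    return labels


def binary_scores(truth, predicted):
    """Return (F-score, Matthews correlation coefficient) for boolean label lists."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    f_den = 2 * tp + fp + fn
    f_score = 2 * tp / f_den if f_den else 0.0
    m_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report MCC as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / m_den if m_den else 0.0
    return f_score, mcc


# Hypothetical toy "shape feature" vectors: three groups of two subjects each.
toy_shapes = [
    [1, 2, 3, 4], [2, 4, 6, 8.5],   # group A (rising pattern)
    [4, 3, 2, 1], [8, 6.5, 4, 2],   # group B (falling pattern)
    [1, 4, 1, 4], [2, 8, 2.5, 8],   # group C (alternating pattern)
]
labels = weighted_linkage_clusters(toy_shapes, n_clusters=3)

# One-vs-rest scoring of group A; the cluster containing subject 0 is taken
# as the "group A" cluster, which is a shortcut valid only for this toy data.
truth_a = [True, True, False, False, False, False]
pred_a = [lab == labels[0] for lab in labels]
f_a, mcc_a = binary_scores(truth_a, pred_a)
```

For real data, SciPy's `scipy.cluster.hierarchy.linkage` accepts `method="weighted"` and `metric="correlation"` and would normally be preferred over a hand-written loop. How the per-group scores were aggregated into the reported mean ± deviation is not stated here, so the one-vs-rest scoring above should be read only as one plausible way to evaluate a single group.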