Rapid advances in medical imaging Artificial Intelligence (AI) offer unprecedented opportunities for automatic analysis and extraction of data from large imaging collections. Computational demands of such modern AI tools may be difficult to satisfy with the capabilities available on premises. Cloud computing offers the promise of economical access and extreme scalability. Few studies examine the price/performance tradeoffs of using the cloud, in particular for medical image analysis tasks. We investigate the use of cloud-provisioned compute resources for AI-based curation of the National Lung Screening Trial (NLST) Computed Tomography (CT) images available from the National Cancer Institute (NCI) Imaging Data Commons (IDC). We evaluated NCI Cancer Research Data Commons (CRDC) Cloud Resources - Terra (FireCloud) and Seven Bridges-Cancer Genomics Cloud (SB-CGC) platforms - to perform automatic image segmentation with TotalSegmentator and pyradiomics feature extraction for a large cohort containing >126,000 CT volumes from >26,000 patients. Utilizing >21,000 Virtual Machines (VMs) over the course of the computation we completed analysis in under 9 hours, as compared to the estimated 522 days that would be needed on a single workstation. The total cost of utilizing the cloud for this analysis was $1,011.05. Our contributions include: 1) an evaluation of the numerous tradeoffs towards optimizing the use of cloud resources for large-scale image analysis; 2) CloudSegmentator, an open source reproducible implementation of the developed workflows, which can be reused and extended; 3) practical recommendations for utilizing the cloud for large-scale medical image computing tasks. We also share the results of the analysis: the total of 9,565,554 segmentations of the anatomic structures and the accompanying radiomics features in IDC as of release v18.
Keywords: AI; cloud computing; computed tomography; image segmentation; radiomics.