Comparison of 11 automated PET segmentation methods in lymphoma

Phys Med Biol. 2020 Nov 27;65(23):235019. doi: 10.1088/1361-6560/abb6bd.

Abstract

Segmentation of lymphoma lesions in FDG PET/CT images is critical both for assessing individual lesions and for quantifying patient disease burden. Simple thresholding methods remain common despite the large heterogeneity in lymphoma lesion location, size, and contrast. Here, we assess 11 automated PET segmentation methods for their use in two scenarios: individual lesion segmentation and patient-level disease quantification in lymphoma. Lesions on 18F-FDG PET/CT scans of 90 lymphoma patients were contoured by a nuclear medicine physician. Thresholding, active contour, clustering, adaptive region-growing, and convolutional neural network (CNN) methods were applied to all physician-identified lesions. Lesion-level segmentation was evaluated using multiple segmentation performance metrics (Dice, Hausdorff distance). Patient-level quantification of total disease burden (SUVtotal) and metabolic tumor volume (MTV) was assessed using Spearman's correlation coefficients between the segmentation output and the physician contours. Lesion segmentation and patient quantification performance were compared to inter-physician agreement in a subset of 20 patients segmented by a second nuclear medicine physician. In total, 1223 lesions, with a median tumor-to-background ratio of 4.0 and a median volume of 1.8 cm³, were evaluated. When assessed for lesion segmentation, a 3D CNN, DeepMedic, achieved the highest performance across all evaluation metrics. DeepMedic, clustering methods, and an iterative threshold method had lesion-level segmentation performance comparable to the degree of inter-physician agreement. For patient-level SUVtotal and MTV quantification, all methods except 40% and 50% SUVmax thresholding and adaptive region-growing achieved performance similar to the agreement between the two physicians.
Multiple methods, including a 3D CNN, clustering, and an iterative threshold method, achieved both good lesion-level segmentation and patient-level quantification performance in a population of 90 lymphoma patients. These methods are thus recommended over thresholding methods such as 40% and 50% SUVmax, which were consistently found to be significantly outside the limits defined by inter-physician agreement.
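Fixed-percentage thresholds such as 40% and 50% SUVmax, together with the patient-level quantities MTV and SUVtotal, can be illustrated with the sketch below. It assumes each lesion is given as a hypothetical volume of interest (VOI) mapping (i, j, k) voxel indices to SUV values, treats MTV as voxel count times voxel volume, and assumes SUVtotal is the sum of voxel SUVs inside the mask; the study's exact definitions, and any connectivity or background constraints it applied, may differ.

```python
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]


def threshold_fraction_of_max(voi: Dict[Voxel, float], fraction: float) -> Set[Voxel]:
    """Return voxels in the VOI whose SUV is at least fraction * SUVmax.

    No connectivity constraint is applied, so disconnected hot voxels
    inside the VOI are kept as well.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not voi:
        return set()
    cutoff = fraction * max(voi.values())
    return {v for v, suv in voi.items() if suv >= cutoff}


def mtv_ml(mask: Set[Voxel], voxel_volume_ml: float) -> float:
    """Metabolic tumor volume: number of segmented voxels times voxel volume."""
    return len(mask) * voxel_volume_ml


def suv_total(voi: Dict[Voxel, float], mask: Set[Voxel]) -> float:
    """Assumed SUVtotal definition: sum of voxel SUVs inside the mask."""
    return sum(voi[v] for v in mask)


# Toy VOI with SUVmax = 10.0 (values are illustrative, not study data)
voi = {(0, 0, 0): 10.0, (0, 0, 1): 6.0, (0, 1, 0): 4.5, (1, 0, 0): 2.0}
for fraction in (0.4, 0.5):
    mask = threshold_fraction_of_max(voi, fraction)
    # prints "0.4 3 20.5 1.5", then "0.5 2 16.0 1.0"
    print(fraction, len(mask), suv_total(voi, mask), mtv_ml(mask, 0.5))
```

Because the cutoff scales with each lesion's SUVmax, raising the fraction from 40% to 50% can only shrink the mask within a given VOI, which is one reason these fixed fractions can behave very differently across lesions of varying contrast.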

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Algorithms*
  • Female
  • Fluorodeoxyglucose F18 / metabolism
  • Humans
  • Lymphoma / classification
  • Lymphoma / diagnostic imaging
  • Lymphoma / metabolism
  • Lymphoma / pathology*
  • Male
  • Middle Aged
  • Neural Networks, Computer*
  • Positron Emission Tomography Computed Tomography / methods*
  • Radiopharmaceuticals / metabolism
  • Retrospective Studies
  • Tumor Burden
  • Young Adult

Substances

  • Radiopharmaceuticals
  • Fluorodeoxyglucose F18