Purpose: To compare human observers to a mathematically derived computer model for differentiation between malignant and benign pulmonary nodules detected on baseline screening computed tomography (CT) scans.
Methods: A case-cohort study design was chosen. The study group consisted of 300 chest CT scans from the Danish Lung Cancer Screening Trial (DLCST). It included all scans with proven malignancies (n = 62) and two subsets of randomly selected baseline scans with benign nodules of all sizes (n = 120) and matched in size to the cancers, respectively (n = 118). Eleven observers and the computer model (PanCan) assigned a malignancy probability score to each nodule. Performances were expressed by area under the ROC curve (AUC). Performance differences were tested using the Dorfman, Berbaum and Metz method. Seven observers assessed morphological nodule characteristics using a predefined list. Differences in morphological features between malignant and size-matched benign nodules were analyzed using chi-square analysis with Bonferroni correction. A significant difference was defined at p < 0.004.
Results: Performances of the model and observers were equivalent (AUC 0.932 versus 0.910, p = 0.184) for risk-assessment of malignant and benign nodules of all sizes. However, human readers performed superior to the computer model for differentiating malignant nodules from size-matched benign nodules (AUC 0.819 versus 0.706, p < 0.001). Large variations between observers were seen for ROC areas and ranges of risk scores. Morphological findings indicative of malignancy referred to border characteristics (spiculation, p < 0.001) and perinodular architectural deformation (distortion of surrounding lung parenchyma architecture, p < 0.001; pleural retraction, p = 0.002).
Conclusions: Computer model and human observers perform equivalent for differentiating malignant from randomly selected benign nodules, confirming the high potential of computer models for nodule risk estimation in population based screening studies. However, computer models highly rely on size as discriminator. Incorporation of other morphological criteria used by human observers to superiorly discriminate size-matched malignant from benign nodules, will further improve computer performance.