Objectives: Post-imaging mathematical prediction models (MPMs) provide guidance for the management of solid pulmonary nodules by providing a lung cancer risk score from demographic and radiologist-indicated imaging characteristics. We hypothesized that calibrating the MPM risk score threshold to a local study cohort would result in improved performance over the original recommended MPM thresholds. We compared the pre- and post-calibration performance of four MPMs and determined whether MPM prediction improves as nodules are imaged longitudinally.
Materials and methods: A common cohort of 317 individuals with computed tomography-detected solid nodules (80 malignant, 237 benign) was used to evaluate MPM performance. We created a web-based application for this study that allows others to easily calibrate thresholds and analyze the performance of MPMs on their local cohort. Thirty patients with repeated imaging were tested for improved performance longitudinally.
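As an illustration of what calibrating a risk score threshold to a local cohort can involve, the following is a minimal sketch only. It assumes the threshold is chosen by maximizing Youden's J (sensitivity + specificity - 1), which is a common criterion but is not stated here as the one used in the study; the function names and the toy scores and labels are hypothetical and are not study data.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold is called malignant."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_malignant = score >= threshold
        if label == 1:  # malignant nodule
            if predicted_malignant:
                tp += 1
            else:
                fn += 1
        else:  # benign nodule
            if predicted_malignant:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def calibrate_threshold(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[Optional[float], float]:
    """Return the observed score that maximizes Youden's J, and that J value.

    Assumption: Youden's J is the calibration criterion (not taken from the study).
    """
    best_threshold: Optional[float] = None
    best_j = -1.0
    for candidate in sorted(set(scores)):
        sensitivity, specificity = sensitivity_specificity(scores, labels, candidate)
        j = sensitivity + specificity - 1.0
        if j > best_j:  # ties keep the lowest candidate threshold
            best_threshold, best_j = candidate, j
    return best_threshold, best_j


# Hypothetical toy cohort: MPM risk scores with 1 = malignant, 0 = benign.
toy_scores = [0.05, 0.10, 0.20, 0.35, 0.60, 0.80]
toy_labels = [0, 0, 0, 1, 0, 1]
threshold, youden_j = calibrate_threshold(toy_scores, toy_labels)
print(threshold, youden_j)
```

In practice, the same routine would be run once per MPM on the local cohort, and the calibrated threshold compared with the originally recommended one.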
Results: Using calibrated thresholds, the Mayo Clinic and Brock University (BU) MPMs performed best (AUC = 0.63 and 0.61, respectively) compared to the Veterans Affairs (0.51) and Peking University (0.55) MPMs. Only for BU did the calibrated threshold agree with the original MPM threshold; for the other MPMs, the calibrated thresholds improved accuracy. No significant improvements in accuracy were found longitudinally between time points.
Conclusions: Calibration to a common cohort can identify the best-performing MPM for a given institution. Without calibration, BU has the most stable performance in solid nodules ≥ 8 mm but has only moderate potential to stratify subjects into appropriate workup. Application of MPMs is recommended only at initial evaluation, as no increase in accuracy was achieved over time.
Key points: • Post-imaging lung cancer risk mathematical prediction models (MPMs) perform poorly on local populations without calibration. • An application is provided to facilitate calibration of four MPMs to new study cohorts: the Mayo Clinic model, the U.S. Department of Veterans Affairs model, the Brock University model, and the Peking University model. • No significant improvement in risk prediction occurred in nodules with repeated imaging sessions, indicating that the potential value of applying risk prediction is limited to the initial evaluation.
Keywords: Area under the curve; Logistic models; Lung neoplasms; Risk assessment; Tomography, x-ray computed.