Weighted metrics are required when evaluating the performance of prediction models in nested case-control studies

BMC Med Res Methodol. 2024 May 17;24(1):115. doi: 10.1186/s12874-024-02213-6.

Abstract

Background: Nested case-control (NCC) designs are efficient for developing and validating prediction models that use expensive or difficult-to-obtain predictors, especially when the outcome is rare. Previous research has focused on how to develop prediction models in this sampling design, but little attention has been given to model validation in this context. We therefore aimed to systematically characterize the key elements for the correct evaluation of the performance of prediction models in NCC data.

Methods: We proposed how to correctly evaluate prediction models in NCC data by adjusting performance metrics with sampling weights that account for the NCC sampling. The metrics considered were the C-index, threshold-based metrics, the observed-to-expected events ratio (O/E ratio), the calibration slope, and decision curve analysis. We illustrated the proposed metrics by validating the Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA version 5) in data from the population-based Rotterdam Study. We compared the metrics obtained in the full cohort with those obtained in NCC datasets sampled from the Rotterdam Study, with and without a matched design.
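To make the idea of weight adjustment concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary outcome without censoring and assumes that each sampled subject carries a weight equal to the inverse of its probability of inclusion in the NCC sample (cases typically have weight 1). The function names, the toy data, and the weights are all hypothetical and chosen only for illustration.

```python
# Minimal sketch of weight-adjusted O/E ratio and C-index.
# Assumptions (not taken from the abstract): binary outcome, no censoring,
# and w[i] = 1 / P(subject i is included in the NCC sample).

def weighted_oe_ratio(y, p, w):
    """Weighted observed-to-expected ratio: sum(w*y) / sum(w*p)."""
    observed = sum(wi * yi for wi, yi in zip(w, y))
    expected = sum(wi * pi for wi, pi in zip(w, p))
    return observed / expected


def weighted_c_index(y, p, w):
    """Weighted concordance for a binary outcome.

    Each (case, non-case) pair contributes with weight w_case * w_noncase;
    pairs with tied predictions count as half concordant.
    """
    concordant = 0.0
    total = 0.0
    for yi, pi, wi in zip(y, p, w):
        if yi != 1:
            continue
        for yj, pj, wj in zip(y, p, w):
            if yj != 0:
                continue
            pair_weight = wi * wj
            total += pair_weight
            if pi > pj:
                concordant += pair_weight
            elif pi == pj:
                concordant += 0.5 * pair_weight
    return concordant / total


# Hypothetical toy NCC sample: two cases and three sampled controls.
y = [1, 1, 0, 0, 0]                    # observed outcome (1 = case)
p = [0.30, 0.10, 0.20, 0.05, 0.10]     # predicted risks
w = [1, 1, 20, 5, 5]                   # hypothetical sampling weights
ones = [1] * len(y)                    # unit weights = unweighted metrics

print("O/E unweighted:", weighted_oe_ratio(y, p, ones))
print("O/E weighted:  ", weighted_oe_ratio(y, p, w))
print("C unweighted:  ", weighted_c_index(y, p, ones))
print("C weighted:    ", weighted_c_index(y, p, w))
```

In this toy sample the controls are under-represented relative to the cases, so the unweighted O/E ratio is inflated, which is the same direction of bias reported in the Results. A time-to-event analysis would additionally need to handle censoring, which this sketch omits.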

Results: Performance metrics without weight adjustment were biased. In the unmatched design, the unweighted C-index in the NCC datasets was 0.61 (0.58-0.63), whereas the C-index in the full cohort and the weighted C-index in the NCC datasets were similar: 0.65 (0.62-0.69) and 0.65 (0.61-0.69), respectively. The unweighted O/E ratio was 18.38 (17.67-19.06) in the NCC datasets, compared with 1.69 (1.42-1.93) in the full cohort and 1.68 (1.53-1.84) for the weighted version in the NCC datasets. Similarly, the weighted versions of the threshold-based metrics and of the net benefit for decision curves were unbiased estimates of the corresponding full-cohort metrics, whereas their unweighted counterparts were biased. In the matched design, the bias of the unweighted metrics was larger, but it could also be corrected by the weight adjustment.
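The threshold-based metrics and the net benefit mentioned above can be weighted in the same spirit. The sketch below is illustrative only and again assumes a binary outcome and inverse-inclusion-probability weights. It uses the standard net benefit definition, TP/N - FP/N * t/(1 - t), with weighted counts in place of raw counts. The function name, the threshold, and the toy data are hypothetical.

```python
# Minimal sketch of weight-adjusted sensitivity, specificity and net benefit.
# Assumptions (not taken from the abstract): binary outcome, no censoring,
# and w[i] = 1 / P(subject i is included in the NCC sample).

def weighted_threshold_metrics(y, p, w, threshold):
    """Return weighted sensitivity, specificity and net benefit at a threshold."""
    tp = fp = fn = tn = 0.0
    for yi, pi, wi in zip(y, p, w):
        predicted_positive = pi >= threshold
        if yi == 1 and predicted_positive:
            tp += wi
        elif yi == 1:
            fn += wi
        elif predicted_positive:
            fp += wi
        else:
            tn += wi
    total = tp + fp + fn + tn
    odds = threshold / (1.0 - threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "net_benefit": tp / total - (fp / total) * odds,
    }


# Hypothetical toy NCC sample: two cases and three sampled controls.
y = [1, 1, 0, 0, 0]                    # observed outcome (1 = case)
p = [0.30, 0.10, 0.20, 0.05, 0.10]     # predicted risks
w = [1, 1, 20, 5, 5]                   # hypothetical sampling weights

unweighted = weighted_threshold_metrics(y, p, [1] * len(y), threshold=0.15)
weighted = weighted_threshold_metrics(y, p, w, threshold=0.15)
print("unweighted:", unweighted)
print("weighted:  ", weighted)
```

Sensitivity depends only on the cases, so with case weights of 1 it is unchanged by weighting in this example. Specificity and net benefit depend on how the sampled controls are scaled back up to the cohort, so they change.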

Conclusions: Nested case-control studies are an efficient solution for evaluating the performance of prediction models that use expensive or difficult-to-obtain biomarkers, especially when the outcome is rare, but the performance metrics need to be adjusted to the sampling procedure.

Keywords: Nested case–control study; Prediction model validation; Rare outcomes; Weighted metrics.

MeSH terms

  • Aged
  • Algorithms*
  • Breast Neoplasms
  • Case-Control Studies
  • Female
  • Humans
  • Middle Aged
  • Models, Statistical
  • Ovarian Neoplasms

Grants and funding