Evaluating information loss in the National Cancer Database from cases lost to follow-up

J Surg Oncol. 2022 Nov;126(6):1123-1132. doi: 10.1002/jso.26977. Epub 2022 Aug 27.

Abstract

Background and objectives: Cancer registries must prioritize data capture that returns value, reducing resource burden with minimal loss of data. Identifying the optimal length of follow-up data collection for patients with cancer serves this goal.

Methods: A two-step analysis was performed, using entropy calculations to assess information gain for each follow-up year and second-order differences to compare survival outcomes between the defined follow-up periods and lifetime follow-up. The analysis included 391 567 deidentified adult cases in the National Cancer Database that were diagnosed in 1989. Comparisons examined adult subsets of 61 908 lung cancer cases, 48 387 colon and rectal cancer cases, and 64 134 breast cancer cases. A total of 4133 pediatric cases diagnosed in 1989 were also analyzed, with comparisons examining 1065 leukemia cases and 494 lymphoma cases.
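The abstract does not give the exact entropy formulation, so the following is only a minimal sketch of one plausible reading. It assumes that the outcome of follow-up through year k is a categorical variable (died in year 1, ..., died in year k, or still alive after year k) and that the information gained by adding a year is the relative increase in Shannon entropy. The function names and the toy death counts are hypothetical and are not taken from the study.

```python
import math


def truncated_entropy(deaths_per_year, n_cases, k):
    """Shannon entropy (bits) of outcomes when follow-up stops after k years.

    Outcomes are: died in year 1..k, or still alive after year k.
    ASSUMPTION: this formulation is illustrative, not the paper's method.
    """
    probs = [d / n_cases for d in deaths_per_year[:k]]
    probs.append(max(0.0, 1.0 - sum(probs)))  # mass of "alive after year k"
    return -sum(p * math.log2(p) for p in probs if p > 0)


def annual_relative_gain(deaths_per_year, n_cases):
    """Return (entropies, gains).

    entropies[i] is the entropy with i + 1 years of follow-up.
    gains[i] is the relative entropy increase from adding year i + 2.
    """
    entropies = [
        truncated_entropy(deaths_per_year, n_cases, k)
        for k in range(1, len(deaths_per_year) + 1)
    ]
    gains = [
        (cur - prev) / prev if prev > 0 else math.nan
        for prev, cur in zip(entropies, entropies[1:])
    ]
    return entropies, gains


# Toy cohort (made-up numbers): 100 cases, deaths recorded in years 1 to 3.
toy_entropies, toy_gains = annual_relative_gain([50, 25, 0], n_cases=100)
```

Under this reading, a follow-up cutoff could be chosen as the first year at which the relative gain falls below a threshold such as 1%. A year with no deaths adds no entropy, which is why the last gain in the toy cohort is zero.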

Results: Annual increases in information gain fell below 1% after 16 years of follow-up for adult cases and after 9 years for pediatric cases. Comparison of second-order differences showed that 62% of the comparisons were similar between 15 years and lifetime follow-up when examining restricted mean survival time. In addition, 90% of the comparisons were statistically similar when comparing hazard ratios.
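For readers unfamiliar with restricted mean survival time (RMST), it is the area under the survival curve from time zero up to a chosen horizon. The sketch below computes it from a Kaplan-Meier estimate in plain Python. The function names and the toy data are hypothetical, and the code illustrates the standard definition only, not the authors' analysis pipeline.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) steps.

    times: follow-up time per case; events: 1 = death, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all cases that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def rmst(times, events, tau):
    """Area under the Kaplan-Meier step curve from 0 to the horizon tau."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


# Toy data (made up): four cases dying at years 1, 2, 3, and 4.
toy_times = [1, 2, 3, 4]
toy_events = [1, 1, 1, 1]
rmst_short = rmst(toy_times, toy_events, tau=2)  # shorter horizon
rmst_full = rmst(toy_times, toy_events, tau=4)   # full horizon
```

Evaluating the same cohort at a 15-year horizon and at a much longer horizon is one simple way to see how much a truncated follow-up window changes the estimate.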

Conclusions: Survival analysis using 15 years postdiagnosis follow-up showed minimal differences in information gain compared to lifetime follow-up.

Keywords: biostatistics; lost to follow-up; survival rate.

MeSH terms

  • Adult
  • Breast Neoplasms*
  • Child
  • Databases, Factual
  • Female
  • Humans
  • Lost to Follow-Up*
  • Registries
  • Survival Analysis
  • Survival Rate