A high positive predictive value algorithm using hospital administrative data identified incident cancer cases

J Clin Epidemiol. 2008 Apr;61(4):373-9. doi: 10.1016/j.jclinepi.2007.05.017. Epub 2007 Oct 22.


Objective: We have developed and validated an algorithm based on Piedmont hospital discharge abstracts for ascertainment of incident cases of breast, colorectal, and lung cancer.

Study design and setting: The algorithm training and validation sets were based on data from 2000 and 2001, respectively. The validation was carried out at an individual level by linkage of cases identified by the algorithm with cases in the Piedmont Cancer Registry diagnosed in 2001.
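The individual-level linkage described above can be illustrated with a minimal sketch. It assumes each case carries a unique person identifier and that linkage is an exact match on that identifier. The abstract does not describe the actual linkage procedure, so the function name `link_cases` and the identifiers below are hypothetical.

```python
from typing import Iterable, NamedTuple, Set


class LinkageResult(NamedTuple):
    true_positives: Set[str]   # flagged by the algorithm and present in the registry
    false_positives: Set[str]  # flagged by the algorithm, absent from the registry
    false_negatives: Set[str]  # present in the registry, missed by the algorithm


def link_cases(algorithm_ids: Iterable[str], registry_ids: Iterable[str]) -> LinkageResult:
    """Classify cases by exact match on a person identifier (an assumption)."""
    algo = set(algorithm_ids)
    registry = set(registry_ids)
    return LinkageResult(
        true_positives=algo & registry,
        false_positives=algo - registry,
        false_negatives=registry - algo,
    )


# Hypothetical identifiers, for illustration only (not study data).
result = link_cases(
    algorithm_ids=["p01", "p02", "p03", "p04"],
    registry_ids=["p02", "p03", "p04", "p05", "p06"],
)
```

Here the registry acts as the reference standard, so every disagreement is counted against the algorithm.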

Results: The sensitivity of the algorithm was higher for lung cancer (80.8%) than for breast (76.7%) and colorectal (72.4%) cancers. The positive predictive values were 78.7%, 87.9%, and 92.6% for lung, colorectal, and breast cancer, respectively. The high values for colorectal and breast cancers were due to the model's ability to distinguish prevalent from incident cases and to the accuracy of surgery claims for case identification.
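For reference, the two reported measures follow the standard definitions below, with the cancer registry taken as the reference standard. The symbols are introduced here for illustration and do not appear in the abstract: TP is the number of algorithm-identified cases confirmed by the registry, FN the number of registry cases the algorithm missed, and FP the number of algorithm-identified cases not found in the registry.

```latex
\[
\text{Sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{PPV} = \frac{TP}{TP + FP}
\]
```

The abstract reports only the resulting percentages, not the underlying counts.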

Conclusions: Given its moderate sensitivity, this algorithm is not intended to replace cancer registration, but it is a valuable tool for investigating other aspects of cancer surveillance. This method provides a valid study base for timely monitoring of cancer practice and related outcomes, geographic and temporal variations, and costs.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms*
  • Breast Neoplasms / epidemiology
  • Colorectal Neoplasms / epidemiology
  • Databases, Factual
  • Female
  • Humans
  • Incidence
  • Insurance, Hospitalization / statistics & numerical data*
  • Italy / epidemiology
  • Lung Neoplasms / epidemiology
  • Male
  • Medical Record Linkage / methods*
  • Neoplasms / epidemiology*
  • Patient Discharge / statistics & numerical data*
  • ROC Curve
  • Registries
  • Sensitivity and Specificity