Early Colorectal Cancer Detected by Machine Learning Model Using Gender, Age, and Complete Blood Count Data

Dig Dis Sci. 2017 Oct;62(10):2719-2727. doi: 10.1007/s10620-017-4722-8. Epub 2017 Aug 23.

Abstract

Background: Machine learning tools identify patients with blood counts indicating greater likelihood of colorectal cancer and warranting colonoscopy referral.

Aims: To validate a machine learning colorectal cancer detection model on a US community-based insured adult population.

Methods: Eligible colorectal cancer cases (439 females, 461 males) with complete blood counts before diagnosis were identified from the Kaiser Permanente Northwest (KPNW) Region's Tumor Registry. Control patients (n = 9108) were randomly selected from KPNW's population who had no cancers, received ≥1 blood count, had continuous enrollment from 180 days prior to the blood count through 24 months after the count, and were aged 40-89. For each control, one blood count was randomly selected as the pseudo-colorectal cancer diagnosis date for matching to cases, and assigned a "calendar year" based on the count date. For each calendar year, 18 controls were randomly selected to match the general enrollment's 10-year age groups and lengths of continuous enrollment. Prediction performance was evaluated by area under the curve, specificity, and odds ratios.
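
The evaluation metrics named above can be illustrated with a minimal sketch. It assumes each patient has a single numeric detection score; the function names `auc` and `cutoff_at_specificity` and the example scores are hypothetical and are not taken from the study or from the model under validation.

```python
import math


def auc(case_scores, control_scores):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen case scores higher than a randomly chosen control
    (ties count as one half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def cutoff_at_specificity(control_scores, specificity):
    """Smallest observed control score such that at least `specificity`
    of controls score at or below it. Patients scoring strictly above
    the cutoff would be flagged as high risk."""
    ordered = sorted(control_scores)
    # round() guards against floating-point noise such as 0.99 * 100
    idx = math.ceil(round(specificity * len(ordered), 9)) - 1
    idx = min(max(idx, 0), len(ordered) - 1)
    return ordered[idx]


# Made-up scores for illustration only
cases = [0.9, 0.8, 0.4]
controls = [0.1, 0.2, 0.5, 0.3]
print(auc(cases, controls))
print(cutoff_at_specificity(controls, 0.75))
```

Fixing the cutoff on the control distribution, rather than on the pooled scores, is what makes the stated specificity hold by construction.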

Results: Area under the receiver operating characteristics curve for detecting colorectal cancer was 0.80 ± 0.01. At 99% specificity, the odds ratio for association of a high-risk detection score with colorectal cancer was 34.7 (95% CI 28.9-40.4). The detection model had the highest accuracy in identifying right-sided colorectal cancers.
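
As a rough illustration of how an odds ratio at a fixed specificity is derived, the sketch below builds it from a 2x2 table of flagged versus unflagged cases and controls. The counts are invented for the example and are not the study's data. The abstract does not state how its confidence interval was computed; the log-based (Woolf) interval used here is an assumption chosen for simplicity.

```python
import math


def odds_ratio(flagged_cases, unflagged_cases,
               flagged_controls, unflagged_controls, z=1.96):
    """Odds ratio for a high-risk flag among cases versus controls,
    with an approximate 95% confidence interval on the log scale."""
    estimate = (flagged_cases * unflagged_controls) / (
        unflagged_cases * flagged_controls
    )
    # Standard error of log(OR) from the four cell counts
    se = math.sqrt(
        1 / flagged_cases + 1 / unflagged_cases
        + 1 / flagged_controls + 1 / unflagged_controls
    )
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


# Hypothetical counts: 99% specificity means 1% of controls are flagged
estimate, lower, upper = odds_ratio(20, 80, 10, 990)
print(estimate, lower, upper)
```

With these invented counts, 10 of 1000 controls are flagged (99% specificity) and 20 of 100 cases are flagged, so the odds of a flag are 0.25 among cases and about 0.0101 among controls.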

Conclusions: ColonFlag® identifies individuals with tenfold higher risk of undiagnosed colorectal cancer at curable stages (0/I/II), flags colorectal tumors 180-360 days prior to usual clinical diagnosis, and is more accurate at identifying right-sided (compared to left-sided) colorectal cancers.

Keywords: Area under receiver operating characteristics curve; Blood cell count; Colonoscopy; Colorectal neoplasms; Hemoglobin; Medical informatics computing.

Publication types

  • Validation Study

MeSH terms

  • Adult
  • Age Factors
  • Aged
  • Aged, 80 and over
  • Algorithms
  • Area Under Curve
  • Blood Cell Count*
  • Colonoscopy
  • Colorectal Neoplasms / blood
  • Colorectal Neoplasms / diagnosis*
  • Colorectal Neoplasms / pathology
  • Data Mining / methods*
  • Diagnosis, Computer-Assisted / methods*
  • Early Detection of Cancer / methods*
  • Female
  • Humans
  • Machine Learning*
  • Male
  • Middle Aged
  • Odds Ratio
  • Predictive Value of Tests
  • ROC Curve
  • Referral and Consultation
  • Registries
  • Reproducibility of Results
  • Risk Factors
  • Sex Factors