Automated Cancer Registry Notifications: Validation of a Medical Text Analytics System for Identifying Patients with Cancer from a State-Wide Pathology Repository

AMIA Annu Symp Proc. 2017 Feb 10:2016:964-973. eCollection 2016.

Abstract

This paper assesses the utility of Medtex for automating Cancer Registry notifications from narrative histology and cytology reports held in the Queensland state-wide pathology information system. A corpus of 45.3 million pathology HL7 messages (including 119,581 histology and cytology reports) from a Queensland pathology repository for 2009 was analysed by Medtex for cancer notification. Reports analysed by Medtex were consolidated at the patient level and compared against patients with notifiable cancers in the Queensland Oncology Repository (QOR). A stratified random sample of 1,000 patients was manually reviewed by a cancer clinical coder to analyse agreements and discrepancies. Sensitivity of 96.5% (95% confidence interval: 94.5-97.8%), specificity of 96.5% (95.3-97.4%) and positive predictive value of 83.7% (79.6-86.8%) were achieved for identifying patients with notifiable cancers. Medtex achieved high sensitivity and specificity across the breadth of cancers, report types, pathology laboratories and pathologists throughout the State of Queensland. The high sensitivity also resulted in the identification of cancer patients who were not found in the QOR. High sensitivity came at the expense of positive predictive value; however, the resulting false-positive cases may be considered lower priority for Cancer Registries as they can be quickly reviewed. Error analysis revealed that system errors tended to be tumour-stream dependent. Medtex is proving to be a promising medical text analytics system. High-value cancer information can be generated through intelligent data classification and extraction on large volumes of unstructured pathology reports.
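The sensitivity, specificity and positive predictive value reported above, each with a 95% confidence interval, can be illustrated with a minimal sketch. The abstract does not state the underlying counts or which interval method the authors used, so the counts below are hypothetical placeholders and the Wilson score interval is an assumed choice. The function names are illustrative only and are not part of Medtex. The study sample was stratified, so the published estimates may also reflect weighting that this sketch does not attempt.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def screening_metrics(tp, fp, fn, tn):
    """Return each metric as (estimate, (lower, upper)) from confusion-matrix counts."""
    return {
        # Of the patients with a notifiable cancer, the share the system flagged.
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        # Of the patients without a notifiable cancer, the share the system cleared.
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
        # Of the patients the system flagged, the share that were truly notifiable.
        "ppv": (tp / (tp + fp), wilson_interval(tp, tp + fp)),
    }


# Hypothetical counts for illustration only; these are NOT the study's data.
metrics = screening_metrics(tp=90, fp=10, fn=5, tn=895)
for name, (estimate, (low, high)) in metrics.items():
    print(f"{name}: {estimate:.1%} (95% CI {low:.1%}-{high:.1%})")
```

Because the positive predictive value depends on how many flagged patients are false positives, it can fall well below sensitivity and specificity when notifiable cancers are a minority of the patients screened, which is consistent with the trade-off the abstract describes.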

Publication types

  • Validation Study

MeSH terms

  • Computer Systems*
  • Humans
  • Laboratories / standards
  • Mandatory Programs
  • Natural Language Processing
  • Neoplasms / pathology*
  • Pathology / classification*
  • Pathology, Clinical
  • Queensland
  • Registries*
  • Sensitivity and Specificity