Natural language processing for abstraction of cancer treatment toxicities: accuracy versus human experts

JAMIA Open. 2020 Dec 5;3(4):513-517. doi: 10.1093/jamiaopen/ooaa064. eCollection 2020 Dec.


Objectives: Expert abstraction of acute toxicities is critical in oncology research but is labor-intensive and variable. We assessed the accuracy of a natural language processing (NLP) pipeline to extract symptoms from clinical notes compared to physicians.

Materials and methods: Two independent reviewers identified present and negated National Cancer Institute Common Terminology Criteria for Adverse Events (CTCAE) v5.0 symptoms from 100 randomly selected notes for on-treatment visits during radiation therapy with adjudication by a third reviewer. A NLP pipeline based on Apache clinical Text Analysis Knowledge Extraction System was developed and used to extract CTCAE terms. Accuracy was assessed by precision, recall, and F1.

Results: The NLP pipeline demonstrated high accuracy for common physician-abstracted symptoms, such as radiation dermatitis (F1 0.88), fatigue (0.85), and nausea (0.88). NLP had poor sensitivity for negated symptoms.

Conclusion: NLP accurately detects a subset of documented present CTCAE symptoms, though is limited for negated symptoms. It may facilitate strategies to more consistently identify toxicities during cancer therapy.

Keywords: cancer; chemoradiation; natural language processing; radiation therapy; toxicity.