Natural Language Processing for Automated Quantification of Brain Metastases Reported in Free-Text Radiology Reports

JCO Clin Cancer Inform. 2019 Apr;3:1-9. doi: 10.1200/CCI.18.00138.


Purpose: Although the bulk of patient-generated health data are increasing exponentially, their use is impeded because most data come in unstructured format, namely as free-text clinical reports. A variety of natural language processing (NLP) methods have emerged to automate the processing of free text ranging from statistical to deep learning-based models; however, the optimal approach for medical text analysis remains to be determined. The aim of this study was to provide a head-to-head comparison of novel NLP techniques and inform future studies about their utility for automated medical text analysis.

Patients and methods: Magnetic resonance imaging reports of patients with brain metastases treated in two tertiary centers were retrieved and manually annotated using a binary classification (single metastasis v two or more metastases). Multiple bag-of-words and sequence-based NLP models were developed and compared after randomly splitting the annotated reports into training and test sets in an 80:20 ratio.

Results: A total of 1,479 radiology reports of patients diagnosed with brain metastases were retrieved. The least absolute shrinkage and selection operator (LASSO) regression model demonstrated the best overall performance on the hold-out test set with an area under the receiver operating characteristic curve of 0.92 (95% CI, 0.89 to 0.94), accuracy of 83% (95% CI, 80% to 87%), calibration intercept of -0.06 (95% CI, -0.14 to 0.01), and calibration slope of 1.06 (95% CI, 0.95 to 1.17).

Conclusion: Among various NLP techniques, the bag-of-words approach combined with a LASSO regression model demonstrated the best overall performance in extracting binary outcomes from free-text clinical reports. This study provides a framework for the development of machine learning-based NLP models as well as a clinical vignette of patients diagnosed with brain metastases.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Brain Neoplasms / diagnostic imaging*
  • Brain Neoplasms / secondary*
  • Electronic Health Records
  • Humans
  • Magnetic Resonance Imaging
  • Medical Informatics / methods*
  • Natural Language Processing*
  • ROC Curve
  • Radiology* / methods
  • Reproducibility of Results
  • Research Report