Natural language processing for automated detection of incidental durotomy

Spine J. 2020 May;20(5):695-700. doi: 10.1016/j.spinee.2019.12.006. Epub 2019 Dec 23.


Background: Incidental durotomy is a common intraoperative complication during spine surgery with potential implications for postoperative recovery, patient-reported outcomes, length of stay, and costs. To our knowledge, there are no processes available for automated surveillance of incidental durotomy.

Purpose: The purpose of this study was to develop natural language processing (NLP) algorithms for automated detection of incidental durotomies in free-text operative notes of patients undergoing lumbar spine surgery.

Patient sample: Adult patients 18 years or older undergoing lumbar spine surgery between January 1, 2000 and June 31, 2018 at two academic and three community medical centers.

Outcome measures: The primary outcome was defined as intraoperative durotomy recorded in free-text operative notes.

Methods: An 80:20 stratified split was used to create training and testing populations. An extreme gradient-boosting NLP algorithm was developed to detect incidental durotomy. Model performance was assessed via area under the receiver operating characteristic curve (AUC-ROC), the precision-recall curve, and the Brier score. Performance of this algorithm was compared with that of Current Procedural Terminology (CPT) and International Classification of Diseases (ICD) codes for durotomy.
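
The abstract does not describe the implementation, so the following is only a minimal sketch of the evaluation ingredients named in the Methods: a stratified 80:20 split, the Brier score, and AUC-ROC. The function names, the toy labels, and the random seed are illustrative assumptions and not the authors' code; the gradient-boosting classifier and text features are omitted.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with the class ratio preserved in each part."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)  # per-class share of the test set
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 is best)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc_roc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This pairwise form is equivalent to the area
    under the ROC curve.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 100 notes, 10 of them labeled as durotomy.
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels)

# Toy predicted probabilities (hypothetical) for four held-out notes.
toy_truth = [1, 0, 1, 0]
toy_prob = [0.9, 0.1, 0.8, 0.3]
toy_brier = brier_score(toy_truth, toy_prob)
toy_auc = auc_roc(toy_truth, toy_prob)
```

Stratifying matters here because the outcome is uncommon: an unstratified random split could leave the testing set with too few positive notes to estimate sensitivity reliably.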

Results: Overall, 1,000 patients were included in the study, and 93 (9.3%) had a recorded incidental durotomy in the free-text operative report. In the independent testing set (n=200), which was not used for model development, the NLP algorithm achieved an AUC-ROC of 0.99 for detection of durotomy. In comparison, the CPT/ICD codes had an AUC-ROC of 0.64. In the testing set, the NLP algorithm detected 16 of 18 patients with incidental durotomy (sensitivity 0.89), whereas the CPT and ICD codes detected 5 of 18 (sensitivity 0.28). At a threshold of 0.05, the NLP algorithm had a specificity of 0.99, a positive predictive value of 0.89, and a negative predictive value of 0.99.
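
The reported testing-set metrics can be checked against a single confusion matrix. The sketch below takes the true-positive and false-negative counts from the abstract (16 and 2 of 18); the false-positive count of 2 and true-negative count of 180 are not stated and are inferred here from the reported positive predictive value and the testing-set size of 200, so they should be read as assumptions. The function name is illustrative.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # detected durotomies / all durotomies
        "specificity": tn / (tn + fp),  # correct negatives / all non-durotomies
        "ppv": tp / (tp + fp),          # true positives / all flagged notes
        "npv": tn / (tn + fn),          # true negatives / all unflagged notes
    }


# TP and FN are from the abstract; FP=2 and TN=180 are inferred (assumption).
nlp = confusion_metrics(tp=16, fp=2, fn=2, tn=180)
nlp_rounded = {name: round(value, 2) for name, value in nlp.items()}

# For a single yes/no classifier such as a billing code, AUC-ROC reduces to
# (sensitivity + specificity) / 2. Specificity of the codes is not reported;
# assuming no false positives (specificity 1.0) is an illustrative assumption.
code_sensitivity = 5 / 18
code_auc_if_no_false_positives = round((code_sensitivity + 1.0) / 2, 2)
```

Under these inferred counts, the rounded values agree with the reported sensitivity of 0.89, specificity of 0.99, positive predictive value of 0.89, and negative predictive value of 0.99. The last line gives 0.64, which matches the reported AUC-ROC for the codes if they produced no false positives.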

Conclusions: Internal validation of the NLP algorithm developed in this study indicates promising results for future NLP applications in spine surgery. Pending external validation, the NLP algorithm developed in this study may be used by entities including national spine registries or hospital quality and safety departments to automate tracking of incidental durotomies.

Keywords: Artificial intelligence; Diagnosis; Dural tear; Durotomy; Machine learning; Natural language processing; Prediction; Spine.

MeSH terms

  • Adult
  • Algorithms
  • Humans
  • Intraoperative Complications
  • Natural Language Processing*
  • Neurosurgical Procedures
  • Spine*