Identifying Patients with Depression Using Free-text Clinical Documents

Stud Health Technol Inform. 2015;216:629-33.


About 1 in 10 adults are reported to exhibit clinical depression, and the associated personal, societal, and economic costs are significant. In this study, we applied the MTERMS NLP system and machine learning classification algorithms to identify patients with depression from discharge summaries. Domain experts reviewed both the training and test cases and classified each case as depression with high, intermediate, or low confidence. For depression cases with high confidence, all of the algorithms we tested performed similarly; MTERMS' knowledge-based decision tree performed slightly better than the machine learning classifiers, achieving an F-measure of 89.6%. MTERMS also achieved the highest F-measure (70.6%) on intermediate confidence cases. The RIPPER rule learner was the best-performing machine learning method, with an F-measure of 70.0% and a higher precision but lower recall than MTERMS. The proposed NLP-based approach identified a substantial portion of the depression cases (about 20%) that were not on the coded diagnosis list.
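The results above are reported as F-measure, precision, and recall. As a reminder of how these relate, here is a minimal sketch in Python. It assumes the balanced F-measure (F1, beta = 1), which the abstract does not state explicitly. The function name `f_measure` and the counts in the usage lines are hypothetical and are not taken from the study.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Return the F-measure from true positive, false positive,
    and false negative counts.

    beta = 1.0 gives the balanced F1 score (an assumption here;
    the abstract only says "F-measure").
    """
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined),
        # so the F-measure is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)  # share of flagged cases that are correct
    recall = tp / (tp + fn)     # share of real cases that were flagged
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, for illustration only (not data from the study).
balanced = f_measure(tp=8, fp=2, fn=2)      # precision 0.8, recall 0.8
high_precision = f_measure(tp=6, fp=2, fn=6)  # precision 0.75, recall 0.5
```

Because the F-measure is a harmonic mean, two classifiers can reach nearly the same score with different trade-offs, as with RIPPER (higher precision) and MTERMS (higher recall) on the intermediate confidence cases.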

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Boston
  • Data Mining / methods*
  • Decision Support Systems, Clinical / organization & administration*
  • Depression / classification
  • Depression / diagnosis*
  • Diagnosis, Computer-Assisted / methods*
  • Electronic Health Records / classification*
  • Humans
  • Machine Learning
  • Natural Language Processing*
  • Reproducibility of Results
  • Sensitivity and Specificity