Predictive modeling for classification of positive valence system symptom severity from initial psychiatric evaluation records

J Biomed Inform. 2017 Nov;75S:S94-S104. doi: 10.1016/j.jbi.2017.05.019. Epub 2017 May 29.


In response to the challenges set forth by the CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing, we describe a framework that automatically classifies initial psychiatric evaluation records into one of four positive valence system severities: absent, mild, moderate, or severe. We used a dataset provided by the event organizers to develop a framework composed of natural language processing (NLP) modules and the three predictive models (two decision tree models and one Bayesian network model) used in the competition. We also developed two additional predictive models for comparison purposes. To evaluate our framework, we employed a blind test dataset provided by the 2016 CEGS N-GRID. The predictive scores, measured by the macro-averaged inverse normalized mean absolute error score, of the two decision tree models and the Naïve Bayes model were 82.56%, 82.18%, and 80.56%, respectively. The proposed framework can potentially be applied to other predictive tasks that process initial psychiatric evaluation records, such as predicting 30-day psychiatric readmissions.
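
The evaluation metric named above can be illustrated with a minimal sketch. The abstract does not give the formula, so the following is only one plausible reading and not the organizers' official scorer: it assumes the severities are encoded as ordinals 0 to 3 (absent to severe), that each absolute error is divided by the largest error possible for its gold class, that the per-class means are then averaged (macro-averaging), and that the result is inverted onto a 0 to 100 scale. The function name and the encoding are hypothetical.

```python
from collections import defaultdict

# Assumed ordinal encoding (not stated in the abstract):
# absent=0, mild=1, moderate=2, severe=3.
MAX_LABEL = 3


def inverse_normalized_macro_mae(gold, pred):
    """Return a score in [0, 100]; higher means smaller ordinal errors.

    Hypothetical helper: one plausible reading of the metric, under the
    assumptions stated in the lead-in paragraph.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("at least one record is required")

    errors_by_class = defaultdict(list)
    for g, p in zip(gold, pred):
        # Largest possible miss for this gold label (3 for 0 and 3, 2 for 1 and 2).
        max_error = max(g, MAX_LABEL - g)
        errors_by_class[g].append(abs(p - g) / max_error)

    # Macro-average: mean error per gold class, then mean over the classes present.
    per_class_means = [sum(errs) / len(errs) for errs in errors_by_class.values()]
    macro_error = sum(per_class_means) / len(per_class_means)

    # Invert so that a perfect prediction scores 100.
    return 100.0 * (1.0 - macro_error)
```

Under these assumptions, macro-averaging keeps a rare class such as severe from being swamped by the more frequent classes, and the per-class normalization keeps a miss on a middle class from being penalized less simply because its maximum possible error is smaller.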

Keywords: Computer-assisted diagnosis; Natural language processing; Psychiatry; Research Domain Criteria (RDoC); Supervised machine learning.

MeSH terms

  • Bayes Theorem
  • Humans
  • Models, Psychological*
  • Natural Language Processing
  • Severity of Illness Index