Predictive analytics with electronic health record (EHR) data holds promise for improving outcomes of psychiatric care. This study evaluated models for predicting outcomes of psychotherapy for depression in a clinical practice setting. EHR data from two large integrated health systems (Kaiser Permanente Colorado and Washington) included 5,554 new psychotherapy episodes with a baseline Patient Health Questionnaire (PHQ-9) score ≥ 10 and a follow-up PHQ-9 score recorded 14 to 180 days after treatment initiation. Baseline predictors included demographics and diagnostic, medication, and encounter history. Prediction models for two outcomes, follow-up PHQ-9 score and treatment response (≥ 50% PHQ-9 reduction), were trained in a random sample of 70% of episodes and validated in the remaining 30%. Two modeling methods were used: generalized linear regression models with variable selection and random forests. Sensitivity analyses considered alternate predictor, outcome, and model specifications. Predictions of follow-up PHQ-9 scores estimated observed outcomes poorly (mean squared error = 31 for linear regression, 40 for random forest). Predictions of treatment response had low discrimination (AUC = 0.57 for logistic regression, 0.61 for random forest), low classification accuracy, and poor calibration. Sensitivity analyses showed similar results. Prediction model performance may vary in settings with different care or EHR documentation practices. In conclusion, prediction models did not accurately predict depression treatment outcomes despite using rich EHR data and advanced analytic techniques. Health systems should proceed cautiously when considering prediction models for psychiatric outcomes that rely on baseline intake information. Transparent research should be conducted to evaluate the performance of any model intended for clinical use.
Keywords: Depression; Machine learning; Measurement-based care; Patient-reported outcomes; Prediction; Quality measures.
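
To make the outcome definition and the two reported metrics concrete, the following is a minimal sketch, not the study's code. It assumes that treatment response means the follow-up PHQ-9 score is at most half the baseline score, computes mean squared error for predicted follow-up scores, and computes AUC as the proportion of responder/non-responder pairs that the model ranks correctly (ties count as half). All function names and the four toy episodes are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def treatment_response(baseline: int, follow_up: int) -> bool:
    """Return True when the PHQ-9 score fell by at least 50% from baseline."""
    # (baseline - follow_up) / baseline >= 0.5  is equivalent to  2 * follow_up <= baseline
    return 2 * follow_up <= baseline


def mean_squared_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average squared difference between observed and predicted scores."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """Probability that a random responder is scored above a random non-responder."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    # Count correctly ordered pairs; a tie contributes half a pair.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy episodes (not study data): baseline and follow-up PHQ-9 scores.
baseline_scores = [12, 18, 15, 20]
follow_up_scores = [5, 14, 7, 16]

# Hypothetical model outputs for the same four episodes.
predicted_follow_up = [8, 12, 9, 13]
predicted_response_prob = [0.6, 0.5, 0.4, 0.3]

responded = [treatment_response(b, f) for b, f in zip(baseline_scores, follow_up_scores)]
print("responders:", responded)
print("MSE:", mean_squared_error(follow_up_scores, predicted_follow_up))
print("AUC:", auc(responded, predicted_response_prob))
```

Under this pairwise definition, an AUC of 0.5 corresponds to ranking responders and non-responders no better than chance, which is the reference point for reading the reported values of 0.57 and 0.61.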