A diagnostic meta-analysis of the Patient Health Questionnaire-9 (PHQ-9) algorithm scoring method as a screen for depression

Gen Hosp Psychiatry. Jan-Feb 2015;37(1):67-75. doi: 10.1016/j.genhosppsych.2014.09.009. Epub 2014 Sep 23.


Background: The depression module of the Patient Health Questionnaire-9 (PHQ-9) is a widely used depression screening instrument in nonpsychiatric settings. The PHQ-9 can be scored using different methods, including an algorithm based on Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition criteria and a cut-off based on summed-item scores. The algorithm was the originally proposed scoring method to screen for depression. We summarized the diagnostic test accuracy of the PHQ-9 using the algorithm scoring method across a range of validation studies and compared the diagnostic properties of the PHQ-9 using the algorithm and the summed-item scoring methods, the latter at the proposed cut-off point of 10.
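The two scoring methods can be contrasted with a minimal Python sketch. The abstract does not spell out the algorithm rule, so the rule below is an assumption based on the commonly described PHQ-9 scoring instructions: at least five of the nine items are counted as present (rated 2 or 3, "more than half the days" or more, with item 9 counted if rated 1 or higher), and at least one counted item is item 1 or item 2. The function names are hypothetical and the item vectors are invented for illustration.

```python
from typing import Sequence


def _validate(items: Sequence[int]) -> None:
    """Check that we have nine item ratings, each in the range 0-3."""
    if len(items) != 9 or any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("PHQ-9 needs nine item ratings, each 0, 1, 2 or 3")


def phq9_summed_positive(items: Sequence[int], cutoff: int = 10) -> bool:
    """Summed-item method: screen positive if the total score >= cutoff."""
    _validate(items)
    return sum(items) >= cutoff


def phq9_algorithm_positive(items: Sequence[int]) -> bool:
    """Algorithm method (assumed rule, see lead-in).

    Items 1-8 count as present when rated >= 2; item 9 counts when
    rated >= 1. Screen positive if >= 5 items are present and at
    least one of them is item 1 or item 2.
    """
    _validate(items)
    present = [score >= 2 for score in items[:8]] + [items[8] >= 1]
    core_symptom = present[0] or present[1]
    return core_symptom and sum(present) >= 5


# Invented response patterns showing that the two methods can disagree.
mild_but_widespread = [1, 1, 1, 1, 1, 1, 1, 1, 2]  # total 10
few_points_many_symptoms = [2, 2, 2, 2, 0, 0, 0, 0, 1]  # total 9

print(phq9_summed_positive(mild_but_widespread),
      phq9_algorithm_positive(mild_but_widespread))       # True False
print(phq9_summed_positive(few_points_many_symptoms),
      phq9_algorithm_positive(few_points_many_symptoms))  # False True
```

The first pattern reaches the summed cut-off through many low ratings that the algorithm does not count, which is one way a summed score of 10 or more can flag respondents that the algorithm misses.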

Methods: We performed a systematic review of diagnostic accuracy studies of the PHQ-9 using the algorithm scoring method to detect major depressive disorder (MDD). We used meta-analytic methods to calculate summary sensitivity, specificity, likelihood ratios and diagnostic odds ratios for the PHQ-9 in diagnosing MDD using the algorithm scoring method. In studies that reported both scoring methods (algorithm and summed-item scoring at the proposed cut-off point of ≥10), we compared the diagnostic properties of the PHQ-9 under the two methods.
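For readers less familiar with these measures, the sketch below shows how each one is derived from a single study's 2x2 table of screening result against reference diagnosis. It covers per-study values only; the pooling across studies used in the review is not shown, and the abstract does not name the specific model. The function name and the counts are hypothetical, chosen for easy arithmetic, and are not data from any included study.

```python
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Per-study accuracy measures from a 2x2 table.

    tp: screen positive, MDD present    fp: screen positive, MDD absent
    fn: screen negative, MDD present    tn: screen negative, MDD absent
    Tables with a zero cell are not handled in this sketch.
    """
    sensitivity = tp / (tp + fn)   # share of MDD cases the screen detects
    specificity = tn / (tn + fp)   # share of non-cases correctly ruled out
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # How much a positive screen raises the odds of MDD.
        "lr_positive": sensitivity / (1 - specificity),
        # How much a negative screen lowers the odds of MDD.
        "lr_negative": (1 - sensitivity) / specificity,
        # Diagnostic odds ratio; equal to lr_positive / lr_negative.
        "dor": (tp * tn) / (fp * fn),
    }


# Hypothetical counts for illustration only.
result = diagnostic_accuracy(tp=50, fp=50, fn=50, tn=850)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these invented counts the screen detects half of the cases (sensitivity 0.5) while ruling out most non-cases, the same qualitative pattern of low sensitivity with good specificity that the review describes for the algorithm method.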

Results: We found 27 studies that validated the algorithm scoring method of the PHQ-9 in various settings. There was substantial heterogeneity across studies, which makes the pooled results difficult to interpret. In general, sensitivity was low whereas specificity was good. Thirteen studies reported the diagnostic properties of the PHQ-9 for both scoring methods. In these studies, pooled sensitivity was lower for the algorithm scoring method than for the summed-item scoring method, while specificity was good for both. Heterogeneity was consistently high; therefore, caution should be used when interpreting these results.

Interpretation: This review shows that, if the algorithm scoring method is used, the PHQ-9 has a low sensitivity for detecting MDD. This could be due to the rating scale categories of the measure, higher specificity or other factors that warrant further research. The summed-item scoring method at the proposed cut-off point of ≥10 has better diagnostic performance for screening purposes or where a high sensitivity is needed.

Keywords: Depression; Meta-analysis; Psychometrics; Questionnaire; Screening.

Publication types

  • Meta-Analysis
  • Review
  • Systematic Review

MeSH terms

  • Algorithms*
  • Depressive Disorder, Major / diagnosis*
  • Humans
  • Psychiatric Status Rating Scales / standards*
  • Psychometrics / instrumentation*