J Gen Intern Med. 2007 Nov;22(11):1596-602.
doi: 10.1007/s11606-007-0333-y. Epub 2007 Sep 14.

Screening for Depression in Medical Settings With the Patient Health Questionnaire (PHQ): A Diagnostic Meta-Analysis

Free PMC article

Simon Gilbody et al. J Gen Intern Med.

Objective: To summarize the psychometric properties of the PHQ2 and PHQ9 as screening instruments for depression.

Interventions: We identified 17 validation studies conducted in primary care, medical outpatients, and specialist medical services (cardiology, gynecology, stroke, dermatology, head injury, and otolaryngology). Electronic databases from 1994 to February 2007 (MEDLINE, PsycLIT, EMBASE, CINAHL, Cochrane registers) and study reference lists were searched. Translations included US English, Dutch, Italian, Spanish, German, and Arabic. Summary sensitivity, specificity, likelihood ratios, and diagnostic odds ratios (OR) against a gold standard (DSM-IV) diagnosis of Major Depressive Disorder (MDD) were calculated for each study. We used random effects bivariate meta-analysis at recommended cut points to produce summary receiver-operator characteristic (sROC) curves. We explored heterogeneity with metaregression.
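The per-study measures named above all derive from a 2x2 table of screening result against the gold-standard diagnosis. The following is a minimal sketch of that arithmetic, not the authors' code: the `TwoByTwo` class, the `diagnostic_metrics` function, and the counts are hypothetical and chosen only for illustration. It assumes no cell is zero (no continuity correction is applied).

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical 2x2 table: screening result versus gold-standard diagnosis."""
    tp: int  # screen positive, MDD present
    fp: int  # screen positive, MDD absent
    fn: int  # screen negative, MDD present
    tn: int  # screen negative, MDD absent


def diagnostic_metrics(t: TwoByTwo) -> dict:
    """Return standard diagnostic accuracy measures for one study.

    Assumes every cell is non-zero; a real analysis would need to
    handle zero cells (for example, with a continuity correction).
    """
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    # Diagnostic odds ratio, computed directly from the cell counts.
    dor = (t.tp * t.tn) / (t.fp * t.fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
        "dor": dor,
    }


# Illustrative counts only; these are not data from any included study.
example = TwoByTwo(tp=45, fp=30, fn=15, tn=410)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The diagnostic odds ratio equals the positive likelihood ratio divided by the negative likelihood ratio, so it condenses both into a single accuracy figure per study.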

Measurements and main results: Fourteen studies (5,026 participants) validated the PHQ9 against MDD: sensitivity = 0.80 (95% CI 0.71-0.87); specificity = 0.92 (95% CI 0.88-0.95); positive likelihood ratio = 10.12 (95% CI 6.52-15.67); negative likelihood ratio = 0.22 (95% CI 0.15-0.32). There was substantial heterogeneity (diagnostic odds ratio heterogeneity I² = 82%), which was not explained by study setting (primary care versus general hospital), method of scoring (cutoff ≥10 versus "diagnostic algorithm"), or study quality (blinded versus unblinded). The PHQ2 was validated in only 3 studies and showed wide variability in sensitivity.
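The I² statistic reported above expresses the share of variation across studies that is attributed to heterogeneity rather than chance. The abstract does not say how it was computed. The sketch below assumes the common approach of deriving I² from Cochran's Q on log diagnostic odds ratios with inverse-variance weights. The function names and the study counts are hypothetical.

```python
import math


def log_dor_and_variance(tp: int, fp: int, fn: int, tn: int) -> tuple:
    """Log diagnostic odds ratio and its approximate variance for one study.

    Assumes all four cells are non-zero.
    """
    log_dor = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return log_dor, variance


def i_squared(effects: list, variances: list) -> float:
    """I² (in percent) from Cochran's Q with inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    # I² is truncated at zero when Q is smaller than its degrees of freedom.
    return max(0.0, (q - df) / q) * 100


# Illustrative 2x2 counts (tp, fp, fn, tn); not data from the included studies.
hypothetical_studies = [
    (45, 30, 15, 410),
    (20, 60, 10, 300),
    (70, 15, 8, 500),
]
pairs = [log_dor_and_variance(*s) for s in hypothetical_studies]
effects = [p[0] for p in pairs]
variances = [p[1] for p in pairs]
print(f"I^2 = {i_squared(effects, variances):.1f}%")
```

Under this definition, a value such as 82% means that most of the observed spread in diagnostic odds ratios exceeds what sampling error alone would be expected to produce.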

Conclusions: The PHQ9 is acceptable, and as good as longer clinician-administered instruments in a range of settings, countries, and populations. More research is needed to validate the PHQ2 to see if its diagnostic properties approach those of the PHQ9.


Figure 1
PHQ9 summary ROC plot of diagnosis of major depressive disorder at cutoff ≥10 or by “diagnostic algorithm.” Pooled co-distribution of sensitivity and specificity using a bivariate meta-analysis. Individual point estimates represent single studies, with size of circle proportionate to study sample size.
