Heterogeneity in the performance of screening tools across patient groups has rarely been considered in the literature on depression screening in primary care. The objectives of the present study were to assess and compare the diagnostic accuracy of three screening questionnaires (Brief Patient Health Questionnaire, General Health Questionnaire-12, WHO-5) in identifying depression across various patient subpopulations, and to assess the accuracy of primary care physicians' unaided clinical assessment in the same subgroups. We conducted a cross-sectional validation study in 448 primary care patients. Diagnostic accuracy was evaluated using two-by-two tables and receiver operating characteristic (ROC) analyses. Results indicated that the diagnostic accuracy (sensitivity, specificity) of the three screening instruments, as well as of the clinical diagnoses, differed between patient groups. Whether one screening tool is superior to another depends on the subgroup considered. Gender, age, form (subtype), and severity of depression influence the test characteristics of a screening tool. This should be taken into account if routine depression screening is to be widely introduced. The benefit of routine screening also depends on the efforts made to treat and monitor patients in whom depression has been diagnosed.
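As an illustration of the subgroup analysis described here, the following minimal sketch shows how sensitivity and specificity could be derived from a two-by-two table built separately for each patient subgroup. It is not the study's analysis code. The function names (`two_by_two`, `sensitivity`, `specificity`), the record layout, and the example records are hypothetical and chosen only for illustration; the records are invented and do not reflect the study's data.

```python
from collections import defaultdict


def two_by_two(records):
    """Build one two-by-two table per subgroup.

    records: iterable of (subgroup, screen_positive, reference_positive),
    where reference_positive is the reference-standard diagnosis.
    This record layout is an assumption made for the sketch.
    """
    tables = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for subgroup, screen_pos, ref_pos in records:
        if ref_pos:
            cell = "tp" if screen_pos else "fn"
        else:
            cell = "fp" if screen_pos else "tn"
        tables[subgroup][cell] += 1
    return dict(tables)


def sensitivity(table):
    """Proportion of reference-positive patients detected by the screen."""
    diseased = table["tp"] + table["fn"]
    return table["tp"] / diseased if diseased else None


def specificity(table):
    """Proportion of reference-negative patients with a negative screen."""
    healthy = table["tn"] + table["fp"]
    return table["tn"] / healthy if healthy else None


# Hypothetical example records, NOT data from the study.
example_records = (
    [("women", True, True)] * 3
    + [("women", False, True)] * 1
    + [("women", False, False)] * 4
    + [("women", True, False)] * 1
    + [("men", True, True)] * 1
    + [("men", False, True)] * 1
    + [("men", False, False)] * 3
)

results = {
    group: (sensitivity(table), specificity(table))
    for group, table in two_by_two(example_records).items()
}

for group, (sens, spec) in results.items():
    print(f"{group}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Computing the table per subgroup rather than once for the whole sample is what allows accuracy to be compared between groups such as men and women or different age bands. A full ROC analysis would additionally vary the questionnaire cut-off and repeat this calculation at each threshold.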