Diagnostic Accuracy of the Berlin Questionnaire, STOP-BANG, STOP, and Epworth Sleepiness Scale in Detecting Obstructive Sleep Apnea: A Bivariate Meta-Analysis

Sleep Med Rev. 2017 Dec;36:57-70. doi: 10.1016/j.smrv.2016.10.004. Epub 2016 Nov 5.


Obstructive sleep apnea (OSA) is a highly prevalent sleep disorder, yet it remains underdiagnosed and undertreated. Screening tools such as the Berlin questionnaire (BQ), STOP-BANG questionnaire (SBQ), STOP questionnaire (STOP), and Epworth sleepiness scale (ESS) are widely used for OSA, but findings on their diagnostic accuracy are inconsistent. This meta-analysis therefore estimated and compared the summary sensitivity, specificity, and diagnostic odds ratio (DOR) of the BQ, SBQ, STOP, and ESS according to OSA severity. The Embase, PubMed, PsycINFO, ProQuest Dissertations and Theses A&I, and China Knowledge Resource Integrated databases were searched from their inception to July 15, 2016. We included studies that examined the sensitivity and specificity of the BQ, SBQ, STOP, or ESS against the apnea-hypopnea index (AHI) or respiratory disturbance index (RDI). The methodological quality of the studies was evaluated with the revised Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2). A random-effects bivariate model was used to estimate the summary sensitivity, specificity, and DOR of each tool.

We identified 108 studies with a total of 47 989 participants. Summary estimates were calculated for the BQ, SBQ, STOP, and ESS in detecting mild (AHI/RDI ≥ 5 events/h), moderate (AHI/RDI ≥ 15 events/h), and severe (AHI/RDI ≥ 30 events/h) OSA. The pooled estimates for the BQ, SBQ, STOP, and ESS, respectively, were:

  • Mild OSA: sensitivity 76%, 88%, 87%, and 54%; specificity 59%, 42%, 42%, and 65%; DOR 4.30, 5.13, 4.85, and 2.18.
  • Moderate OSA: sensitivity 77%, 90%, 89%, and 47%; specificity 44%, 36%, 32%, and 62%; DOR 2.68, 5.05, 3.71, and 1.45.
  • Severe OSA: sensitivity 84%, 93%, 90%, and 58%; specificity 38%, 35%, 28%, and 60%; DOR 3.10, 6.51, 3.37, and 2.10.

At all three severity levels, the pooled sensitivity and DOR of the SBQ were significantly higher than those of the other screening tools (P < .05), whereas its specificity was lower than that of the ESS (P < .05). Age, sex, body mass index, study sample size, study population, presence of comorbidities, use of polysomnography (PSG) versus portable monitoring, and risk of bias in the index-test and reference-standard domains were significant moderators of sensitivity and specificity (P < .05). Compared with the BQ, STOP, and ESS, the SBQ is a more accurate tool for detecting mild, moderate, and severe OSA. Sleep specialists should use the SBQ in patient interviews for the early diagnosis of OSA in clinical settings, particularly in resource-poor countries and in sleep clinics where PSG is unavailable.
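The DOR folds sensitivity and specificity into a single figure: DOR = [sens / (1 − sens)] / [(1 − spec) / spec], which equals the positive likelihood ratio divided by the negative likelihood ratio. As a rough consistency check, the short Python sketch below recomputes the DOR implied by the pooled mild-OSA sensitivity and specificity. The function and variable names are hypothetical illustrations, not taken from the study. Because the study pooled the DOR within a bivariate random-effects model and the reported percentages are rounded, the implied values should only approximate the published ones.

```python
"""DOR implied by a pooled sensitivity/specificity pair (illustrative sketch).

The meta-analysis pooled the DOR with a bivariate random-effects model, so
these back-of-envelope values need not match the reported pooled DORs exactly.
"""


def diagnostic_odds_ratio(sens: float, spec: float) -> float:
    """Return DOR = LR+ / LR-, with sensitivity and specificity as fractions."""
    if not (0.0 < sens < 1.0 and 0.0 < spec < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sens / (1.0 - spec)  # positive likelihood ratio
    lr_neg = (1.0 - sens) / spec  # negative likelihood ratio
    return lr_pos / lr_neg


# Pooled values for mild OSA (AHI/RDI >= 5 events/h) as given in the abstract:
# tool -> (sensitivity, specificity, reported pooled DOR)
MILD_OSA = {
    "BQ": (0.76, 0.59, 4.30),
    "SBQ": (0.88, 0.42, 5.13),
    "STOP": (0.87, 0.42, 4.85),
    "ESS": (0.54, 0.65, 2.18),
}

if __name__ == "__main__":
    for tool, (sens, spec, reported) in MILD_OSA.items():
        implied = diagnostic_odds_ratio(sens, spec)
        print(f"{tool:>4}: implied DOR {implied:.2f}, reported {reported:.2f}")
```

For mild OSA this gives implied DORs of about 4.56 (BQ), 5.31 (SBQ), 4.85 (STOP), and 2.18 (ESS), close to the reported 4.30, 5.13, 4.85, and 2.18. The small gaps for the BQ and SBQ are what one would expect when the DOR is estimated jointly in the bivariate model rather than derived from rounded summary percentages.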

Keywords: Diagnostic meta-analysis; Instrument validation; Obstructive sleep apnea; Screening.

Publication types

  • Meta-Analysis
  • Review
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Berlin
  • Humans
  • Sensitivity and Specificity*
  • Sleep Apnea, Obstructive / diagnosis*
  • Surveys and Questionnaires*