Meta-Analysis
2022 Jan 1;18(1):193-202.
doi: 10.5664/jcsm.9538.

Interrater reliability of sleep stage scoring: a meta-analysis

Yun Ji Lee et al. J Clin Sleep Med.

Abstract

Study objectives: We evaluated the interrater reliabilities of manual polysomnography sleep stage scoring. We included all studies that employed Rechtschaffen and Kales rules or American Academy of Sleep Medicine standards. We sought the overall degree of agreement and those for each stage.

Methods: The search keywords were "Polysomnography (PSG)," "sleep staging," "Rechtschaffen and Kales (R&K)," "American Academy of Sleep Medicine (AASM)," "interrater (interscorer) reliability," and "Cohen's kappa." We searched PubMed, OVID Medline, EMBASE, the Cochrane library, KoreaMed, KISS, and MedRIC. Studies of automatic scoring and of pediatric patients were excluded. We collected data on scorer histories, scoring rules, numbers of epochs scored, and the underlying diseases of the patients.

Results: A total of 101 publications were retrieved; 11 satisfied the selection criteria. The Cohen's kappa for manual, overall sleep scoring was 0.76, indicating substantial agreement (95% confidence interval, 0.71-0.81; P < .001). By sleep stage, the figures were 0.70, 0.24, 0.57, 0.57, and 0.69 for the W, N1, N2, N3, and R stages, respectively. The interrater reliabilities for stage N2 and N3 sleep were moderate, and that for stage N1 sleep was only fair.

Conclusions: We conducted a meta-analysis to generalize the variation in manual scoring of polysomnography and provide reference data for automatic sleep stage scoring systems. The reliability of manual scorers of polysomnography sleep stages was substantial. However, for certain stages, the results were poor; validity requires improvement.

Citation: Lee YJ, Lee JY, Cho JH, Choi JH. Interrater reliability of sleep stage scoring: a meta-analysis. J Clin Sleep Med. 2022;18(1):193-202.

Keywords: interrater reliability; meta-analysis; sleep stage scoring.


Conflict of interest statement

All authors have seen and approved the manuscript. Work for this study was performed in the Department of Otorhinolaryngology—Head and Neck Surgery, College of Medicine, Soonchunhyang University, Bucheon Hospital, Bucheon, Korea. This study was funded by the Soonchunhyang University Research Fund. The authors report no conflicts of interest.

Figures

Figure 1. Data matrix and formula for calculating the Cohen’s κ.
(A) The data matrix derived when sleep scoring sought to identify 5 sleep stage categories (W, N1, N2, N3, and R). Sij is the number of epochs. (B) The formula used to calculate the κ coefficient. (a) N is the total number of epochs scored. (b) Po is the observed agreement and Pc is the expected agreement. (c, d) Po and Pc are derived using these formulas.
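The κ calculation described in the Figure 1 caption can be sketched in Python. This is a minimal illustration of the standard Cohen's κ formula, not the authors' code: the function name `cohens_kappa`, the row/column orientation (rows for one scorer, columns for the other), and the epoch counts in the example are all assumptions made for illustration.

```python
def cohens_kappa(matrix):
    """Cohen's kappa from a square agreement matrix.

    matrix[i][j] is the number of epochs that scorer A labeled as
    stage i and scorer B labeled as stage j (orientation is assumed).
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)  # N: total epochs scored
    if n == 0:
        raise ValueError("matrix contains no epochs")

    # Po: observed agreement, the share of epochs on the diagonal.
    po = sum(matrix[i][i] for i in range(k)) / n

    # Pc: agreement expected by chance, from the marginal totals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    pc = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2

    if pc == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (po - pc) / (1 - pc)


# Hypothetical 5-stage matrix in the order W, N1, N2, N3, R.
# The counts are invented for illustration only.
example = [
    [90, 8, 2, 0, 0],
    [10, 20, 15, 0, 5],
    [3, 12, 300, 20, 5],
    [0, 0, 25, 80, 0],
    [1, 4, 6, 0, 120],
]
print(round(cohens_kappa(example), 3))
```

Because Pc is built from the marginal totals, a stage that is scored rarely, as N1 often is, can show a low κ even when raw percent agreement looks acceptable.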
Figure 2. Flow diagram of study selection.
Figure 3. Forest plot for overall interrater reliability.
CI = confidence interval.
Figure 4. Forest plot for interrater reliability of different sleep stages.
CI = confidence interval.
Figure 5. Funnel plot for overall interrater reliability.
Figure 6. Forest plot and funnel plot for interrater reliabilities by stage.
CI = confidence interval, REM = rapid eye movement.
Figure 7. Data matrix and formula for calculating the ICC.
(A) When the scorers (j = 1, 2, …, k) evaluate the PSG results of the patients (i = 1, 2, …, n), the data matrix can be filled in with target variables xij. Values of the target variables should fall along a continuous scale, such as the AHI and sleep stage (% or minutes). (B) The basic formula used to calculate the ICC in a 2-way random model. (a) Each measurement xij is assumed to be composed of a true component and a measurement error component. The model can be regarded as the sum of 5 terms: μ = mean of the patient’s scores, ri = deviation from the mean for patient i, cj = bias of scorer j, rcij = interaction between patient deviation and scorer deviation, and eij = measurement error. (b) The ICC was calculated as a ratio of variance based on the results of an analysis of variance. The total variance is equal to the sum of the variance of interest (true score variance) and the error variance. The ICC is unitless and has a value between 0 and 1; an estimate of 1 indicates perfect reliability and 0 indicates no reliability. AHI = apnea-hypopnea index, ICC = intraclass correlation coefficient, PSG = polysomnography.
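The variance-ratio idea in the Figure 7 caption can be sketched as follows. The caption does not state which ICC form was used, so this example assumes the single-measure, absolute-agreement form for a 2-way random model, commonly written ICC(2,1). The function name `icc_2_1` and the example values are hypothetical. With one score per patient and scorer, the interaction term rcij cannot be separated from the error term eij, so both are pooled into the residual mean square here.

```python
def icc_2_1(x):
    """ICC(2,1) from an n-by-k matrix x, where x[i][j] is the value
    that scorer j assigned to patient i (e.g., minutes of N3 sleep)."""
    n, k = len(x), len(x[0])
    if n < 2 or k < 2:
        raise ValueError("need at least 2 patients and 2 scorers")

    grand = sum(sum(row) for row in x) / (n * k)
    row_means = [sum(row) / k for row in x]                               # per patient
    col_means = [sum(x[i][j] for i in range(n)) / n for j in range(k)]    # per scorer

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x[i][j] - grand) ** 2 for i in range(n) for j in range(k))
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-patient mean square
    msc = ss_cols / (k - 1)              # between-scorer mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual (interaction + error)

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical N3 minutes for 4 patients scored by 3 scorers.
scores = [
    [62.0, 58.5, 65.0],
    [30.0, 34.0, 28.5],
    [85.5, 80.0, 88.0],
    [47.0, 51.5, 45.0],
]
print(round(icc_2_1(scores), 3))
```

A systematic offset between scorers raises the between-scorer mean square and lowers this absolute-agreement estimate; a consistency-type ICC would ignore that offset, which is one reason the chosen form should be reported.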
