Comparative Study
Sleep Breath. 2015 Mar;19(1):191-5.
doi: 10.1007/s11325-014-0990-0. Epub 2014 May 7.

Process and Outcome for International Reliability in Sleep Scoring

Xiaozhe Zhang et al. Sleep Breath.


Objectives: The aim was to evaluate inter-rater reliability in scoring sleep stages between two sleep laboratories, one in Berlin, Germany, and one in Beijing, China.

Methods: The data set consisted of polysomnograms (PSGs) from 15 subjects in a German sleep laboratory (7 patients with mild to moderate sleep apnea hypopnea syndrome (SAHS) and 8 healthy controls) and PSGs from 15 narcolepsy patients in a Chinese sleep laboratory. Five experienced technologists (two Chinese and three German) who had no common training scored the PSGs following the 2007 AASM manual, except that the EEG signals included only two leads (C3/A2 and C4/A1). Inter-scorer agreement was analyzed epoch by epoch by means of Cohen's κ, and agreement on quantitative sleep parameters was analyzed by means of intra-class correlation coefficients.
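The epoch-by-epoch agreement measure named above can be illustrated with a minimal sketch of Cohen's κ for two scorers. This is not the authors' analysis code: the function name `cohen_kappa` and the short hypnograms are hypothetical, and the stage labels are simply the AASM stage names used in the abstract.

```python
from collections import Counter


def cohen_kappa(scorer_a, scorer_b):
    """Cohen's kappa for two equally long sequences of epoch labels."""
    if len(scorer_a) != len(scorer_b) or not scorer_a:
        raise ValueError("both scorers must label the same non-empty set of epochs")
    n = len(scorer_a)
    # Observed agreement: fraction of epochs given the same stage by both scorers.
    observed = sum(a == b for a, b in zip(scorer_a, scorer_b)) / n
    # Chance agreement: product of each scorer's marginal stage proportions.
    count_a = Counter(scorer_a)
    count_b = Counter(scorer_b)
    expected = sum(count_a[stage] * count_b[stage] for stage in count_a) / (n * n)
    if expected == 1.0:
        # Both scorers used a single identical stage throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical 10-epoch hypnograms from two scorers (invented for illustration).
berlin = ["W", "N1", "N2", "N2", "N3", "REM", "REM", "N2", "W", "N2"]
beijing = ["W", "N2", "N2", "N2", "N3", "REM", "N1", "N2", "W", "N2"]
print(round(cohen_kappa(berlin, beijing), 2))
```

Stage-specific κ values such as those reported for REM can be obtained from the same function by first recoding each hypnogram into "this stage" versus "any other stage".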

Results: Inter-laboratory epoch-by-epoch comparison between scorers from the two countries yielded moderate agreement, with a mean κ of 0.57 for controls, 0.58 for SAHS, and 0.54 for narcolepsy. Compared with controls, inter-scorer agreement was higher for wake and N3 scoring in SAHS and for N1 and N3 scoring in narcolepsy (p < 0.05). The only sleep stage with lower scoring agreement than in controls in both SAHS (κ 0.69 vs. 0.79, p = 0.034) and narcolepsy (κ 0.66 vs. 0.79, p = 0.022) was stage REM. Inter-laboratory comparisons showed that the most common combinations of deviating scorings were N1 and N2, N2 and N3, and N1 and wake. Deviating scoring rates of 6.5% for wake and REM and 13.4% for N1 and REM indicated that, in inter-laboratory scoring, these stages were confused about twice as often in narcolepsy as in SAHS and controls. This was further confirmed by agreement analysis of quantitative parameters using intra-class correlation coefficients, ICC(2,1), which indicated that REM sleep scoring agreement was lower in narcolepsy than in controls (p < 0.05).
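For readers unfamiliar with the ICC(2,1) notation, the sketch below computes it from a subjects-by-scorers table using the Shrout and Fleiss two-way random-effects, single-rater formula. It is an illustration under stated assumptions, not the authors' code: the function name `icc_2_1` and the REM-minutes values are invented.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per scorer."""
    n = len(ratings)     # number of subjects (e.g. PSG recordings)
    k = len(ratings[0])  # number of scorers
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 subjects and 2 scorers")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)             # between-subject mean square
    msc = ss_cols / (k - 1)             # between-scorer mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical REM minutes for 4 recordings scored by 3 scorers (invented values).
rem_minutes = [
    [80.0, 75.0, 90.0],
    [60.0, 58.0, 70.0],
    [95.0, 90.0, 99.0],
    [40.0, 45.0, 55.0],
]
print(round(icc_2_1(rem_minutes), 2))
```

Because ICC(2,1) measures absolute agreement, a scorer who systematically scores more REM than the others lowers the coefficient even when the subjects are ranked identically.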

Conclusion: REM stage scoring agreement is low for patients with narcolepsy and SAHS, indicating the need to study sleep stage scoring agreement separately for specific sleep disorders. Intensive training in sleep scoring is needed in international multicenter studies to improve scoring agreement.
