Reliability of the American Academy of Sleep Medicine Rules for Assessing Sleep Depth in Clinical Practice

J Clin Sleep Med. 2018 Feb 15;14(2):205-213. doi: 10.5664/jcsm.6934.


Study objectives: The American Academy of Sleep Medicine has published manuals for scoring polysomnograms that recommend time spent in non-rapid eye movement sleep stages (stage N1, N2, and N3 sleep) be reported. Given the well-established large interrater variability in scoring stage N1 and N3 sleep, we determined the range of time in stage N1 and N3 sleep scored by a large number of technologists when compared to reasonably estimated true values.

Methods: Polysomnograms of 70 females were scored by 10 highly trained sleep technologists, two each from five different academic sleep laboratories. For each polysomnogram, the range and confidence interval (CI = difference between the 5th and 95th percentiles) of the 10 scored times spent in stage N1 and N3 sleep were determined. The average of the 10 technologists' times in stage N1 and N3 sleep was considered representative of the true value for that polysomnogram. Accuracy of different technologists in estimating delta wave duration was determined by comparing their scores to digitally determined durations.
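
The per-polysomnogram spread measures described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' analysis code. The function names `percentile` and `score_spread`, the linear-interpolation percentile rule, and the example values are assumptions introduced here, since the abstract does not state how the percentiles were computed.

```python
from statistics import mean


def percentile(values, p):
    """Return the p-th percentile (0-100) of values.

    Assumption: linear interpolation between the two nearest ranks.
    The abstract does not specify which percentile rule was used.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def score_spread(scores):
    """Summarize the scores that several technologists gave one polysomnogram.

    scores: time in a stage (% TST), one value per technologist.
    """
    return {
        # Mean of all scorers, used as the estimate of the true value.
        "reference": mean(scores),
        # Full range: highest score minus lowest score.
        "range": max(scores) - min(scores),
        # CI as defined in the Methods: 95th minus 5th percentile.
        "ci": percentile(scores, 95) - percentile(scores, 5),
    }


# Hypothetical N1 scores (% TST) from 10 technologists for one polysomnogram.
# These numbers are invented for illustration and are not study data.
example_scores = [6.0, 8.5, 9.0, 10.0, 11.5, 12.0, 13.0, 15.5, 18.0, 24.0]
summary = score_spread(example_scores)
print(summary)
```

With only 10 scorers, the 5th and 95th percentiles fall between the two most extreme values at each end, so the CI is slightly narrower than the full range and depends on the interpolation rule chosen.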

Results: The CI of the 10 N1 scores ranged from 4 to 39 percent of total sleep time (% TST) across polysomnograms (mean CI ± standard deviation = 11.1 ± 7.1 % TST). The corresponding range for N3 was 1 to 28 % TST (14.4 ± 6.1 % TST). For stage N1 and N3 sleep, very low or very high values were reported for virtually all polysomnograms by different technologists. Technologists varied widely in their assignment of stage N3 sleep, scoring that stage when the digitally determined time of delta waves ranged from 3 to 17 seconds.

Conclusions: Manual scoring of non-rapid eye movement sleep stages is highly unreliable, even among highly trained, experienced technologists. Measures of sleep continuity and depth that are reliable and clinically relevant should be a focus of clinical research.

Keywords: digital sleep analysis; interrater variability; sleep depth.

MeSH terms

  • Female
  • Humans
  • Middle Aged
  • Observer Variation
  • Polysomnography / standards*
  • Reproducibility of Results
  • Sleep Medicine Specialty / standards*
  • Sleep Stages*
  • Sleep, Slow-Wave
  • Societies, Medical
  • United States