An inter-rater variability study between human and automatic scorers in 5-s mini-epochs of sleep

Sleep Med. 2025 Apr:128:139-150. doi: 10.1016/j.sleep.2025.02.005. Epub 2025 Feb 6.

Abstract

Study objective: Sleep is traditionally scored using 30-s epochs of polysomnographies. As sleep is physiologically dynamic and 30-s epochs may conceal important characteristics, we aim to challenge this standard by scoring sleep in 5-s mini-epochs and analyzing inter-rater variability between human and automatic scorers.

Methods: In 40 polysomnography recordings, 120 mini-epochs per polysomnography were scored manually by three human experts (expert1_5s, expert2_5s and expert3_5s) and automatically by a validated sleep classifier (USleep_5s). Additionally, 5-s mini-epochs (clinical_5s) extracted from conventional human-scored 30-s epochs were considered. We assessed inter-rater variability and stage shifting in epochs and mini-epochs, and additionally in narcolepsy type 1 (NT1) patients and siblings.
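As a rough illustration of the comparison described in the Methods, the sketch below derives 5-s labels from 30-s epoch labels and computes a chance-corrected agreement between two label sequences. It is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not specify the kappa variant (Cohen's kappa is assumed here) or how clinical_5s was extracted (each 30-s label is simply repeated six times). The function names `expand_epochs` and `cohen_kappa` are hypothetical.

```python
from collections import Counter


def expand_epochs(epoch_labels, factor=6):
    """Repeat each 30-s epoch label `factor` times to obtain 5-s labels.

    Assumption: a mini-epoch inherits the stage of the epoch that contains it.
    """
    return [label for label in epoch_labels for _ in range(factor)]


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of stage labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(rater_a)
    # Observed agreement: fraction of mini-epochs with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal stage frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[s] * counts_b[s] for s in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical stage throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Illustrative labels only; not data from the study.
clinical_5s = expand_epochs(["W", "N2"])
expert_5s = ["W"] * 5 + ["N1"] + ["N2"] * 6
kappa = cohen_kappa(expert_5s, clinical_5s)
```

In a per-recording analysis, a value like `kappa` would be computed once per polysomnography and then summarized as mean ± standard deviation, which matches how the agreement values are reported in the Results.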

Results: Agreement for mini-epochs was κ = 0.50 ± 0.11 (expert1_5s vs clinical_5s) and κ = 0.51 ± 0.12 (expert1_5s vs USleep_5s). Between human experts, agreement was κ = 0.51 ± 0.16 (expert1_5s vs expert2_5s) and κ = 0.57 ± 0.11 (expert1_5s vs expert3_5s). Stage shift percentages were significantly higher in mini-epochs scored by expert1_5s (27.75 %) and USleep_5s (22.88 %) than in the corresponding conventional epochs (5.12 %), with no significant difference between NT1 patients and siblings.
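The stage shift percentage reported above could be computed along the following lines. This is a sketch under an assumed definition, since the abstract does not define the measure: a shift is counted whenever two consecutive labels differ, and the percentage is taken over all consecutive pairs. The function name `stage_shift_percentage` is hypothetical.

```python
def stage_shift_percentage(labels):
    """Percentage of consecutive label pairs in which the stage changes.

    Assumption: `labels` is one contiguous, time-ordered sequence of stages.
    """
    if len(labels) < 2:
        return 0.0
    shifts = sum(prev != cur for prev, cur in zip(labels, labels[1:]))
    return 100.0 * shifts / (len(labels) - 1)


# Illustrative labels only; not data from the study.
example = ["W", "W", "N1", "N1", "N2"]
pct = stage_shift_percentage(example)  # 2 shifts over 4 transitions
```

Under this definition, shorter scoring windows can only expose more transitions than the 30-s epochs that contain them, which is consistent with the direction of the reported difference.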

Conclusion: While mini-epoch scoring agreement was generally high, it was still lower than agreement for conventional epochs, likely due to the lack of a standard mini-epoch scoring procedure and to the automatic classifier being trained on epochs. However, stage discrepancies between epochs and mini-epochs and increased stage shifting in mini-epochs support that epochs can contain several stages, and that mini-epochs could support a more detailed sleep characterization, potentially enabling more precise diagnosis and the discovery of new polysomnographic biomarkers. Future studies should include larger datasets to refine mini-epoch scoring rules and exploit automatic classifiers, e.g., via transfer learning.

Keywords: Automatic sleep classification; Computerized analysis; Inter-rater variability; Mini-epochs; Polysomnography; Sleep stage scoring.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Female
  • Humans
  • Male
  • Middle Aged
  • Narcolepsy* / diagnosis
  • Narcolepsy* / physiopathology
  • Observer Variation*
  • Polysomnography* / methods
  • Reproducibility of Results
  • Sleep / physiology
  • Sleep Stages / physiology

Supplementary concepts

  • Narcolepsy 1