Objective: To assess inter-rater reliability in the visual scoring of the Cyclic Alternating Pattern (CAP) among scorers from different qualified sleep research groups, to evaluate the performance of a new tool for the computer-assisted detection of CAP, and to compare its output with the data from the different scorers.
Methods: CAP was scored in 11 normal sleep recordings by four different raters from three sleep laboratories. CAP was also scored in the same recordings by means of a new computer-assisted method implemented in the Hypnolab 1.2 (SWS Soft, Italy) software. Data analysis proceeded in the following steps: (a) the inter-rater reliability of CAP parameters among the four scorers was assessed with the Kendall W coefficient of concordance; (b) the agreement between the results of the visual and the computer-assisted analysis of CAP parameters was also assessed with the Kendall W coefficient; (c) a 'consensus' scoring was obtained for each recording from the four scorings provided by the different raters, based on the score given by the majority of scorers; (d) the degree of agreement between each scorer and the consensus score, and between the computer-assisted analysis and the consensus score, was quantified with Cohen's κ coefficient; (e) the differences between the numbers of false positive and false negative detections obtained in the visual and in the computer-assisted analysis were evaluated with the non-parametric Wilcoxon test.
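Steps (a), (c) and (d) can be illustrated with a minimal Python sketch. It is not the analysis code used in the study: the function names, the example values, the omission of a tie correction in Kendall's W, and the choice to leave an item without a consensus label when the four scorers split evenly are all assumptions made for illustration.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kendall_w(ratings):
    """Kendall's W for m raters (rows) over n recordings (columns).

    Assumption: no correction for tied ranks is applied.
    """
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(row) for row in ratings]
    totals = [sum(row[i] for row in ranked) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def majority_consensus(labels_per_rater):
    """Majority label per item; None when the top labels are tied (assumption)."""
    consensus = []
    for item in zip(*labels_per_rater):
        (top, count), *rest = Counter(item).most_common()
        tied = bool(rest) and rest[0][1] == count
        consensus.append(None if tied else top)
    return consensus


def cohen_kappa(a, b):
    """Cohen's kappa between two equally long label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical CAP rate values (%), four raters by five recordings.
cap_rate = [
    [30.1, 42.5, 25.0, 51.2, 38.4],
    [28.7, 44.0, 26.3, 49.8, 36.9],
    [31.5, 41.2, 24.1, 53.0, 39.7],
    [29.9, 43.3, 27.5, 50.5, 37.2],
]
print("Kendall W:", kendall_w(cap_rate))

# Hypothetical A phase subtype labels, four raters by three A phases.
subtypes = [
    ["A1", "A2", "A1"],
    ["A1", "A2", "A3"],
    ["A1", "A3", "A3"],
    ["A2", "A2", "A1"],
]
print("Consensus:", majority_consensus(subtypes))
```

Before computing Cohen's κ against the consensus, items whose consensus is None would have to be dropped or resolved by some other rule; the source does not state how such cases were handled.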
Results: The inter-rater reliability of CAP parameters among the four scorers, quantified by the Kendall W coefficient of concordance, was high for all the parameters considered, with values above 0.9 for total CAP time, CAP time in sleep stage 2 and percentage of A phases in sequence; CAP rate also showed a high value (0.829). The most important global parameters of CAP, including total CAP rate and CAP time, scored by the computer-assisted analysis showed significant concordance with those obtained by the raters. The agreement between the computer-assisted analysis and the consensus scoring for the assignment of the CAP A phase subtype was not distinguishable from that expected from a human scorer. However, the computer-assisted analysis produced significantly more false positives and false negatives than the visual scoring of CAP.
Conclusions: CAP scoring shows good inter-rater reliability, so results from different laboratories might be compared and also pooled together; however, caution should always be taken because of the variability that can be expected in classical sleep staging. The computer-assisted detection of CAP can be used with some supervision and correction in large studies when only general parameters such as CAP rate are considered; more editing is necessary for the correct use of the other results.
Significance: This article describes the first attempt in the literature to evaluate in detail the inter-rater reliability of scoring CAP parameters in normal sleep and the performance of a human-supervised computerized automatic detection system.