Inter-observer reliability of two pain scales for newborns

Early Hum Dev. 2007 Aug;83(8):549-52. doi: 10.1016/j.earlhumdev.2006.10.006. Epub 2006 Dec 11.


Aim: To assess inter-observer reliability of two of the most widely used pain scales for newborns.

Background: More than 30 scales exist to assess neonatal pain, but they are rarely used because they are too complicated or unreliable.

Method: We scored pain levels in two groups of babies during a heelprick. The first group, 20 premature babies (mean gestational age: 34.2 ± 1.2 weeks), was assessed with the PIPP scale; the second group, 20 term babies (mean gestational age: 39.5 ± 0.9 weeks), was assessed with the NIPS scale. For each baby, we compared the scores assigned by the nurse who took the blood sample (nurse A) and by a second nurse who was present during the heelprick (nurse B) with the score assigned by a third nurse who later watched the video clip of the procedure (nurse C). We took nurse C's score as the "objective" score, because this scorer could watch the recorded event several times, timing and scoring it thoroughly.

Findings: NIPS: Nurse A's score differed from nurse C's in 8/20 cases, but in only one case was the difference greater than 2 (Cohen's κ = 0.60). Nurse B's score differed from nurse C's in 12/20 cases, but in only one case was the difference greater than 2 (Cohen's κ = 0.30). PIPP: Nurse A's score differed from nurse C's in 16/20 cases, and in 9 cases the difference was greater than 2 (Cohen's κ = 0.10). Nurse B's score differed from nurse C's in 17/20 cases, and in 6 cases the difference was greater than 2 (Cohen's κ = 0.16).
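
For readers who want to reproduce this kind of comparison, the following is a minimal sketch of the two quantities reported above: the number of babies whose scores differ between two raters (and by more than 2 points), and Cohen's kappa. It assumes the unweighted form of kappa, which the abstract does not specify. The function names and the example scores are hypothetical and are not the study data.

```python
from collections import Counter


def cohen_kappa(rater_1, rater_2):
    """Unweighted Cohen's kappa for two equal-length lists of scores.

    Assumption: the abstract does not say whether a weighted or
    unweighted kappa was used; this sketch uses the unweighted form.
    """
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("both raters must score the same non-empty set of babies")
    n = len(rater_1)

    # Observed agreement: fraction of babies given identical scores.
    observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n

    # Chance agreement: for each score value, the product of the two
    # raters' marginal proportions, summed over all score values.
    counts_1 = Counter(rater_1)
    counts_2 = Counter(rater_2)
    expected = sum(counts_1[s] * counts_2[s] for s in counts_1) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


def disagreement_summary(rater_1, rater_2, threshold=2):
    """Return (number of differing scores, number differing by more than threshold)."""
    diffs = [abs(a - b) for a, b in zip(rater_1, rater_2)]
    return sum(d > 0 for d in diffs), sum(d > threshold for d in diffs)


# Hypothetical scores for six babies (illustrative only, not study data).
live_scores = [0, 2, 3, 5, 7, 7]   # e.g. a nurse scoring at the cribside
video_scores = [0, 2, 4, 5, 6, 3]  # e.g. a nurse scoring from the video clip

kappa = cohen_kappa(live_scores, video_scores)
n_different, n_large = disagreement_summary(live_scores, video_scores)
print(f"kappa = {kappa:.2f}, differing = {n_different}, differing by > 2 = {n_large}")
```

Because unweighted kappa treats a 1-point and a 4-point disagreement alike, the count of differences greater than 2 is a useful complement when the scale is ordinal.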

Conclusion: Our results indicate that NIPS has higher inter-observer reliability than PIPP, although agreement for NIPS was itself not very high. Caregivers who use these scales to assess pain in real time at the cribside should be aware of the limitations highlighted in this study.

