Background: The Waterlow scale is one of the pressure ulcer risk assessment scales that are frequently criticised for low reliability. It is widely used in the United Kingdom, elsewhere in Europe and around the world.
Objectives: The study objectives were to systematically review and evaluate the inter- and intrarater reliability and/or agreement of the whole Waterlow scale and of its individual items. The overall aim was to determine whether the Waterlow scale is applicable to daily clinical practice.
Design: Systematic review.
Data sources: MEDLINE (1985-June 2008), EMBASE (1985-June 2008), CINAHL (1985-June 2008) and the World Wide Web.
Review methods: Selection of relevant studies, data extraction, recalculation of reliability and agreement coefficients, and study quality assessment were conducted independently by two researchers. Designs, methods and results of relevant studies were systematically described, compared and interpreted.
Results: Eight research reports were identified, containing the results of nine inter- and intrarater reliability and agreement studies. Only three studies were considered to be of high quality. The Waterlow scale in clinical practice was examined in four studies. Interrater agreement for the total score varied between 0% and 57%. When differences of up to two points were counted as agreement, total score agreement rose to as much as 86%. Median ranges of differences among raters scoring single items were high for 'poor nutrition', 'skin type', and 'mobility'. Recalculated intrarater reliability for one researcher was ICC(2,1)=0.97 (95% CI 0.94-0.98).
Conclusions: Empirical evidence on reliability and agreement among nurses using the Waterlow scale in clinical practice is scarce. Interrater agreement for the total score is comparable to that of other pressure ulcer risk assessment scales. Interrater reliability has never been examined. Therefore, evaluation of reliability and agreement, and of the applicability of the Waterlow scale to clinical practice, is limited. It is very likely that the items 'poor nutrition', 'mobility', and 'skin type' are the most difficult to rate.
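
Illustration: The Results report two kinds of measures, percentage agreement on the total score (exact, and allowing differences of up to two points) and an intraclass correlation coefficient, ICC(2,1). The following is a minimal sketch of how such measures can be computed. It is not the procedure used in the reviewed studies. The function names `percent_agreement` and `icc_2_1` and all scores are hypothetical, and ICC(2,1) is taken here in its usual sense of the two-way random-effects, single-rating, absolute-agreement coefficient.

```python
def percent_agreement(scores_a, scores_b, tolerance=0):
    """Percentage of subjects on whom two raters differ by at most `tolerance` points."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both raters must score the same non-empty set of subjects")
    hits = sum(1 for a, b in zip(scores_a, scores_b) if abs(a - b) <= tolerance)
    return 100.0 * hits / len(scores_a)


def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per subject and one column per rater
    (or per rating occasion, for intrarater reliability)."""
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters or occasions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                # between-subjects mean square
    jms = ss_cols / (k - 1)                # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Invented total scores for five patients, for illustration only
rater_1 = [10, 14, 20, 8, 16]
rater_2 = [10, 15, 23, 8, 18]

print(percent_agreement(rater_1, rater_2))               # 40.0 (exact agreement)
print(percent_agreement(rater_1, rater_2, tolerance=2))  # 80.0 (within two points)
print(round(icc_2_1([list(pair) for pair in zip(rater_1, rater_2)]), 2))
```

The two measures answer different questions: percentage agreement shows how often raters arrive at the same (or nearly the same) score, whereas the ICC relates rater disagreement to the variability between patients, so a sample of very similar patients can yield a low ICC even when agreement is high.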