The kappa statistic is commonly accepted as a measure of interobserver variability. In some situations, however, kappa should be interpreted with care. In this study, 21 obstetricians were asked to segment and classify 13 cardiotocographic recordings into the major fetal heart rate (FHR) patterns: acceleration, baseline FHR level, deceleration, and undefined segments. In two cases the kappa statistic indicated poor group agreement. These low kappa values, however, were mainly due to the high proportion of baseline segments indicated by the referees. This finding is illustrated by a discussion of one of the cases.
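
The dependence of kappa on category prevalence can be made concrete with a small numerical sketch. The example below is an illustration only: it uses the two-observer Cohen's kappa rather than a group statistic, collapses the FHR patterns into "baseline" versus "other", and the segment counts are hypothetical, not data from this study. The function name `cohen_kappa` is likewise chosen for this sketch.

```python
def cohen_kappa(table):
    """Return (observed agreement, Cohen's kappa) for a square
    contingency table: rows = observer A, columns = observer B."""
    total = sum(sum(row) for row in table)
    k = len(table)
    # Proportion of segments on which both observers chose the same class.
    p_observed = sum(table[i][i] for i in range(k)) / total
    # Marginal proportions per class for each observer.
    row_marg = [sum(table[i]) / total for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / total for j in range(k)]
    # Agreement expected by chance, given those marginals.
    p_expected = sum(r * c for r, c in zip(row_marg, col_marg))
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


# Hypothetical counts, class order: [baseline, other].
# Skewed: almost all segments are labelled baseline by both observers.
skewed = [[90, 4],
          [4, 2]]
# Balanced: same number of disagreements, but classes equally frequent.
balanced = [[46, 4],
            [4, 46]]

for name, table in (("skewed", skewed), ("balanced", balanced)):
    p_o, kappa = cohen_kappa(table)
    print(f"{name}: observed agreement = {p_o:.2f}, kappa = {kappa:.2f}")
# skewed: observed agreement = 0.92, kappa = 0.29
# balanced: observed agreement = 0.92, kappa = 0.84
```

In both hypothetical tables the observers agree on 92 of 100 segments, yet kappa falls from 0.84 to 0.29 when one class dominates, because the agreement expected by chance rises from 0.50 to about 0.89.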