The reliability of an established comorbidity index, the Index of Co-Existent Disease (ICED), was tested using retrospective data from the case notes of elderly patients who had undergone total hip replacement. Inter-rater reliability was examined twice, first with two raters (n = 39) and then with three (n = 49). Intra-rater reliability was assessed using one rater (n = 45). Reasons for any lack of reliability were explored. The inter-rater reliability of the ICED was moderate (kappa 0.5-0.6). The Functional Severity subindex performed well (kappa 0.6-1.0), but the Index of Disease Severity subindex was less reliable (kappa 0.4-0.5). Differences between raters affected the observed association between comorbidity and serious post-operative complications. Intra-rater reliability was excellent (kappa 0.9). Several reasons for the only moderate inter-rater reliability were identified, most of them related to uncertainty about how to apply the ICED. The reliability of the ICED needs to be improved before it is used more widely with retrospective data; clearer instructions for its use might achieve this.
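To show what the reported kappa values measure, here is a minimal sketch of unweighted Cohen's kappa for two raters grading the same patients. The abstract does not state which kappa variant was used. The three-rater comparison would need either pairwise Cohen's kappa or a multi-rater statistic such as Fleiss' kappa. The function name `cohens_kappa` and the example grades are hypothetical and are not taken from the study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same patients.

    Hypothetical helper for illustration; not the study's own analysis code.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: proportion of patients given the same grade by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: computed from each rater's marginal grade frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[g] * freq_b[g] for g in freq_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical grade for every patient: kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    # Kappa rescales observed agreement so that 0 = chance level and 1 = perfect.
    return (p_o - p_e) / (1 - p_e)


# Hypothetical grades from two raters for eight patients.
rater_1 = [0, 1, 1, 2, 3, 1, 0, 2]
rater_2 = [0, 1, 2, 2, 3, 1, 1, 2]
print(round(cohens_kappa(rater_1, rater_2), 3))  # prints 0.652
```

In this example the raters agree on 6 of 8 patients (p_o = 0.75), but chance alone would produce agreement of about 0.28. Kappa is therefore about 0.65, noticeably lower than raw percentage agreement. This correction for chance is why the subindex kappas quoted above can look modest even when the raters often agree.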