Clin Orthop Surg. 2012 Jun;4(2):149-55.
doi: 10.4055/cios.2012.4.2.149. Epub 2012 May 17.

Pitfalls and important issues in testing reliability using intraclass correlation coefficients in orthopaedic research

Free PMC article

Kyoung Min Lee et al. Clin Orthop Surg. 2012 Jun.

Abstract

Background: Intra-class correlation coefficients (ICCs) provide a statistical means of testing reliability. However, their interpretation is not well documented in the orthopedic field. The purpose of this study was to investigate the use of ICCs in the orthopedic literature and to demonstrate pitfalls regarding their use.

Methods: First, orthopedic articles that used ICCs were retrieved from the PubMed database, and the journal demography, ICC models, and concurrent statistics used were evaluated. Second, a reliability test was performed on three common physical examinations in cerebral palsy, namely, the Thomas test, the Staheli test, and popliteal angle measurement. Thirty patients were assessed by three orthopedic surgeons to explore the statistical methods for testing reliability. Third, the factors affecting the ICC values were examined by simulating data sets based on the physical examination data, in which the ranges, slopes, and interobserver variability were modified.

Results: Of the 92 orthopedic articles identified, 58 articles (63%) did not clarify the ICC model used, and only 5 articles (5%) described all models, types, and measures. In reliability testing, although the popliteal angle showed a larger mean absolute difference than the Thomas test and the Staheli test, the ICC of the popliteal angle was higher, which was believed to be contrary to the context of measurement. In addition, the ICC values were affected by the model, type, and measures used. In the simulated data sets, the ICC was higher when the range of the data was larger, the slopes of the data sets were parallel, and the interobserver variability was smaller.

Conclusions: Care should be taken when interpreting absolute ICC values: a higher ICC does not necessarily mean less variability, because ICC values are also affected by various other factors. The authors recommend that researchers clarify the ICC models used and interpret ICC values in the context of measurement.

Keywords: Intraclass correlation coefficient; Orthopaedic research; Reliability.

Conflict of interest statement

No potential conflict of interest relevant to this article was reported.

Figures

Fig. 1
Intra-class correlation coefficient (ICC) is defined by the presented formula using between-target mean square (BMS), within-target mean square (WMS), and number of observers (k). BMS represents true subject variability, and WMS represents measurement error.
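The formula image itself is not reproduced in this text. As a hedged reconstruction, the one-way random-effects, single-measure form (commonly written ICC(1,1)) uses exactly the three quantities named in the caption, so it is assumed here to be the form shown. The symbols n (number of targets) and x_{ij} (rating of target i by observer j) are introduced only for this illustration.

```latex
\begin{align}
\mathrm{ICC}(1,1) &= \frac{\mathrm{BMS} - \mathrm{WMS}}{\mathrm{BMS} + (k - 1)\,\mathrm{WMS}}, \\
\mathrm{BMS} &= \frac{k}{n - 1} \sum_{i=1}^{n} \left(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}\right)^{2}, \\
\mathrm{WMS} &= \frac{1}{n(k - 1)} \sum_{i=1}^{n} \sum_{j=1}^{k} \left(x_{ij} - \bar{x}_{i\cdot}\right)^{2}.
\end{align}
```

Under this form, the ICC approaches 1 when WMS is small relative to BMS, which can happen either because measurement error is small or because the targets differ widely from one another.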
Fig. 2
The data sets were simulated to imitate a physical examination situation, with the ranges and the variability of the data varied simultaneously. The intra-class correlation coefficient (ICC) is associated with between-group variation and within-group variation. The lower left panel (Data 7) was taken as the reference, in which the observer means were set to 45, 50, and 55. Within-group variation was increased horizontally and between-group variation vertically, so that a total of nine data sets were generated from a multivariate normal distribution. Within-group variation was increased by inflating the diagonal terms of the covariance matrix (horizontal direction), which in turn widened the ranges. The off-diagonal terms were modified to affect between-group variation. While the slope of one observer was fixed, the slopes of the others were gradually increased relative to the reference slope, and this trend is presented in the vertical direction.
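The effect of range on the ICC can be illustrated with a much simpler simulation than the nine-panel design in the caption. The sketch below is not the authors' procedure: it assumes the one-way, single-measure ICC, a hypothetical seed, a hypothetical 5-degree observer error, and two hypothetical patient spreads (5 and 20 degrees). Only the 30 patients and 3 observers come from the abstract. The same observer errors are reused in both data sets, so the disagreement between observers is identical and only the range differs.

```python
import numpy as np

def one_way_icc(ratings):
    """Return (BMS, WMS, ICC) for a targets-by-observers array.

    Uses the one-way random-effects, single-measure form:
    ICC = (BMS - WMS) / (BMS + (k - 1) * WMS).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    row_means = x.mean(axis=1)
    bms = k * np.sum((row_means - x.mean()) ** 2) / (n - 1)
    wms = np.sum((x - row_means[:, None]) ** 2) / (n * (k - 1))
    return bms, wms, (bms - wms) / (bms + (k - 1) * wms)

def mean_abs_difference(ratings):
    """Mean absolute difference over all observer pairs and all targets."""
    x = np.asarray(ratings, dtype=float)
    k = x.shape[1]
    diffs = [np.abs(x[:, i] - x[:, j]) for i in range(k) for j in range(i + 1, k)]
    return float(np.mean(diffs))

rng = np.random.default_rng(0)        # hypothetical seed
n_patients, n_observers = 30, 3       # study design stated in the abstract
z = rng.standard_normal(n_patients)   # standardized "true" angle per patient
# Assumed observer error of 5 degrees, shared by both data sets.
noise = rng.normal(0.0, 5.0, size=(n_patients, n_observers))

narrow = 50.0 + 5.0 * z[:, None] + noise   # patients spread over a narrow range
wide = 50.0 + 20.0 * z[:, None] + noise    # same errors, wider range

for name, data in (("narrow", narrow), ("wide", wide)):
    bms, wms, icc = one_way_icc(data)
    mad = mean_abs_difference(data)
    print(f"{name}: MAD={mad:.2f}  WMS={wms:.2f}  BMS={bms:.2f}  ICC={icc:.2f}")
```

Because the observer errors are shared, WMS and the mean absolute difference come out the same for both data sets, while BMS and therefore the ICC are larger for the wider range. This is consistent with the caution that a higher ICC does not by itself indicate less measurement variability.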
Fig. 3
Of the 143 orthopaedic articles using the intra-class correlation coefficient (ICC), review articles, articles in languages other than English, and articles not registered in the Journal Citation Report (JCR) index were excluded. Finally, 92 articles were included.
