Estimation of an Inter-Rater Intra-Class Correlation Coefficient That Overcomes Common Assumption Violations in the Assessment of Health Measurement Scales

Carly A Bobak et al. BMC Med Res Methodol. 2018 Sep 12;18(1):93. doi: 10.1186/s12874-018-0550-6.

Free PMC article

Abstract

Background: Intraclass correlation coefficients (ICC) are recommended for the assessment of the reliability of measurement scales. However, the ICC is subject to a variety of statistical assumptions such as normality and stable variance, which are rarely considered in health applications.
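As a concrete illustration of the standard estimator whose assumptions the paper questions, the following is a minimal one-way ANOVA ICC sketch in Python. The simulated scores, variance parameters, and function name are invented for the example; the paper's actual data are Observer OPTION5 ratings.

```python
import numpy as np

def icc_oneway(scores):
    """One-way random-effects ICC, ICC(1).
    scores: (n_targets, k_raters) array of ratings."""
    n, k = scores.shape
    grand = scores.mean()
    target_means = scores.mean(axis=1)
    # Mean squares from a one-way ANOVA across rated targets
    msb = k * ((target_means - grand) ** 2).sum() / (n - 1)            # between targets
    msw = ((scores - target_means[:, None]) ** 2).sum() / (n * (k - 1))  # within targets
    return (msb - msw) / (msb + (k - 1) * msw)

rng = np.random.default_rng(0)
# Hypothetical data: 200 encounters, a latent "true" score per encounter,
# and two raters with independent, homoscedastic, normal error --
# exactly the assumptions the paper argues are rarely met in practice.
true_encounter = rng.normal(50, 15, size=(200, 1))
ratings = true_encounter + rng.normal(0, 5, size=(200, 2))
print(round(icc_oneway(ratings), 3))
```

With a between-encounter SD of 15 and a rater SD of 5, the population ICC is 225 / (225 + 25) = 0.9, and the estimate should land near that value.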

Methods: A Bayesian approach using hierarchical regression and variance-function modeling is proposed to estimate the ICC, with emphasis on accounting for heterogeneous variances across a measurement scale. As an application, we review the use of the ICC to evaluate the reliability of Observer OPTION5, an instrument in which trained raters evaluate the level of Shared Decision Making between clinicians and patients. The study used two raters to evaluate recordings of 311 clinical encounters across three studies assessing the impact of using a Personal Decision Aid over usual care. We particularly focus on deriving an estimate of the ICC when multiple studies are considered as part of the data.

Results: The results demonstrate that the ICC varies substantially across studies and across patient-physician encounters within studies. Using the new framework we developed, the study-specific ICCs were estimated to be 0.821, 0.295, and 0.644. If the within- and between-encounter variances were assumed to be the same across studies, the estimated within-study ICC was 0.609. If heteroscedasticity was not properly adjusted for, the within-study ICC estimate was inflated to as high as 0.640. Finally, if the data were pooled across studies without accounting for the variability between studies, ICC estimates were further inflated by approximately 0.02, while formally allowing for between-study variation in the ICC inflated its estimated value by approximately 0.066 to 0.072, depending on the model.
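The inflation that arises from pooling studies with unequal rater variances can be reproduced with a simple simulation. This is a method-of-moments sketch, not the paper's Bayesian hierarchical model, and the three studies' variance parameters are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_study(n_encounters, sd_between, sd_within):
    """Two raters score each encounter; sd_within is the rater-error SD."""
    truth = rng.normal(50, sd_between, size=(n_encounters, 1))
    return truth + rng.normal(0, sd_within, size=(n_encounters, 2))

def icc_oneway(scores):
    """One-way ANOVA estimate of the ICC (same sketch as above)."""
    n, k = scores.shape
    msb = k * ((scores.mean(axis=1) - scores.mean()) ** 2).sum() / (n - 1)
    msw = ((scores - scores.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Three hypothetical studies sharing the same between-encounter SD but
# with heterogeneous rater error -- the heteroscedasticity the paper models.
studies = [simulate_study(100, 15, 5),    # reliable raters -> high ICC
           simulate_study(100, 15, 20),   # noisy raters    -> low ICC
           simulate_study(100, 15, 10)]   # intermediate
per_study = [icc_oneway(s) for s in studies]
pooled = icc_oneway(np.vstack(studies))   # ignores study membership
print([round(v, 3) for v in per_study], round(pooled, 3))
```

The study-specific estimates differ widely, and the single pooled estimate masks that spread: it misrepresents both the most and the least reliable study, echoing the paper's finding that a single homogeneous ICC is misleading when reliability varies across studies.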

Conclusion: We demonstrated that misuse of the ICC statistic under common assumption violations leads to misleading and likely inflated estimates of inter-rater reliability. A statistical analysis that overcomes these violations, by expanding the standard statistical model to account for them, yields estimates that better reflect a measurement scale's reliability while maintaining ease of interpretation. Bayesian methods are particularly well suited to estimating the expanded statistical model.

Keywords: Bayesian analysis; Hierarchical regression; ICC; Observer OPTION5; Reliability; Shared decision making; Variance function modelling.

Conflict of interest statement

Ethics approval and consent to participate

N/A

Consent for publication

N/A

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Comparison of Observer OPTION5 scores between raters. The individual rater score is shown on the y-axis and the mean OPTION5 score is shown on the x-axis.

Fig. 2
Actual difference of Observer OPTION5 between raters over the mean OPTION5 score. While the average difference is slightly less than 10, this difference varies greatly across the mean score, demonstrating non-constant variance.

Fig. 3
Empirical variance of scores. Compares the mean variance, the binomial variance, and the observed variance (using a smoothing spline with 10 degrees of freedom) of the Observer OPTION5 score, highlighting the heteroscedasticity of the variance as a function of the mean.

Fig. 4
A comparison of the posterior distributions of the key parameters underlying the ICC: the within-encounter variance and the between-encounter (but within-study) variance across the three studies.

Fig. 5
Posterior distributions of the ICCs for each study, and of the difference in the ICC for each pair of studies.

Fig. 6
Direct analysis of the ICC as a function of the level of agreement. Relationship of the ICC to the true amount of shared decision making (SDM) in an encounter, and heterogeneity of the reliability of measurements across studies. The ICC is higher at the ends of the scale than at the center, where the variability of rater scores on the same encounter (under the binomial variance function) is greatest, and the difference in the reliability of measurements across the studies is substantial.


