Purpose: This work investigated the reliability of, and relationships between, individual-case and composite scores on a standardized patient clinical skills examination.
Method: Four hundred ninety-two fourth-year U.S. medical students received three scores [data gathering (DG), interpersonal skills (IPS), and written communication (WC)] for each of 10 standardized patient cases. mGENOVA software was used for all analyses.
Results: Estimated generalizability coefficients were 0.69, 0.80, and 0.70 for the DG, IPS, and WC scores, respectively. The universe-score correlation between DG and WC was high (0.83); those for DG/IPS and IPS/WC were weaker (0.51 and 0.37, respectively). Task difficulty appeared to be modestly but positively related across the three scores. Correlations between the person-by-task effects for DG/IPS and DG/WC were positive yet modest. The estimated generalizability coefficient for a 10-case test using an equally weighted composite DG/WC score was 0.78.
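The composite-score result above follows the standard multivariate generalizability formula: a weighted universe-score variance divided by itself plus weighted relative error averaged over tasks. A minimal sketch of that calculation, using hypothetical variance-covariance components (the paper's actual mGENOVA estimates are not reproduced here):

```python
import numpy as np

# Hypothetical variance components for illustration only; these are NOT
# the estimates reported in the study.
# Universe-score (person) variance-covariance matrix for DG and WC:
sigma_p = np.array([[0.012, 0.008],
                    [0.008, 0.011]])
# Relative-error (person-by-task) variance-covariance matrix, per task:
sigma_delta = np.array([[0.040, 0.010],
                        [0.010, 0.035]])

w = np.array([0.5, 0.5])   # equal weights for the DG/WC composite
n_tasks = 10               # a 10-case test

universe_var = w @ sigma_p @ w                # composite universe-score variance
error_var = (w @ sigma_delta @ w) / n_tasks   # composite relative-error variance
g_coef = universe_var / (universe_var + error_var)
print(round(g_coef, 2))
```

Under these made-up components the composite coefficient comes out near 0.80; with the study's actual components the same formula yields the reported 0.78.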
Conclusions: This work permits interpretation of correlations between (1) the proficiencies measured by the multiple scores and (2) the sources of error that affect those scores, as well as estimation of the reliability of composite scores. The results have important implications for test construction and test validity.