Context: Investigators applying generalisability theory to educational research and evaluation have sometimes done so poorly. The main difficulties relate to inadequate or non-random sampling of effects, the handling of naturalistic data, and the interpretation and presentation of variance components.
Methods: This paper addresses these areas of difficulty, and articulates an informal consensus amongst medical educators from Europe, Australia and the USA, who are familiar with generalisability theory.
Results: We make the following recommendations. Ensure that all relevant factors are sampled, and that the sampling meets the theory's assumption that the conditions represent a random and representative sample of the factor's 'universe'. Research evaluations will require large samples of each factor if they are to generalise adequately. Where feasible, conduct 2 separate studies (pilot and evaluation, or Generalisability and Decision studies). For unbalanced data, use either urGENOVA or, if the data are too complex, one of the following procedures in SPSS or SAS: minimum norm quadratic unbiased estimator (MINQUE), maximum likelihood (ML) or restricted maximum likelihood (REML). State which mathematical procedure was used and the degrees of freedom (d.f.) of the effect estimates. If the procedure does not report d.f., re-analyse with type III sum of squares ANOVA (ANOVA SS III) and report these d.f. Describe and justify the regression model used. Present the raw variance components. Describe the effects that they represent in plain, non-statistical language. If standard error of measurement (SEM) or reliability coefficients are presented, give the equations used to calculate them. Make sure that the method of reporting reliability (precision or discrimination) is appropriate to the purpose of the assessment. This will usually demand a precision indicator such as SEM. Consider a graphical presentation to combine precision and discrimination.
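
As an illustration of the recommendation to state the equations behind any SEM or reliability coefficient, the following is a minimal sketch and not the authors' own procedure. It assumes the simplest fully crossed person x item design, with raw variance components for persons, items and the residual (person-by-item interaction confounded with error). The function name `d_study` and the variance component values are hypothetical. It reports precision (SEM) alongside discrimination (the relative G coefficient and the absolute Phi coefficient) for a chosen number of items.

```python
import math


def d_study(var_p, var_i, var_pi_e, n_items):
    """Decision-study summary for a crossed person x item design.

    var_p    : variance component for persons (the object of measurement)
    var_i    : variance component for items
    var_pi_e : residual variance (person x item interaction plus error)
    n_items  : number of items averaged over in the planned assessment
    """
    # Relative error: only effects that change the rank order of persons.
    rel_error = var_pi_e / n_items
    # Absolute error: also includes the item main effect (item difficulty).
    abs_error = (var_i + var_pi_e) / n_items
    return {
        "sem_relative": math.sqrt(rel_error),
        "sem_absolute": math.sqrt(abs_error),
        "g_coefficient": var_p / (var_p + rel_error),
        "phi_coefficient": var_p / (var_p + abs_error),
    }


# Hypothetical raw variance components, for illustration only.
result = d_study(var_p=0.50, var_i=0.10, var_pi_e=0.40, n_items=10)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Reporting the SEM in the units of the original score scale, next to the raw variance components, lets readers judge precision directly. The coefficients alone depend on how heterogeneous the sampled persons happen to be.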