Effects of interrater reliability of psychopathologic assessment on power and sample size calculations in clinical trials

J Clin Psychopharmacol. 2002 Jun;22(3):318-25. doi: 10.1097/00004714-200206000-00013.


Although rater training is increasingly used to improve the quality of the investigated outcome parameters, the reliability of assessments is not perfect. Thus, empirical reliability estimates should be used instead of theoretically assumed perfect reliability. Implications of the reliability of psychiatric assessments for sample size and power calculations in clinical trials are presented. The theoretical basis of sample size and power calculations using empirical reliability scores is delineated. Examples from contemporary research on schizophrenia and depression are used to illustrate several implications for study design and interpretation of results. The tremendous impact of the lack of reliability of psychopathologic assessments on sample size, power, and detectable true score differences in clinical trials is shown. The problem of multiple outcome variables with different reliabilities is addressed. Studies lacking power because of unreliable assessments carry the risk of false-negative findings and raise ethical questions. Rater training is strongly recommended to assess and improve interrater reliability whenever necessary and possible before trials are started. Sample size calculations and power analysis should be based on empirical reliability values of outcome parameters as part of quality assurance and cost savings.
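The link between reliability and sample size can be illustrated with a minimal sketch. It is not the authors' own derivation. It assumes the classical test theory attenuation of a standardized mean difference (observed d = true d × √reliability) and the usual normal-approximation formula for a two-sided, two-group comparison with equal group sizes. The function name `n_per_group` and its parameters are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(true_d, reliability, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-group comparison.

    Assumptions (not taken from the paper): classical test theory, so the
    observed standardized difference is true_d * sqrt(reliability), and a
    normal approximation to the two-sided two-sample test.
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    if true_d == 0:
        raise ValueError("true_d must be non-zero")

    # Measurement error inflates observed variance and shrinks the effect size.
    observed_d = true_d * math.sqrt(reliability)

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power

    return math.ceil(2 * (z_alpha + z_beta) ** 2 / observed_d ** 2)


if __name__ == "__main__":
    # Illustrative values only: a true effect of d = 0.5 at three reliabilities.
    for r in (1.0, 0.7, 0.5):
        print(f"reliability={r:.1f}: n per group = {n_per_group(0.5, r)}")
```

Under these assumptions, a true effect of d = 0.5 requires about 63 participants per group with perfect reliability, about 90 at a reliability of 0.7, and about 126 at 0.5, so the required sample size scales with the inverse of the reliability.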

MeSH terms

  • Clinical Trials as Topic / methods
  • Clinical Trials as Topic / statistics & numerical data*
  • Humans
  • Observer Variation*
  • Psychopathology
  • Sample Size