This paper reviews the most frequently used and misused reliability measures appearing in the mental health literature. We illustrate the various types of data sets on which reliability is assessed (i.e., two raters, more than two raters, and varying numbers of raters, with dichotomous, polychotomous, and quantitative data). Reliability statistics appropriate for each data format are presented, and their pros and cons are illustrated. Inadequacies of some methods are highlighted. The meaning of different levels of reliability obtained with various statistics is discussed. This critique is intended for the reading professional and the investigator who has an occasional need for reliability assessment. Statistical expertise is not required, and theoretical material is referenced for the interested reader. Necessary formulas for computations are presented in the appendices. A summary table of some suitable reliability measures is presented.