Sixteen indices of interobserver agreement and six methods for estimating coefficients of interobserver reliability were critiqued. The agreement statistics were found to be imprecise, psychometrically limited, and relatively inflexible in handling the diverse categorical and quantitative data sets typically encountered in mental retardation research. Five of the reliability statistics produced precise estimates of agreement, yet shared similar limitations. Only the intraclass correlation approach grounded in generalizability theory seemed to offer the precision, comprehensiveness, and flexibility required to deal with the complexity of reliability assessment. A basic generalizability model was described and illustrated with group and single-subject research data.
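To make the recommended approach concrete, the sketch below estimates an intraclass correlation for a hypothetical subjects-by-observers rating matrix. It derives the generalizability-theory variance components (subjects, observers, residual) from a two-way ANOVA and combines them into ICC(2,1), which equals the single-observer generalizability coefficient for absolute decisions. The data, the function name `icc_2_1`, and the design (every observer rates every subject, no replication) are illustrative assumptions, not material from the article itself.

```python
import numpy as np


def icc_2_1(scores: np.ndarray) -> float:
    """ICC(2,1): two-way random-effects model, absolute agreement,
    single observer, for an n_subjects x k_observers rating matrix.

    Illustrative sketch; assumes a fully crossed subjects x observers
    design with one rating per cell.
    """
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)  # per-subject means
    col_means = scores.mean(axis=0)  # per-observer means

    # Two-way ANOVA decomposition: SS_total = SS_subjects + SS_observers + SS_error
    ss_subjects = k * np.sum((row_means - grand) ** 2)
    ss_observers = n * np.sum((col_means - grand) ** 2)
    ss_error = np.sum((scores - grand) ** 2) - ss_subjects - ss_observers

    ms_subjects = ss_subjects / (n - 1)
    ms_observers = ss_observers / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Generalizability-theory variance components implied by the mean squares
    var_subjects = (ms_subjects - ms_error) / k
    var_observers = (ms_observers - ms_error) / n
    var_error = ms_error

    # Single-observer coefficient: subject variance over total variance,
    # algebraically identical to the Shrout-Fleiss ICC(2,1) formula
    return var_subjects / (var_subjects + var_observers + var_error)


# Hypothetical data: 6 subjects each rated by 3 observers
ratings = np.array([
    [9, 2, 5],
    [6, 1, 3],
    [8, 4, 6],
    [7, 1, 2],
    [10, 5, 6],
    [6, 2, 4],
], dtype=float)

print(f"ICC(2,1) = {icc_2_1(ratings):.3f}")
```

A value near 1 indicates that differences among subjects dominate observer and error variance; here the large, systematic disagreement among the hypothetical observers depresses the coefficient, illustrating why an absolute-agreement index penalizes observer effects that a simple correlation between observer pairs would ignore.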