Evaluators of family planning programs have begun to use simulated client ratings to assess the quality of services. However, little is known about the reliability of such ratings when they are used to assess individual provider performance. This study examined the reliability of quality-of-care ratings in a Peruvian community-based distribution program by using pairs of concealed observers, each pair consisting of a simulated client and a companion. Average interrater agreement, measured by intraclass correlation, was .50, indicating that the ratings are not reliable enough for the evaluation of a single provider by a single rater. The study results suggest that checklist-item scores referring to specific provider behaviors will be more reliable and useful than ratings.
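
As an illustration of how an intraclass correlation can be computed from paired observer ratings, the following is a minimal sketch. It assumes a one-way random-effects ICC for single ratings, which may not be the variant used in the study; the function name `icc_oneway` and the example ratings are hypothetical and are not study data.

```python
from statistics import mean


def icc_oneway(ratings):
    """One-way random-effects ICC for single ratings (an assumed variant).

    `ratings` holds one tuple per provider visit, with one score per
    observer, e.g. (simulated_client_score, companion_score).
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 visits, each with the same k >= 2 scores")

    grand_mean = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]

    # Between-visit mean square: how much visits differ from one another.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-visit mean square: how much observers of the same visit disagree.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_between - ms_within) / denominator


# Hypothetical scores for five visits, each rated by two concealed observers.
example_ratings = [(4.0, 3.0), (2.0, 3.0), (5.0, 4.0), (3.0, 3.0), (1.0, 2.0)]
print(round(icc_oneway(example_ratings), 2))
```

In this formulation, values near 1 mean that the two observers of a visit give nearly identical scores relative to the spread across visits, while values near 0 mean that observer disagreement is about as large as the differences between visits.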