In medical records review studies, information on the interrater reliability (IRR) of the data is seldom reported. This study assesses the IRR of data collected for a complex medical records review study. Elements selected for determining IRR included "demographic" data that require copying explicit information (e.g., gender, birth date), "free-text" data that require identifying and copying (e.g., chief complaints and diagnoses), and data that require abstractor judgment in determining what to record (e.g., whether heart disease was considered). Rates of agreement were assessed by the greatest number of answers (one to all n) that were the same. The IRR scores improved over time. At 1 month, the reliability for demographic data elements was very good, for free-text data elements was good, but for data elements requiring abstractor judgment was unacceptable (only 3.4 of six answers agreed, on average). All assessments after 6 months showed very good to excellent IRR. This study demonstrates that IRR can be evaluated and summarized, providing important information to the study investigators and to the consumer for assessing the reliability of the data and therefore the validity of the study results and conclusions. IRR information should be required for all large medical records studies.