We assessed the reliability of epidemiologic data extracted by three data technicians from the medical records of 102 patients in a case-control study. The data, which were extracted on two separate occasions, included such clinical and pharmaceutical features as history of lactation, hysterectomy, diabetes, and hypertension; type of menopause; and whether a woman had ever used exogenous estrogens. Although we found high rates of intra- and interextractor agreement, some errors in excerpting and classifying data did occur. These errors were especially common in distinguishing uncertain data (due to ambiguous or incomplete descriptions) from negative responses (indicating that a feature was truly absent). We identified six distinct sources of this variability: four in excerpting the data from the hospital record and two in coding the excerpted data. On the basis of these findings, we propose strategies for improving the basic quality of data used in epidemiologic research.