Purpose: The General Practice Research Database (GPRD) is a database of longitudinal patient records from general practices in the United Kingdom. It is an important data source for pharmacoepidemiology studies, but until now it has been tedious to calculate the daily dose and duration of exposure to drugs prescribed. This is because general practitioners routinely record dosage instructions as free text rather than in a structured way. The objective was to develop and assess the validity of an automated algorithm to derive the daily dose from text dosage instructions.
Methods: A computer program was developed to derive numerical information from unstructured text dosage instructions. It was tested on dosage texts from a random sample of one million prescription entries. A random sample of 1,000 of these converted texts was manually checked for accuracy.
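To make the idea of deriving a number from free text concrete, the following is a minimal sketch and not the program described above. It assumes a small, hypothetical vocabulary of number words and frequency abbreviations (`NUMBER_WORDS`, `FREQ_PATTERNS`), ignores units and strengths, and returns `None` whenever the text does not state both a quantity and a daily frequency.

```python
import re
from typing import Optional

# Hypothetical vocabulary, chosen for illustration only.
NUMBER_WORDS = {"half": 0.5, "one": 1.0, "two": 2.0, "three": 3.0, "four": 4.0}
NUM = r"(\d+(?:\.\d+)?|\b(?:half|one|two|three|four)\b)"
QUANTITY = re.compile(NUM)

# Frequency patterns, checked in order. A value of None means the
# number of administrations per day is captured by the pattern itself.
FREQ_PATTERNS = [
    (re.compile(NUM + r"\s*(?:times|x)\s*(?:a|per)?\s*(?:day|daily)\b"), None),
    (re.compile(r"\b(?:twice|bd|bid)\b"), 2.0),
    (re.compile(r"\b(?:tds|tid)\b"), 3.0),
    (re.compile(r"\b(?:qds|qid)\b"), 4.0),
    (re.compile(r"\b(?:once|od|daily|every day|a day|at night|nocte|mane)\b"), 1.0),
]


def _to_number(token: str) -> float:
    """Convert a digit string or a known number word to a float."""
    return NUMBER_WORDS[token] if token in NUMBER_WORDS else float(token)


def daily_dose(text: str) -> Optional[float]:
    """Return quantity per administration times administrations per day."""
    s = text.lower()
    freq = None
    for pattern, per_day in FREQ_PATTERNS:
        m = pattern.search(s)
        if m:
            freq = per_day if per_day is not None else _to_number(m.group(1))
            # Remove the frequency phrase so its number is not read as the quantity.
            s = s[:m.start()] + " " + s[m.end():]
            break
    if freq is None:
        return None  # no quantitative daily frequency stated
    q = QUANTITY.search(s)
    if q is None:
        return None  # frequency stated but no quantity
    return _to_number(q.group(1)) * freq
```

Under these assumptions, `daily_dose("Take 2 tablets three times a day")` gives `6.0`, while `daily_dose("apply as directed")` and `daily_dose("")` give `None`. These correspond to the three kinds of entry distinguished in the Results: dose stated, text without a quantitative dose, and no text.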
Results: Of the sample of one million prescription entries, 74.5% had text containing the daily dose, 14.5% had text that did not include a quantitative daily dose statement and 11.0% had no text entered. Of the 1,000 texts that were checked manually, 767 stated the daily dose. The program interpreted 758 (98.8%) of these correctly, produced errors in four cases and failed to extract the dose from five texts.
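As a consistency check on the figures reported above, the counts and percentages can be recomputed directly. The denominator for the accuracy figure is the 767 manually checked texts that stated a daily dose, not all 1,000 checked texts.

```latex
\[
\begin{aligned}
74.5\% + 14.5\% + 11.0\% &= 100\% \\
758 + 4 + 5 &= 767 \\
\frac{758}{767} &\approx 0.988 = 98.8\%
\end{aligned}
\]
```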
Conclusions: An automated algorithm has been developed that accurately extracts the daily dose from almost 99% of general practitioners' text dosage instructions that state one. It increases the utility of GPRD and other prescription data sources by enabling researchers to estimate the duration of drug exposure more efficiently.
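How a daily dose translates into exposure duration is not spelled out above. A common convention, assumed here, is to divide the quantity prescribed by the daily dose. The function name `exposure_days` and its parameters are hypothetical, and the sketch assumes the patient takes exactly the stated dose every day.

```python
def exposure_days(quantity: float, daily_dose: float) -> float:
    """Estimated days covered by one prescription (assumed: quantity / daily dose)."""
    if daily_dose <= 0:
        raise ValueError("daily_dose must be positive")
    return quantity / daily_dose
```

For instance, a prescription for 56 tablets with a derived daily dose of 2 tablets would cover `exposure_days(56, 2)`, that is 28 days, under this assumption.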