Background: The United States is moving toward active drug safety surveillance using sources such as administrative claims and electronic medical records, but using these data to study teratogenicity has been challenging because they typically do not allow pregnancies to be identified easily. Our goal was to develop and validate an algorithm for identifying pregnancies in the General Practice Research Database (GPRD) that could be used to study pregnancy outcomes.
Methods: The algorithm identified pregnancies in women aged 15-45 years who were pregnant between 1 January 1987 and 31 December 2006. We identified live births, stillbirths, and spontaneous and elective terminations within each woman's record. We validated the algorithm using the additional clinical details maternity (ACDM) file and de-identified free-text records.
Results: We analyzed 16,035,394 records from 3,093,927 individuals and identified 383,184 women who had a total of 580,356 pregnancies. There were 415,221 full-term live births, 3080 pre- or post-term births, 1834 multi-fetus deliveries, 86,408 spontaneous abortions or miscarriages, 72,164 elective terminations, and 1649 stillbirths or fetal deaths. A marker of pregnancy care was identifiable for 86.3% of the 580,356 pregnancies. The internal validation steps indicated that the algorithm produced results consistent with the ACDM file.
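As a reading aid, the reported counts can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated above; the variable names are illustrative, and the count of pregnancies with a care marker is an approximation back-calculated from the rounded 86.3%, not a number reported in the abstract.

```python
# Outcome counts as reported in the Results (variable names are illustrative).
outcome_counts = {
    "full_term_live_birth": 415_221,
    "pre_or_post_term_birth": 3_080,
    "multi_fetus_delivery": 1_834,
    "spontaneous_abortion_or_miscarriage": 86_408,
    "elective_termination": 72_164,
    "stillbirth_or_fetal_death": 1_649,
}
TOTAL_PREGNANCIES = 580_356
WOMEN = 383_184

# The six outcome categories should account for every identified pregnancy.
total_from_outcomes = sum(outcome_counts.values())

# Average number of identified pregnancies per woman.
pregnancies_per_woman = TOTAL_PREGNANCIES / WOMEN

# Approximate count with a pregnancy-care marker, derived from the rounded
# percentage (an estimate, not a reported figure).
approx_with_care_marker = round(0.863 * TOTAL_PREGNANCIES)

print(total_from_outcomes == TOTAL_PREGNANCIES)
print(round(pregnancies_per_woman, 2))
print(approx_with_care_marker)
```

The six outcome counts sum exactly to the 580,356 pregnancies, which suggests the categories are treated as mutually exclusive, and the totals correspond to roughly 1.5 identified pregnancies per woman.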
Conclusions: We were successful in identifying a large number of pregnancies in the GPRD. Our use of a hierarchical approach to identify pregnancy outcomes builds upon the methods suggested in previous work, while implementing additional steps to minimize potential misclassification of pregnancy outcomes.
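To illustrate what a hierarchical approach to outcome assignment can look like, here is a minimal sketch. It is not the authors' algorithm: the abstract does not state the priority order or the codes used, so `OUTCOME_PRIORITY`, the category labels, and `assign_outcome` are all hypothetical and chosen only to show the idea of resolving conflicting evidence by a fixed ranking.

```python
from typing import Iterable, Optional

# Hypothetical priority order (an assumption for illustration only; the
# abstract does not specify the hierarchy the authors used).
OUTCOME_PRIORITY = [
    "live_birth",
    "stillbirth",
    "elective_termination",
    "spontaneous_abortion",
]


def assign_outcome(evidence: Iterable[str]) -> Optional[str]:
    """Return the highest-priority outcome present in one pregnancy's evidence.

    `evidence` is the set of outcome categories suggested by the records
    linked to a single pregnancy episode. Categories outside the hierarchy
    are ignored; None is returned when no outcome category is found.
    """
    found = set(evidence)
    for outcome in OUTCOME_PRIORITY:
        if outcome in found:
            return outcome
    return None


# A record set with conflicting evidence resolves to the top-ranked outcome.
print(assign_outcome(["spontaneous_abortion", "live_birth"]))
```

Assigning exactly one outcome per pregnancy through a fixed ranking is one way to limit misclassification when a woman's record contains codes pointing to more than one outcome.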