Background: As high-speed computers and sophisticated record linkage software become increasingly available, investigators in nearly every field are building large linked databases for epidemiologic and comparative effectiveness research (CER). Decisions made during database construction strongly affect the accuracy and completeness of the resulting data. Because these databases may inform health-care decisions, it is vital to make them more transparent: the record linkage strategy should be documented in full, and linked and unlinked records should be compared so that potential biases can be identified and addressed.
Methods: Our target population comprised infants born to Florida-resident women from January 1, 1998 through December 31, 2009 who had a valid birth certificate record. We used a stepwise deterministic record linkage strategy to link each birth certificate to all inpatient, ambulatory, and emergency department hospital visits from birth through December 31, 2010, and to identify deaths that occurred within the first year of life. Thus, each infant was followed for at least 1 year after birth or until death, up to a maximum of 13 years. We examined linkage rates and the associations between linkage status (linked vs unlinked) and maternal and infant demographic and reproductive characteristics, all extracted from the birth certificate files. Bivariate county-level maps were created to describe how maternal race/ethnicity and maternal nativity relate to geographic variation in linkage rates.
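A stepwise deterministic linkage of this general kind can be sketched as a series of exact-match passes, each using a looser key than the one before and each applied only to records still unlinked. The sketch below is illustrative only: the field names, the ordering of the passes, and the rule of accepting a match only when the key is unique on both sides are assumptions made for the example, not the identifiers or rules used in this study.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]

# Hypothetical match-key passes, strictest first. These field names are
# assumptions for illustration; the study's actual keys are not given here.
PASSES: List[Tuple[str, ...]] = [
    ("mother_ssn", "infant_dob", "infant_sex", "hospital_id"),
    ("mother_ssn", "infant_dob", "infant_sex"),
    ("mother_dob", "infant_dob", "infant_sex", "zip_code"),
]


def make_key(record: Record, fields: Tuple[str, ...]) -> Optional[Tuple[str, ...]]:
    """Return the tuple of field values, or None if any value is missing."""
    values = tuple(record.get(f, "") for f in fields)
    if any(v == "" for v in values):
        return None
    return values


def index_by_key(records: List[Record], id_field: str,
                 fields: Tuple[str, ...], skip: set) -> Dict[tuple, List[str]]:
    """Group the ids of not-yet-linked records by their match key."""
    index: Dict[tuple, List[str]] = {}
    for rec in records:
        if rec[id_field] in skip:
            continue
        key = make_key(rec, fields)
        if key is not None:
            index.setdefault(key, []).append(rec[id_field])
    return index


def stepwise_link(births: List[Record], discharges: List[Record]) -> Dict[str, str]:
    """Map birth_id -> discharge_id using successive exact-match passes."""
    links: Dict[str, str] = {}
    used_discharges: set = set()
    for fields in PASSES:
        birth_index = index_by_key(births, "birth_id", fields, set(links))
        discharge_index = index_by_key(discharges, "discharge_id", fields,
                                       used_discharges)
        for key, birth_ids in birth_index.items():
            candidates = discharge_index.get(key, [])
            # Accept only unambiguous one-to-one matches; ties (for example,
            # same-sex twins) are left for a later pass or remain unlinked.
            if len(birth_ids) == 1 and len(candidates) == 1:
                links[birth_ids[0]] = candidates[0]
                used_discharges.add(candidates[0])
    return links
```

Records that no pass links stay out of the returned mapping, which is what makes a later comparison of linked and unlinked records possible.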
Results: During the 12-year birth cohort window (1998 through 2009), there were 2,549,738 birth certificate records for infants born alive to Florida-resident women with no indication of an adoption. We linked 2,347,738 (92.1 percent) of these birth certificate records to an infant birth hospitalization record. The highest crude unlinked rates were observed among infants who died during their first year of life (35.9 percent), births for which the documented principal source of payment was "self-pay" (28.1 percent), and infants born to mothers who had less than a ninth-grade education (26.0 percent), were foreign-born (12.9 percent), or self-identified as Hispanic (12.8 percent). After adjustment for other related and potentially confounding variables, several of these infant and maternal characteristics were associated with increased odds of failure to link infant birth records.
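The overall linkage rate and the crude (unadjusted) unlinked rates are simple proportions. The sketch below reproduces the overall figure from the two counts reported above and shows one way a crude unlinked rate could be tabulated by level of a characteristic. The record layout (a `linked` flag plus one key per characteristic) and the function names are assumptions for illustration; the adjusted odds reported in the study are not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def linkage_rate(linked: int, total: int) -> float:
    """Percentage of records that were linked."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * linked / total


def crude_unlinked_rates(records: Iterable[Mapping[str, object]],
                         characteristic: str) -> Dict[object, float]:
    """Percentage of unlinked records within each level of a characteristic.

    Each record is assumed to carry a boolean 'linked' flag (hypothetical
    layout) alongside the characteristic being tabulated.
    """
    totals: Counter = Counter()
    unlinked: Counter = Counter()
    for rec in records:
        level = rec[characteristic]
        totals[level] += 1
        if not rec["linked"]:
            unlinked[level] += 1
    return {level: 100.0 * unlinked[level] / totals[level] for level in totals}


# Counts reported in the Results: 2,347,738 linked of 2,549,738 records.
overall = linkage_rate(2_347_738, 2_549_738)
print(f"linked: {overall:.1f} percent, unlinked: {100 - overall:.1f} percent")
```

With the reported counts, 202,000 records remain unlinked, which corresponds to 7.9 percent of the cohort.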
Conclusion: Using a stepwise deterministic linkage approach, we achieved a high linkage rate across several data sources and produced a reliable, multipurpose database that can be used for observational, comparative effectiveness, and health services research in maternal and child health (MCH) populations. Our findings underscore the importance of evaluating routinely collected health data and of reporting the strengths and limitations of linked electronic data sources clearly. The resulting database should be useful to researchers, health planners, policy makers, and other stakeholders interested in MCH outcome studies.