Objective: To compare record linkage (RL) procedures adopted in several Italian settings and a standard probabilistic RL procedure for matching data from electronic health care databases.
Design: Two health care archives are matched: the hospital discharges (HD) archive and the population registry of four Italian areas. Exact deterministic and stepwise deterministic techniques and a standard probabilistic RL procedure are applied to match HD records for acute myocardial infarction (AMI) and diabetes mellitus. Sensitivity and specificity of the RL procedures are estimated after manual review. Age- and gender-standardized annual hospitalization rates for AMI and diabetes are computed using the different RL procedures and compared.
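As an illustration of the accuracy measures named in the design, the sketch below computes sensitivity and specificity for one RL procedure against manual review. The function name, the boolean inputs, and the example values are assumptions made for illustration; they are not taken from the study.

```python
def accuracy_measures(predicted, reviewed):
    """Return (sensitivity, specificity) for one RL procedure.

    predicted: list of bool, True if the procedure declared the pair a match.
    reviewed:  list of bool, True if manual review judged the pair a true match.
    Both lists are hypothetical inputs, aligned pair by pair.
    """
    pairs = list(zip(predicted, reviewed))
    tp = sum(1 for p, r in pairs if p and r)          # correctly linked
    fn = sum(1 for p, r in pairs if not p and r)      # missed true matches
    tn = sum(1 for p, r in pairs if not p and not r)  # correctly not linked
    fp = sum(1 for p, r in pairs if p and not r)      # false links
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Invented example: five candidate pairs.
predicted = [True, True, False, False, True]
reviewed = [True, False, False, True, True]
sens, spec = accuracy_measures(predicted, reviewed)
```

Sensitivity is the share of true matches that the procedure links, and specificity is the share of true non-matches that it leaves unlinked.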
Setting: Municipalities of Pisa and Roma, and Regions of Puglia and Piemonte.
Participants: Residents in the considered areas on 31 December 2003 and corresponding episodes of hospitalization in the same areas during 2004.
Main outcome measures: Measures of accuracy of RL procedures to match health care administrative databases.
Results: Data quality varies among archives and affects the decision rule of the probabilistic procedure. A single decision rule was therefore adopted by requiring a positive predictive value of at least 98% in all the considered areas. The number of matched pairs identified with the probabilistic procedure is on average more than 11% greater than the number identified with the deterministic procedure. Sensitivity of probabilistic RL is similar to or greater than that of the other procedures. Differences between annual standardized hospitalization rates computed with stepwise deterministic RL and with the standard probabilistic RL procedure vary among areas.
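One way to read a decision rule based on a minimum positive predictive value is sketched below: given candidate pairs with a matching score and a manual-review label, it returns the lowest score cutoff at which the pairs scoring at or above the cutoff reach the required PPV. This is a minimal sketch under assumptions. The scoring scale, the function name, and the example data are hypothetical, and the abstract does not state how the study derived its rule.

```python
def lowest_threshold_for_ppv(scored_pairs, min_ppv=0.98):
    """Return the lowest score cutoff whose accepted pairs reach min_ppv.

    scored_pairs: list of (score, is_true_match) tuples (hypothetical input),
    where is_true_match comes from manual review.
    Returns None if no cutoff reaches the required PPV.
    """
    ordered = sorted(scored_pairs, key=lambda pair: pair[0], reverse=True)
    best = None
    true_matches = 0
    accepted = 0
    i = 0
    while i < len(ordered):
        score = ordered[i][0]
        # Accept all pairs tied at this score together.
        while i < len(ordered) and ordered[i][0] == score:
            true_matches += 1 if ordered[i][1] else 0
            accepted += 1
            i += 1
        # PPV among all pairs with score >= current cutoff.
        if true_matches / accepted >= min_ppv:
            best = score
    return best


# Invented example data: (score, manually reviewed status).
pairs = [(10, True), (9, True), (8, False), (7, True), (6, True)]
cutoff_98 = lowest_threshold_for_ppv(pairs, min_ppv=0.98)
cutoff_75 = lowest_threshold_for_ppv(pairs, min_ppv=0.75)
```

Because PPV need not fall monotonically as the cutoff is lowered, the sketch scans every distinct score and keeps the lowest one that still satisfies the requirement.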
Conclusion: Exact deterministic RL works well when unique identifiers and high-quality data are available. The probabilistic procedure proposed here works as well as semi-deterministic RL when the latter includes data quality control or a manual review of the final results. Otherwise, deterministic or semi-deterministic procedures entail classification errors of unknown size and direction.