Proc AMIA Symp. 2002;305-9.

Analysis of identifier performance using a deterministic linkage algorithm

Free PMC article


Shaun J Grannis et al. Proc AMIA Symp. 2002.

Abstract

As part of developing a record linkage algorithm for de-identified patient data, we analyzed how well several demographic variables perform when linking patient registry records from two hospital registries to the Social Security Death Master File. We analyzed samples from each registry, totaling 6,000 record pairs, to establish a linkage gold standard. Using Social Security Number as the sole linkage variable produced substantial linkage error rates of 4.7% and 9.2% for the two registries. The best single combination of variables for finding links was Social Security Number, phonetically compressed first name, birth month, and gender; this combination found 87% and 88% of the links without any false links. Using combinations of Social Security Number, gender, name, and birth date fields, we achieved sensitivities of 90% to 92% while maintaining 100% specificity. This is an accurate method for linking patient records to death data and forms the basis for a more generalized de-identified linkage algorithm.
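The best-performing deterministic rule described in the abstract links two records only when they agree exactly on Social Security Number, phonetically compressed first name, birth month, and gender. The following is a minimal sketch of such a rule and of how sensitivity and specificity could be scored against a reviewed gold standard. It assumes American Soundex as the phonetic code (the abstract does not name the compression algorithm), and the `Record` type and the `match_key`, `link`, and `evaluate` helpers are hypothetical illustrations, not the authors' implementation. Missing or malformed field values are not handled.

```python
from dataclasses import dataclass

# Soundex digit for each consonant; vowels, Y, H and W have no code.
_SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for ch in letters
}


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits (assumed phonetic code)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        # Vowels (and Y) separate repeated codes; H and W do not.
        if ch not in "HW":
            prev = code
    return (result + "000")[:4]


@dataclass(frozen=True)
class Record:
    ssn: str          # hypothetical field layout
    first_name: str
    birth_month: int  # 1-12
    gender: str


def match_key(r: Record) -> tuple:
    """Deterministic key: SSN + phonetic first name + birth month + gender."""
    return (r.ssn.replace("-", ""), soundex(r.first_name), r.birth_month, r.gender.upper())


def link(registry: list, death_file: list) -> set:
    """Return (registry index, death-file index) pairs whose keys agree exactly."""
    index: dict = {}
    for j, rec in enumerate(death_file):
        index.setdefault(match_key(rec), []).append(j)
    return {(i, j) for i, rec in enumerate(registry) for j in index.get(match_key(rec), [])}


def evaluate(predicted: set, gold_links: set, gold_nonlinks: set) -> tuple:
    """Sensitivity = TP / true links; specificity = TN / reviewed non-links."""
    tp = len(predicted & gold_links)
    fp = len(predicted & gold_nonlinks)
    return tp / len(gold_links), (len(gold_nonlinks) - fp) / len(gold_nonlinks)
```

For example, a registry record for "Robert" would link to a death record for "Rupert" with the same SSN, birth month, and gender, because both names compress to `R163`; a true match whose birth month was mis-keyed would be missed, lowering sensitivity without producing a false link.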
