Augmenting fact and date of death in electronic health records using internet media sources: a validation study from two large healthcare systems

Am J Epidemiol. 2026 Mar 17;195(4):1120-1128. doi: 10.1093/aje/kwaf258.

Abstract

This study evaluated death ascertainment from publicly available internet sources for patients in two large tertiary care US healthcare systems, Mass General Brigham (MGB) and Vanderbilt University Medical Center (VUMC), benchmarked against state and federal vital statistics data. Names, dates of birth, and dates of death were extracted from 8.1 million internet media records using previously developed natural language processing models. Internet records were matched to 78 848 deceased patients from MGB and VUMC on first name, last name, and date of birth. Dates of death were validated against state vital statistics databases or the National Death Index as reference standards. We calculated the sensitivity and positive predictive value (PPV) of internet sources in identifying dates of death within 7 days of the reference standard. Exact matching of records between internet media and reference standards on first name, last name, and date of birth resulted in 30 067 (38.8%) matches, which showed high PPV for death identification in internet media (98.2% at MGB; 98.9% at VUMC) and increased sensitivity of death capture over the electronic health record (EHR) alone by 24% at MGB and 18% at VUMC. In conclusion, using internet sources to augment mortality data meaningfully increased capture of death over reliance on EHR data alone.
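The matching and validation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout, the case-insensitive name normalization, the `match_key` and `evaluate` helpers, and the exact definitions of sensitivity (concordant matches over all reference deaths) and PPV (concordant matches over all matched records) are assumptions made for illustration.

```python
from datetime import date


def match_key(first, last, dob):
    # Assumption: "exact" matching after trimming whitespace and lowercasing names.
    return (first.strip().lower(), last.strip().lower(), dob)


def evaluate(reference, internet, window_days=7):
    """Compare internet-derived dates of death with a reference standard.

    Both inputs are lists of dicts with keys: first, last, dob, dod.
    `reference` holds deceased patients with reference-standard dates of death.
    """
    # If several internet records share a key, the last one wins (a simplification).
    internet_by_key = {
        match_key(r["first"], r["last"], r["dob"]): r["dod"] for r in internet
    }

    matched = 0      # reference patients found in internet records
    concordant = 0   # matched patients whose dates agree within the window

    for ref in reference:
        web_dod = internet_by_key.get(match_key(ref["first"], ref["last"], ref["dob"]))
        if web_dod is None:
            continue
        matched += 1
        if abs((web_dod - ref["dod"]).days) <= window_days:
            concordant += 1

    return {
        "matched": matched,
        "sensitivity": concordant / len(reference) if reference else 0.0,
        "ppv": concordant / matched if matched else 0.0,
    }


# Toy data, invented purely to exercise the functions.
reference = [
    {"first": "Ann", "last": "Lee", "dob": date(1940, 1, 1), "dod": date(2020, 5, 1)},
    {"first": "Bob", "last": "Ray", "dob": date(1950, 2, 2), "dod": date(2021, 6, 10)},
    {"first": "Cy", "last": "Fox", "dob": date(1935, 3, 3), "dod": date(2019, 7, 20)},
    {"first": "Di", "last": "Orr", "dob": date(1960, 4, 4), "dod": date(2022, 8, 30)},
]
internet = [
    {"first": "ann", "last": "LEE", "dob": date(1940, 1, 1), "dod": date(2020, 5, 3)},
    {"first": "Bob", "last": "Ray", "dob": date(1950, 2, 2), "dod": date(2021, 6, 10)},
    {"first": "Cy", "last": "Fox", "dob": date(1935, 3, 3), "dod": date(2019, 8, 19)},
]

result = evaluate(reference, internet)
```

In this toy run, three of the four reference patients are matched and two of those have internet dates of death within 7 days of the reference date. A real linkage would also need to handle name variants, duplicate identities, and missing dates of birth, none of which this sketch attempts.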

Keywords: internet sources; mortality rate; observational studies; sensitivity analysis; vital status.

Publication types

  • Validation Study

MeSH terms

  • Death Certificates*
  • Electronic Health Records* / statistics & numerical data
  • Female
  • Humans
  • Internet*
  • Male
  • Middle Aged
  • Natural Language Processing
  • United States