Federated queries of clinical data repositories: the sum of the parts does not equal the whole

J Am Med Inform Assoc. 2013 Jun;20(e1):e155-61. doi: 10.1136/amiajnl-2012-001299. Epub 2013 Jan 24.


Background and objective: In 2008 we developed the Shared Health Research Information Network (SHRINE), which for the first time enabled research queries across the full patient populations of four Boston hospitals. It uses a federated architecture, in which each hospital returns only the aggregate count of patients who match a query. This allows hospitals to retain control over their local databases and comply with federal and state privacy laws. However, because patients may receive care from multiple hospitals, the result of a federated query may differ from the result of running the same query against a single central repository. This paper describes the situations in which this happens and presents a technique for correcting these errors.
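To make the discrepancy concrete, the following is a minimal Python sketch, not SHRINE code. The toy records, patient identifiers, and the function names `federated_count` and `central_count` are all hypothetical. It assumes a query is a set of diagnoses that must all be present, and it compares the sum of per-hospital counts with the count obtained after merging each patient's data across hospitals.

```python
# Hypothetical toy data: hospital -> patient id -> diagnoses recorded there.
records = {
    "A": {"p1": {"diabetes"}, "p2": {"diabetes", "hypertension"}},
    "B": {"p1": {"hypertension"}, "p2": {"diabetes"}, "p3": {"diabetes"}},
}


def matches(facts, required):
    """A patient matches when every required diagnosis is present."""
    return required <= facts


def federated_count(records, required):
    """Sum of the aggregate counts each hospital would return on its own."""
    return sum(
        sum(1 for facts in patients.values() if matches(facts, required))
        for patients in records.values()
    )


def central_count(records, required):
    """Count over a single repository that merges each patient's data."""
    merged = {}
    for patients in records.values():
        for pid, facts in patients.items():
            merged.setdefault(pid, set()).update(facts)
    return sum(1 for facts in merged.values() if matches(facts, required))


# Single criterion: p2 matches at both hospitals and is counted twice.
overcount = (federated_count(records, {"diabetes"}),
             central_count(records, {"diabetes"}))

# Two criteria: p1 has one diagnosis at each hospital and is missed locally.
undercount = (federated_count(records, {"diabetes", "hypertension"}),
              central_count(records, {"diabetes", "hypertension"}))
```

In this toy network the federated sum overcounts the single-criterion query (4 versus 3) and undercounts the two-criterion query (1 versus 2), so the error can go in either direction.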

Methods: We use a one-time process to identify which patients have data in multiple repositories by comparing one-way hash values of patient demographics. This enables us to partition the local databases such that all patients within a given partition have data at the same subset of hospitals. Federated queries are then run independently on each partition, and the combined results are presented to the user.
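The partitioning step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the abstract does not specify the hash function, the demographic fields, or any salting, so the use of SHA-256, the fields (first name, last name, date of birth), the shared salt, and the names `demographic_hash` and `build_partitions` are all hypothetical.

```python
import hashlib
from collections import defaultdict


def demographic_hash(first, last, dob, salt="shared-secret"):
    """One-way hash of normalized demographics (fields and salt are assumed)."""
    normalized = "|".join(s.strip().lower() for s in (first, last, dob))
    return hashlib.sha256((salt + "|" + normalized).encode("utf-8")).hexdigest()


def build_partitions(hashes_by_hospital):
    """Group patient hashes by the exact set of hospitals holding their data."""
    sites_by_hash = defaultdict(set)
    for hospital, hashes in hashes_by_hospital.items():
        for h in hashes:
            sites_by_hash[h].add(hospital)

    partitions = defaultdict(set)
    for h, sites in sites_by_hash.items():
        partitions[frozenset(sites)].add(h)
    return dict(partitions)


# Hypothetical example: one patient is seen at both hospitals.
hashes_by_hospital = {
    "A": {demographic_hash("Ann", "Lee", "1970-01-02"),
          demographic_hash("Bob", "Ray", "1985-05-05")},
    "B": {demographic_hash(" ann", "LEE", "1970-01-02"),
          demographic_hash("Cy", "Fox", "1990-09-09")},
}
partitions = build_partitions(hashes_by_hospital)
```

Here the patient seen at both hospitals lands in the partition keyed by `frozenset({"A", "B"})`, while the other two fall into single-hospital partitions. Exact-match hashing of this kind would miss patients whose demographics are recorded inconsistently, so a production linkage process would likely need more careful normalization.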

Results: Using theoretical bounds and simulated hospital networks, we demonstrate that once the partitions are made, SHRINE can produce more precise estimates of the number of patients matching a query.
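One way to see why partitioning can narrow the estimate is through elementary set bounds, sketched below in Python. These are not necessarily the bounds derived in the paper. The sketch assumes a single-criterion query, so that a patient matches centrally exactly when they match at one or more hospitals. The counts, partition sizes, and function names are hypothetical.

```python
def bounds_without_partitions(counts):
    """Bounds on distinct matching patients given only per-hospital counts.

    Lower bound: all matches overlap as much as possible.
    Upper bound: no patient is counted at more than one hospital.
    """
    return max(counts.values(), default=0), sum(counts.values())


def bounds_with_partitions(partition_counts, partition_sizes):
    """Sum per-partition bounds; each partition caps matches at its size."""
    lower = upper = 0
    for sites, counts in partition_counts.items():
        lower += max(counts.values(), default=0)
        upper += min(sum(counts.values()), partition_sizes[sites])
    return lower, upper


# Hypothetical two-hospital network.
only_a, only_b, both = frozenset({"A"}), frozenset({"B"}), frozenset({"A", "B"})
partition_sizes = {only_a: 100, only_b: 80, both: 15}
partition_counts = {
    only_a: {"A": 30},          # patients with data only at hospital A
    only_b: {"B": 20},          # patients with data only at hospital B
    both: {"A": 10, "B": 8},    # patients with data at both hospitals
}

# Unpartitioned totals are what each hospital would report today.
totals = {"A": 30 + 10, "B": 20 + 8}
unpartitioned = bounds_without_partitions(totals)
partitioned = bounds_with_partitions(partition_counts, partition_sizes)
```

In this example the interval narrows from 40 to 68 patients without partitions to 60 to 65 with them. Single-hospital partitions contribute exact counts, so the remaining uncertainty comes only from the partition shared by both hospitals.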

Conclusions: Uncertainty in the overlap of patient populations across hospitals limits the effectiveness of SHRINE and other federated query tools. Our technique reduces this uncertainty while retaining an aggregate federated architecture.

Keywords: Algorithms; Hospital Shared Services; Medical Record Linkage; Medical Records Systems, Computerized; Search Engine.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Boston
  • Computer Communication Networks*
  • Databases, Factual
  • Hospital Administration
  • Humans
  • Information Storage and Retrieval*
  • Medical Record Linkage
  • Medical Records Systems, Computerized*