Oncoshare: lessons learned from building an integrated multi-institutional database for comparative effectiveness research

AMIA Annu Symp Proc. 2012;2012:970-8. Epub 2012 Nov 3.


Comparative effectiveness research (CER) using observational data requires informatics methods for extracting, standardizing, sharing, and integrating data derived from a variety of electronic sources. In the Oncoshare project, we developed such methods as part of a collaborative, multi-institutional CER study of the patterns, predictors, and outcomes of breast cancer care. In this paper, we evaluate the approaches we took and present the lessons we learned in building and validating the Oncoshare data resource. Specifically, we determined that 1) the state or regional cancer registry is the most efficient starting point for determining subject inclusion; 2) the data dictionary should be based on existing registry standards, such as Surveillance, Epidemiology and End Results (SEER), when applicable; 3) the Social Security Administration Death Master File (SSA DMF), rather than clinical resources, provides standardized ascertainment of mortality outcomes; and 4) CER database development efforts, despite the immediate availability of electronic data, may take as long as two years to produce validated, reliable data for research. Using these methods, Oncoshare integrates complex, longitudinal data from multiple electronic medical records and registries and provides a rich, validated resource for research on oncology care.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breast Neoplasms / therapy*
  • Comparative Effectiveness Research*
  • Databases as Topic*
  • Electronic Health Records*
  • Female
  • Humans
  • Medical Informatics
  • Medical Record Linkage / methods*
  • Medical Records Systems, Computerized
  • Registries*
  • Systems Integration
  • Therapeutic Human Experimentation