The Impact of Standardizing the Definition of Visits on the Consistency of Multi-Database Observational Health Research

BMC Med Res Methodol. 2015 Mar 8;15:13. doi: 10.1186/s12874-015-0001-6.


Background: Use of administrative claims from multiple sources for research purposes is challenged by the lack of consistency in the structure of the underlying data and in how data elements are defined across claims data providers. This paper evaluates the impact of applying a standardized revenue code-based logic for defining inpatient encounters across two different claims databases.

Methods: We selected members who had complete enrollment in 2012 from the Truven MarketScan Commercial Claims and Encounters (CCAE) and the Optum Clinformatics (Optum) databases. The overall prevalence of inpatient conditions in the raw data was compared to that in the common data model (CDM) with the standardized visit definition applied.
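The prevalence comparison described in the Methods can be illustrated with a minimal Python sketch. It assumes prevalence is the share of fully enrolled members with at least one inpatient record of a condition; the abstract does not state the exact formula, and the function name `inpatient_prevalence` and the member identifiers are hypothetical toy values, not study data.

```python
def inpatient_prevalence(condition_members, enrolled_members):
    """Share of enrolled members with at least one inpatient record of a condition.

    Assumption: each member counts once, and only enrolled members are eligible.
    """
    enrolled = set(enrolled_members)
    if not enrolled:
        raise ValueError("enrolled_members must not be empty")
    # Deduplicate members and drop anyone outside the enrolled population.
    cases = set(condition_members) & enrolled
    return len(cases) / len(enrolled)


# Toy data (hypothetical): the same condition under two visit definitions.
enrolled = ["m1", "m2", "m3", "m4"]
raw_cases = ["m1", "m2", "m2"]  # inpatient per the raw data; m2 appears twice
cdm_cases = ["m1"]              # inpatient per the standardized CDM definition

raw_prev = inpatient_prevalence(raw_cases, enrolled)
cdm_prev = inpatient_prevalence(cdm_cases, enrolled)
print(raw_prev, cdm_prev)
```

In this toy case, reclassifying one member's claims out of the inpatient setting halves the observed inpatient prevalence, which is the kind of shift the comparison is meant to surface.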

Results: In CCAE, 87.18% of claims from 2012 that were classified as part of inpatient visits in the raw data were also classified as part of inpatient visits after the data were standardized to CDM, and this overlap was consistent from 2006 to 2011. In contrast, Optum had 83.18% concordance in classification of 2012 claims from inpatient encounters before and after standardization, but the consistency varied over time. The re-classification of inpatient encounters substantially impacted the observed prevalence of medical conditions occurring in the inpatient setting and the consistency in prevalence estimates between the databases. On average, before standardization, each condition in Optum was 12% more prevalent than that same condition in CCAE; after standardization, the prevalence of conditions had a mean difference of only 1% between databases. Among the 7,039 conditions reviewed, 67% showed a smaller between-database difference in prevalence after standardization.
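The two kinds of summary statistics reported in the Results can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: concordance is taken to be the fraction of raw inpatient claims that remain inpatient after standardization, and a condition counts as "improved" when the absolute between-database difference in prevalence shrinks. All function names, claim identifiers, and prevalence values are hypothetical.

```python
def concordance(raw_inpatient_claims, cdm_inpatient_claims):
    """Fraction of raw inpatient claims still classified as inpatient in the CDM."""
    raw = set(raw_inpatient_claims)
    if not raw:
        raise ValueError("raw_inpatient_claims must not be empty")
    return len(raw & set(cdm_inpatient_claims)) / len(raw)


def share_of_conditions_improved(before, after):
    """Share of conditions whose between-database prevalence gap shrank.

    `before` and `after` map a condition to a (prevalence_db_a, prevalence_db_b)
    pair. Assumption: the gap is the absolute difference of the two prevalences.
    """
    if not before:
        raise ValueError("before must not be empty")
    improved = 0
    for condition, (a0, b0) in before.items():
        a1, b1 = after[condition]
        if abs(a1 - b1) < abs(a0 - b0):
            improved += 1
    return improved / len(before)


# Toy data (hypothetical claim IDs and prevalences).
raw_claims = ["c1", "c2", "c3", "c4"]
cdm_claims = ["c1", "c2", "c3", "c9"]
agreement = concordance(raw_claims, cdm_claims)

before = {"A": (0.10, 0.12), "B": (0.05, 0.05)}
after = {"A": (0.10, 0.105), "B": (0.05, 0.06)}
improved_share = share_of_conditions_improved(before, after)
print(agreement, improved_share)
```

Other definitions of the gap (for example, a relative rather than absolute difference) would give different shares; the abstract does not specify which was used.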

Conclusions: To improve the consistency of research results across databases, one should review sources of database heterogeneity, such as the way data holders process raw claims data. Our study showed that applying the Observational Medical Outcomes Partnership (OMOP) CDM with a standardized approach for defining inpatient visits during the extract, transform, and load process can decrease the heterogeneity observed in disease prevalence estimates across two different claims data sources.

MeSH terms

  • Databases, Factual / classification
  • Databases, Factual / standards
  • Databases, Factual / statistics & numerical data*
  • Electronic Health Records / classification
  • Electronic Health Records / standards
  • Electronic Health Records / statistics & numerical data*
  • Health Surveys / methods
  • Health Surveys / statistics & numerical data
  • Humans
  • Inpatients / statistics & numerical data
  • Insurance Claim Review / classification
  • Insurance Claim Review / standards
  • Insurance Claim Review / statistics & numerical data*
  • Office Visits / statistics & numerical data*
  • Reference Standards