Harmonizing Medicare Claims Data with OMOP: A Validated ETL Pipeline

AMIA Annu Symp Proc. 2025 May 22;2024:715-723. eCollection 2024.

Abstract

This study presents a Python-based Extract, Transform, and Load (ETL) pipeline that converts Medicare Limited Data Set (LDS) claims into the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). By mapping Medicare LDS tables to fifteen OMOP CDM tables, we achieved minimal data loss. Rigorous validation using the OMOP Data Quality Dashboard indicated a 99% pass rate across more than 1,500 checks, affirming data fidelity. A comparative analysis showed high concordance in demographic traits and clinical conditions between the original and transformed datasets. Although structural constraints and minor syntax errors left some codes unmapped, our approach preserves key administrative details and standardizes healthcare data for large-scale observational research. This scalable, reproducible pipeline addresses critical gaps in Medicare-LDS-to-OMOP conversion, improving data integration for diverse applications in health services research, population health, and policy analysis. Future expansions will incorporate additional clinical details and advanced concept mappings.
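
As an illustration of the kind of transformation described above, the following is a minimal Python sketch of mapping one diagnosis code from a claim line to an OMOP-style condition record. It is not the authors' implementation. The names `SOURCE_TO_CONCEPT`, `map_claim`, and `mapped_fraction` are hypothetical, and the concept IDs in the lookup are placeholders rather than real OMOP vocabulary entries. The sketch assumes diagnosis codes may arrive with or without a decimal point, and it follows the OMOP convention of assigning concept ID 0 to a source code with no standard mapping while retaining the original code as the source value.

```python
from dataclasses import dataclass
from datetime import date

# Placeholder lookup from normalized ICD-10-CM codes to concept IDs.
# These IDs are illustrative only; a real pipeline would read mappings
# from the OMOP vocabulary tables.
SOURCE_TO_CONCEPT = {"E119": 1001, "I10": 1002}

# OMOP convention: a source code without a standard concept maps to 0.
UNMAPPED_CONCEPT_ID = 0


@dataclass
class ConditionOccurrence:
    """A reduced, hypothetical subset of the condition_occurrence fields."""
    person_id: int
    condition_concept_id: int
    condition_start_date: date
    condition_source_value: str


def normalize_code(raw: str) -> str:
    """Strip whitespace, uppercase, and drop the decimal point."""
    return raw.strip().upper().replace(".", "")


def map_claim(person_id: int, dx_code: str, service_date: date) -> ConditionOccurrence:
    """Map one diagnosis code, keeping the original code as the source value."""
    concept_id = SOURCE_TO_CONCEPT.get(normalize_code(dx_code), UNMAPPED_CONCEPT_ID)
    return ConditionOccurrence(person_id, concept_id, service_date, dx_code)


def mapped_fraction(rows: list[ConditionOccurrence]) -> float:
    """Share of rows whose code resolved to a non-zero concept ID."""
    if not rows:
        return 0.0
    mapped = sum(1 for r in rows if r.condition_concept_id != UNMAPPED_CONCEPT_ID)
    return mapped / len(rows)


claims = [
    (1, "E11.9", date(2020, 1, 5)),
    (1, " i10 ", date(2020, 2, 9)),
    (2, "ZZZ", date(2020, 3, 1)),  # deliberately unmappable
]
rows = [map_claim(*claim) for claim in claims]
```

Keeping the unmapped row, rather than dropping it, preserves the source value so that the share of unmapped codes can be measured and the mapping revisited later.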

Publication types

  • Validation Study

MeSH terms

  • Humans
  • Insurance Claim Review*
  • Medicare*
  • Outcome Assessment, Health Care*
  • United States