The Drug Data to Knowledge Pipeline: Large-Scale Claims Data Classification for Pharmacologic Insight

AMIA Jt Summits Transl Sci Proc. 2016 Jul 20;2016:105-11. eCollection 2016.


In biomedical informatics, assigning drug codes to categories is a common step in the analysis pipeline. Unfortunately, incomplete mappings are the norm rather than the exception, and coverage below 85% is not uncommon. Here, we perform this linking task on a nationwide insurance claims database with over 13 million members, who were dispensed more than 50,000 unique medication product forms as identified by National Drug Codes (NDCs). The chosen approach employs Cerner Multum's VantageRx and the U.S. National Library of Medicine's RxMix. As a result, 94.0% of the NDCs were successfully mapped to categories used by common drug terminologies, e.g., Anatomical Therapeutic Chemical (ATC). Implemented as an SQL database and scripts, the approach is generic and can be set up for a new data set in a few hours. Thus, the method is a viable option for large-scale drug classification.
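To make the coverage figure concrete, the following is a minimal sketch of how the share of mapped NDCs could be computed once a mapping table exists. It is not the authors' implementation: the table names `dispensed` and `ndc_category`, the function `mapping_coverage`, and the in-memory SQLite database are assumptions chosen for illustration, and the NDC values in the usage note are placeholders, not real product codes.

```python
import sqlite3


def mapping_coverage(dispensed_ndcs, ndc_to_category):
    """Return the fraction of distinct dispensed NDCs with at least one category.

    dispensed_ndcs  -- iterable of NDC strings taken from claims (duplicates allowed)
    ndc_to_category -- iterable of (ndc, category) pairs, e.g. NDC to ATC code
    """
    # Hypothetical schema; the paper does not specify its table layout.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dispensed (ndc TEXT)")
    conn.execute("CREATE TABLE ndc_category (ndc TEXT, category TEXT)")
    conn.executemany(
        "INSERT INTO dispensed VALUES (?)", [(ndc,) for ndc in dispensed_ndcs]
    )
    conn.executemany("INSERT INTO ndc_category VALUES (?, ?)", list(ndc_to_category))

    # LEFT JOIN keeps unmapped NDCs; COUNT(DISTINCT m.ndc) skips their NULLs.
    total, mapped = conn.execute(
        """
        SELECT COUNT(DISTINCT d.ndc), COUNT(DISTINCT m.ndc)
        FROM dispensed AS d
        LEFT JOIN ndc_category AS m ON m.ndc = d.ndc
        """
    ).fetchone()
    conn.close()
    return mapped / total if total else 0.0
```

For example, calling `mapping_coverage(["00000-0000-01", "00000-0000-02", "00000-0000-03"], [("00000-0000-01", "N02BE01"), ("00000-0000-02", "C09AA02")])` counts two of three distinct codes as mapped. Counting distinct NDCs rather than claim rows matches the abstract's framing, where coverage is stated as a percentage of NDCs.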