An argument for reporting data standardization procedures in multi-site predictive modeling: case study on the impact of LOINC standardization on model performance

Amie J Barda; Victor M Ruiz; Tony Gigliotti; Fuchiang Rich Tsui

doi:10.1093/jamiaopen/ooy063

JAMIA Open. 2019 Apr;2(1):197-204. doi: 10.1093/jamiaopen/ooy063. Epub 2019 Feb 4.

Authors

Amie J Barda¹ ², Victor M Ruiz¹ ², Tony Gigliotti³, Fuchiang Rich Tsui¹ ² ⁴ ⁵ ⁶ ⁷ ⁸

Affiliations

¹ Tsui Laboratory, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA.
² Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.
³ Information Services Division, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, USA.
⁴ Department of Anesthesiology and Critical Care Medicine, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA.
⁵ Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA.
⁶ Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
⁷ School of Computing and Information, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.
⁸ Department of Bioengineering, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.

Abstract

Objectives: We aimed to better understand how standardizing laboratory data affects predictive model performance in multi-site datasets. We hypothesized that standardizing local laboratory codes to Logical Observation Identifiers Names and Codes (LOINC) would produce predictive models that significantly outperform those learned using local laboratory codes.

Materials and methods: We predicted 30-day hospital readmission for a set of heart failure-specific visits to 13 hospitals from 2008 to 2012. Laboratory test results were extracted and then manually cleaned and mapped to LOINC. We extracted features to summarize laboratory data for each patient and used a training dataset (2008-2011) to learn models using a variety of feature selection techniques and classifiers. We evaluated our hypothesis by comparing model performance on an independent test dataset (2012).
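As a rough illustration of why the mapping step matters, the sketch below summarizes laboratory results per visit twice: once keyed by site-specific local codes and once keyed by LOINC. It is not the authors' pipeline. The site names, local codes, mapping table, the choice of the mean as the summary statistic, and the helper names `summarize_labs` and `temporal_split` are all assumptions made for illustration; only the 2008-2011 training and 2012 test split comes from the text above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from (site, local code) to a LOINC code.
# The site names and local codes are invented for this sketch.
LOCAL_TO_LOINC = {
    ("site_a", "CREAT"): "2160-0",   # creatinine, serum/plasma
    ("site_b", "CR_SER"): "2160-0",
    ("site_a", "NA"): "2951-2",      # sodium, serum/plasma
    ("site_b", "SODIUM"): "2951-2",
}


def summarize_labs(results, use_loinc):
    """Return {visit_id: {feature_name: mean of the test's values}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for r in results:
        local_name = f"{r['site']}:{r['code']}"
        if use_loinc:
            # Fall back to the local name when no mapping is available.
            name = LOCAL_TO_LOINC.get((r["site"], r["code"]), local_name)
        else:
            name = local_name
        grouped[r["visit_id"]][name].append(r["value"])
    return {
        visit: {name: mean(values) for name, values in feats.items()}
        for visit, feats in grouped.items()
    }


def temporal_split(visits, test_year=2012):
    """Train on visits before test_year, test on visits in test_year."""
    train = [v for v in visits if v["year"] < test_year]
    test = [v for v in visits if v["year"] == test_year]
    return train, test


results = [
    {"visit_id": 1, "site": "site_a", "code": "CREAT", "value": 1.0},
    {"visit_id": 1, "site": "site_a", "code": "CREAT", "value": 1.5},
    {"visit_id": 2, "site": "site_b", "code": "CR_SER", "value": 0.9},
]

local_features = summarize_labs(results, use_loinc=False)
loinc_features = summarize_labs(results, use_loinc=True)
```

With local codes, the two visits end up with different feature names for the same test, so a model sees two sparse columns; with LOINC, both visits share one feature. This is one plausible mechanism for the performance difference, stated here as an illustration rather than as the study's finding.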

Results: Models that utilized LOINC performed significantly better than models that utilized local laboratory test codes, regardless of the feature selection technique and classifier approach used.

Discussion and conclusion: We quantitatively demonstrated the positive impact of standardizing multi-site laboratory data to LOINC prior to use in predictive models. We used our findings to argue for the need for detailed reporting of data standardization procedures in predictive modeling, especially in studies leveraging multi-site datasets extracted from electronic health records.

Keywords: heart failure; hospital readmission; logical observation identifiers names and codes; medical informatics/standards; predictive modeling.
