Purpose: Distributed research networks (DRNs) afford statistical power by integrating observational data from multiple partners for retrospective studies. However, laboratory test results across care sites are derived using different assays from varying patient populations, making it difficult to simply combine data for analysis. Additionally, existing normalization methods are not suitable for retrospective studies. We normalized laboratory results from different data sources by adjusting for heterogeneous clinico-epidemiologic characteristics of the data and called this the subgroup-adjusted normalization (SAN) method.
Methods: Subgroup-adjusted normalization renders the means and standard deviations of distributions identical under population structure-adjusted conditions. To evaluate its performance, we compared SAN with existing methods for simulated and real datasets consisting of blood urea nitrogen, serum creatinine, hematocrit, hemoglobin, serum potassium, and total bilirubin. Various clinico-epidemiologic characteristics can be applied together in SAN. For simplicity of comparison, age and gender were used to adjust population heterogeneity in this study.
Results: In simulations, SAN had the lowest standardized difference in means (SDM) and Kolmogorov-Smirnov values for all tests (p < 0.05). In a real dataset, SAN had the lowest SDM and Kolmogorov-Smirnov values for blood urea nitrogen, hematocrit, hemoglobin, and serum potassium, and the lowest SDM for serum creatinine (p < 0.05).
Conclusion: Subgroup-adjusted normalization performed better than normalization using other methods. The SAN method is applicable in a DRN environment and should facilitate analysis of data integrated across DRN partners for retrospective observational studies.
Keywords: distributed research networks; electronic health records; laboratory test; normalization; pharmacoepidemiology.
Copyright © 2015 John Wiley & Sons, Ltd.