Comparative methods for handling missing data in large databases
- PMID: 23830314
- DOI: 10.1016/j.jvs.2013.05.008
Abstract
Objective: Analysis of complex survey databases is an important tool for health services researchers. Missing data elements are challenging because the reasons for "missingness" are multifactorial, especially for categorical variables such as race. We simulated missing data for race and analyzed the bias from five methods used in predicting major amputation in patients with critical limb ischemia (CLI).
Methods: Patient discharges with fully observed data containing lower extremity revascularization or major amputation and CLI were selected from the 2003 to 2007 Nationwide Inpatient Sample, a complex survey database (weighted n = 684,057). Considering several random missing data schemes, we compared five missing data methods: complete case analysis, replacement with observed frequencies, missing indicator variable, multiple imputation, and reweighted estimating equations. We created 100 simulated data sets, with 5%, 15%, or 30% of subjects' race drawn to be missing from the full data set. Bias was estimated by comparing the estimated regression coefficients averaged over 100 simulated data sets (β(miss)) from each method vs estimates from the fully observed data set (β(full)), with relative bias calculated as ((β(full) - β(miss))/β(full)) × 100%.
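As a rough illustration of the simulation design described in the Methods, the sketch below blanks out a chosen fraction of a categorical variable and computes relative bias as the difference between β(full) and the averaged β(miss), divided by β(full). It is not the authors' code. The function names `make_missing` and `relative_bias` are hypothetical, missingness is drawn completely at random (only one possible scheme, assumed here for simplicity), and the survey weighting and regression fitting used in the study are not reproduced.

```python
import random
from statistics import mean


def make_missing(values, fraction, rng):
    """Return a copy of `values` with `fraction` of the entries set to None.

    Assumption: entries are chosen completely at random, which is only
    one of the possible random missing data schemes.
    """
    n_missing = round(len(values) * fraction)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def relative_bias(beta_full, beta_miss_estimates):
    """Relative bias in percent.

    beta_full: coefficient estimated from the fully observed data set.
    beta_miss_estimates: coefficients from each simulated data set for
    one missing data method; their mean plays the role of beta(miss).
    """
    beta_miss = mean(beta_miss_estimates)
    return (beta_full - beta_miss) / beta_full * 100.0


if __name__ == "__main__":
    rng = random.Random(2013)  # fixed seed so the draw is repeatable
    race = ["A", "B", "C", "D"] * 25  # hypothetical categorical variable
    for fraction in (0.05, 0.15, 0.30):
        simulated = make_missing(race, fraction, rng)
        print(fraction, sum(v is None for v in simulated))

    # Hypothetical coefficients, purely for illustration.
    print(relative_bias(0.5, [0.4, 0.6, 0.35, 0.45]))
```

In a fuller version, each simulated data set would be passed to one of the five missing data methods, the model refit, and the resulting coefficients collected into the list given to `relative_bias`.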
Results: Our results demonstrate that reweighted estimating equations produce the least biased and the missing indicator variable produces the most biased coefficients. Complete case analysis, replacement with observed frequencies, and multiple imputation resulted in moderate bias. Sensitivity analysis demonstrated the optimal method choice depends on the quantity and type of missing data encountered.
Conclusions: Missing data are an important analytic topic in research with large databases. The commonly used missing indicator variable method introduces severe bias and should be used with caution. We present empiric evidence to guide method selection for handling missing data.
Copyright © 2013 Society for Vascular Surgery. Published by Mosby, Inc. All rights reserved.
