SICE: an improved missing data imputation technique
- PMID: 32547903
- PMCID: PMC7291187
- DOI: 10.1186/s40537-020-00313-w
Abstract
In data analytics, missing data degrades performance, and incorrect imputation of missing values can lead to wrong predictions. In the era of big data, when massive volumes of data are generated every second and their utilization is a major concern to stakeholders, handling missing values efficiently becomes even more important. In this paper, we propose a new technique for missing data imputation that is a hybrid of single and multiple imputation techniques. We propose an extension of the popular Multivariate Imputation by Chained Equation (MICE) algorithm in two variations, to impute categorical and numeric data. We also implemented twelve existing algorithms to impute binary, ordinal, and numeric missing values. We collected sixty-five thousand real health records from hospitals and diagnostic centers in Bangladesh while maintaining data privacy, along with three public datasets from the UCI Machine Learning Repository, ETH Zurich, and Kaggle. We compared the performance of our proposed algorithms with the existing algorithms on these datasets. Experimental results show that our proposed algorithm achieves a 20% higher F-measure for binary data imputation and 11% less error for numeric data imputation than its competitors, with similar execution time.
Keywords: Data Analytics; MICE; Missing Data Imputation; Multiple Imputation; Single Imputation.
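The chained-equation idea that the abstract builds on can be illustrated with a minimal sketch. This is not the SICE algorithm and not the authors' implementation: it is a single deterministic chain for numeric columns only, using ordinary least squares, whereas MICE proper draws imputations stochastically and produces several completed datasets. The function names (`solve`, `fit_linear`, `chained_impute`) and the `n_iter` and `ridge` parameters are hypothetical, and missing cells are assumed to be marked with `None`.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(xs, ys, ridge=1e-8):
    """Least-squares fit with intercept via the normal equations.

    The tiny ridge term (an assumption of this sketch) keeps the system
    solvable when predictors are collinear.
    """
    rows = [[1.0] + list(x) for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def predictors(row, j):
    """All values in the row except column j."""
    return [v for c, v in enumerate(row) if c != j]


def chained_impute(data, n_iter=10):
    """Impute None cells column by column, cycling n_iter times."""
    n_cols = len(data[0])
    missing = [[v is None for v in row] for row in data]
    filled = [row[:] for row in data]

    # Step 1: initialise every missing cell with its column mean.
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        mean = sum(observed) / len(observed)
        for row in filled:
            if row[j] is None:
                row[j] = mean

    # Step 2: repeatedly regress each incomplete column on all the others,
    # training on rows where that column was observed, then overwrite
    # only the cells that were originally missing.
    for _ in range(n_iter):
        for j in range(n_cols):
            if not any(m[j] for m in missing):
                continue
            train = [r for r, m in zip(filled, missing) if not m[j]]
            beta = fit_linear([predictors(r, j) for r in train],
                              [r[j] for r in train])
            for r, m in zip(filled, missing):
                if m[j]:
                    r[j] = beta[0] + sum(
                        b * x for b, x in zip(beta[1:], predictors(r, j)))
    return filled


if __name__ == "__main__":
    records = [[1.0, 3.0], [2.0, 5.0], [3.0, None], [4.0, 9.0], [5.0, 11.0]]
    for row in chained_impute(records):
        print(row)
```

Observed values are never overwritten; only cells that were missing in the input are updated on each pass. Categorical or binary columns would need a classifier in place of the linear fit, which this sketch does not attempt.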
© The Author(s) 2020.
Conflict of interest statement
Competing interests: The authors do not have any competing interests.