Dealing with missing values in large-scale studies: microarray data imputation and beyond
- PMID: 19965979
- DOI: 10.1093/bib/bbp059
Abstract
High-throughput biotechnologies, such as gene expression microarrays or mass-spectrometry-based proteomic assays, frequently produce missing values for various experimental reasons. Because missing data points can hinder downstream analyses, a wide variety of ways to deal with missing values in large-scale data sets has been proposed. It has now become routine to estimate (or impute) the missing values prior to the actual data analysis. Nearly a decade after the publication of the first missing value imputation methods for gene expression microarray data, new imputation approaches are still being developed at an increasing rate. What is lagging behind, however, is a systematic and objective evaluation of the strengths and weaknesses of the different approaches when faced with different types of data sets and experimental questions. This review describes the present strategies for missing value imputation and the measures for evaluating their performance. The imputation methods are first reviewed in the context of gene expression microarray data, since most of the methods have been developed for estimating gene expression levels; we then turn to other large-scale data sets that also suffer from the problems posed by missing values, together with pointers to possible imputation approaches in these settings. Along with a description of the basic principles behind the different imputation approaches, the review aims to provide practical guidance for users of high-throughput technologies on how to choose the imputation tool for their data and questions, and some additional research directions for developers of imputation methodologies.
