The influence of missing value imputation on detection of differentially expressed genes from microarray data

Bioinformatics. 2005 Dec 1;21(23):4272-9. doi: 10.1093/bioinformatics/bti708. Epub 2005 Oct 10.

Abstract

Motivation: Missing values are problematic for the analysis of microarray data. Imputation methods have been compared in terms of the similarity between imputed and true values in simulation experiments, but not in terms of their influence on the final analysis. The focus has been on values missing at random, although entries can also be missing not at random.

Results: We investigate the influence of imputation on the detection of differentially expressed genes from cDNA microarray data. We apply ANOVA for microarrays and SAM and look at the differentially expressed genes that are lost because of imputation. We show that this new measure provides useful information that the traditional root mean squared error cannot capture. We also show that the type of missingness matters: imputing 5% of values missing not at random has the same effect as imputing 10-30% missing at random. We propose a new method for imputation (LinImp), fitting a simple linear model for each channel separately, and compare it with the widely used KNNimpute method. For 10% missing at random, KNNimpute leads to twice as many lost differentially expressed genes as LinImp.

Availability: The R package for LinImp is available at http://folk.uio.no/idasch/imp.
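
The two evaluation measures contrasted in the Results can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy values, and the gene labels are all hypothetical. It also assumes that a "lost" gene is one called differentially expressed on the complete data but no longer called after imputation, which is one plausible reading of the abstract.

```python
import math


def rmse(true_vals, imputed_vals):
    """Root mean squared error between true and imputed entries."""
    if len(true_vals) != len(imputed_vals) or not true_vals:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - i) ** 2 for t, i in zip(true_vals, imputed_vals)]
    return math.sqrt(sum(squared) / len(squared))


def lost_genes(calls_complete, calls_imputed):
    """Genes detected on the complete data but missed after imputation.

    Assumption: this set difference is how "lost" genes are counted.
    """
    return set(calls_complete) - set(calls_imputed)


# Hypothetical toy data: values that were masked and then imputed.
true_vals = [1.0, 2.0, 3.0]
imputed_vals = [1.0, 2.0, 5.0]

# Hypothetical gene lists from a detection method (e.g. a test run twice,
# once on the complete data and once on the imputed data).
calls_complete = {"geneA", "geneB", "geneC", "geneD"}
calls_imputed = {"geneA", "geneC", "geneE"}

error = rmse(true_vals, imputed_vals)
lost = lost_genes(calls_complete, calls_imputed)
lost_fraction = len(lost) / len(calls_complete)

print(f"RMSE: {error:.3f}")
print(f"Lost genes: {sorted(lost)} ({lost_fraction:.0%} of original calls)")
```

The sketch shows why the two measures can disagree: RMSE summarizes the numerical error over imputed entries, whereas the lost-gene count depends on whether those errors fall on genes near the detection threshold.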

MeSH terms

  • Algorithms
  • Analysis of Variance
  • Cluster Analysis
  • Computational Biology / methods*
  • DNA, Complementary / metabolism
  • Data Interpretation, Statistical
  • Gene Expression Profiling
  • Gene Expression Regulation*
  • Likelihood Functions
  • Linear Models
  • Mathematical Computing
  • Models, Genetic
  • Models, Statistical
  • Models, Theoretical
  • Multigene Family
  • Normal Distribution
  • Oligonucleotide Array Sequence Analysis / methods*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Analysis, DNA
  • Software
  • Statistics as Topic

Substances

  • DNA, Complementary