Boosting with missing predictors

Biostatistics. 2010 Apr;11(2):195-212. doi: 10.1093/biostatistics/kxp052. Epub 2009 Nov 30.

Abstract

Boosting is an important tool in classification methodology. It combines the performance of many weak classifiers to produce a powerful committee, and its validity can be explained by additive modeling and maximum likelihood. The method has very general applications, especially for high-dimensional predictors. For example, it can be applied to distinguish cancer samples from healthy control samples by using antibody microarray data. Microarray data are often high-dimensional, and many such data sets are incomplete. One natural idea is to impute a missing variable based on the observed predictors. However, computing imputations for high-dimensional predictors with missing data can be rather tedious. In this paper, we propose two conditional mean imputation methods. They can be applied even when a complete-case subset does not exist. Simulation results indicate that the proposed methods are superior to other naive methods. We apply the methods to a pancreatic cancer study in which serum protein microarrays are used for classification.
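The general idea of conditional mean imputation can be illustrated with a minimal sketch. The abstract does not describe the two proposed methods in detail, so this is not the authors' procedure: it assumes, purely for illustration, that the predictors are jointly Gaussian with a known mean vector `mu` and covariance matrix `sigma`, in which case a missing block is replaced by mu_m + S_mo S_oo^{-1} (x_o - mu_o). The names `solve` and `impute_row` are hypothetical helpers introduced here.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def impute_row(x, mu, sigma):
    """Fill NaN entries of x with their conditional mean given the observed entries.

    Assumption (not from the paper): x is Gaussian with known mean mu and
    covariance sigma, so E[x_m | x_o] = mu_m + S_mo S_oo^{-1} (x_o - mu_o).
    """
    obs = [j for j, v in enumerate(x) if not math.isnan(v)]
    mis = [j for j, v in enumerate(x) if math.isnan(v)]
    if not mis:
        return list(x)
    if not obs:
        return list(mu)  # nothing observed: fall back to the marginal mean
    s_oo = [[sigma[i][j] for j in obs] for i in obs]
    resid = [x[j] - mu[j] for j in obs]
    w = solve(s_oo, resid)  # w = S_oo^{-1} (x_o - mu_o)
    out = list(x)
    for j in mis:
        out[j] = mu[j] + sum(sigma[j][o] * w[k] for k, o in enumerate(obs))
    return out

# Usage: the second predictor is missing and is imputed from the first.
row = impute_row([2.0, float("nan")], [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]])
```

Because each sample is imputed from whichever predictors it happens to have observed, a sketch of this kind does not require any sample to be fully observed. In practice `mu` and `sigma` would have to be estimated from the incomplete data, which this illustration leaves out.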

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Artificial Intelligence
  • Biomarkers, Tumor / analysis
  • Biomarkers, Tumor / blood
  • Biometry / methods*
  • C-Reactive Protein / analysis
  • C-Reactive Protein / metabolism
  • Computer Simulation
  • Humans
  • Models, Statistical*
  • Multivariate Analysis
  • Pancreatic Neoplasms / blood
  • Pancreatic Neoplasms / classification
  • Pancreatic Neoplasms / diagnosis*
  • Protein Array Analysis / methods*
  • Regression Analysis

Substances

  • Biomarkers, Tumor
  • C-Reactive Protein