Boosting with missing predictors

C Y Wang; Ziding Feng

doi:10.1093/biostatistics/kxp052

Biostatistics. 2010 Apr;11(2):195-212. doi: 10.1093/biostatistics/kxp052. Epub 2009 Nov 30.

Authors

C Y Wang¹, Ziding Feng

Affiliation

¹ Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109-1024, USA. cywang@fhcrc.org

Abstract

Boosting is an important tool in classification methodology. It combines the performance of many weak classifiers to produce a powerful committee, and its validity can be explained by additive modeling and maximum likelihood. The method has very general applications, especially for high-dimensional predictors. For example, it can be applied to distinguish cancer samples from healthy control samples by using antibody microarray data. Microarray data are often high-dimensional, and many such data sets are incomplete. One natural idea is to impute a missing variable based on the observed predictors. However, computing imputations for high-dimensional predictors with missing data can be rather tedious. In this paper, we propose two conditional mean imputation methods. They can be applied even when a complete-case subset does not exist. Simulation results indicate that the proposed methods are superior to naive methods. We apply the methods to a pancreatic cancer study in which serum protein microarrays are used for classification.
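
The impute-then-boost workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not specify the two proposed imputation methods, so the sketch assumes a simplified stand-in in which each missing value is replaced by its estimated conditional mean given one observed predictor, fitted on the rows where both variables are observed (so no complete-case subset is required). The classifier is assumed to be standard discrete AdaBoost with decision stumps. All function names are hypothetical, missing values are coded as None, and class labels are +1 or -1.

```python
import math

def pairwise_fit(X, j, k):
    """Least-squares line x_j ~ a + b * x_k on rows where both are observed."""
    pairs = [(r[k], r[j]) for r in X if r[j] is not None and r[k] is not None]
    if len(pairs) < 3:
        return None
    n = len(pairs)
    mk = sum(p[0] for p in pairs) / n
    mj = sum(p[1] for p in pairs) / n
    skk = sum((p[0] - mk) ** 2 for p in pairs)
    sjj = sum((p[1] - mj) ** 2 for p in pairs)
    skj = sum((p[0] - mk) * (p[1] - mj) for p in pairs)
    if skk == 0 or sjj == 0:
        return None
    b = skj / skk
    # Intercept, slope, and absolute correlation (used to rank predictors).
    return mj - b * mk, b, abs(skj) / math.sqrt(skk * sjj)

def impute(X):
    """Fill each None with a conditional mean given one observed predictor."""
    p = len(X[0])
    fits = {(j, k): pairwise_fit(X, j, k)
            for j in range(p) for k in range(p) if j != k}
    out = []
    for row in X:
        new = list(row)
        for j in range(p):
            if row[j] is not None:
                continue
            cands = [(fits[j, k][2], k) for k in range(p)
                     if k != j and row[k] is not None and fits[j, k]]
            if cands:
                _, k = max(cands)  # most correlated observed predictor
                a, b, _ = fits[j, k]
                new[j] = a + b * row[k]
            else:
                # Fallback: marginal mean (assumes column j has observed values).
                obs = [r[j] for r in X if r[j] is not None]
                new[j] = sum(obs) / len(obs)
        out.append(new)
    return out

def stump_predict(row, j, t, s):
    """Decision stump: predict s if x_j > t, otherwise -s."""
    return s if row[j] > t else -s

def fit_stump(X, y, w):
    """Exhaustive search for the stump with the smallest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({r[j] for r in X}):
            for s in (1, -1):
                err = sum(wi for r, yi, wi in zip(X, y, w)
                          if stump_predict(r, j, t, s) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    return best

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost: reweight the data and add one stump per round."""
    n = len(y)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, t, s = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        w = [wi * math.exp(-alpha * yi * stump_predict(r, j, t, s))
             for r, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        model.append((alpha, j, t, s))
    return model

def predict(model, row):
    """Sign of the weighted committee vote."""
    score = sum(a * stump_predict(row, j, t, s) for a, j, t, s in model)
    return 1 if score >= 0 else -1
```

A typical call would be `model = adaboost(impute(X), y)` followed by `predict(model, row)` for a new, fully observed or imputed row. Conditioning on a single predictor keeps the sketch short; conditioning on several observed predictors at once would be closer in spirit to imputing "based on the observed predictors", at the cost of a multivariate regression step.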

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Artificial Intelligence
Biomarkers, Tumor / analysis
Biomarkers, Tumor / blood
Biometry / methods*
C-Reactive Protein / analysis
C-Reactive Protein / metabolism
Computer Simulation
Humans
Models, Statistical*
Multivariate Analysis
Pancreatic Neoplasms / blood
Pancreatic Neoplasms / classification
Pancreatic Neoplasms / diagnosis*
Protein Array Analysis / methods*
Regression Analysis

Substances

Biomarkers, Tumor
C-Reactive Protein

Grants and funding

CA53996/CA/NCI NIH HHS/United States