Modeling major lung resection outcomes using classification trees and multiple imputation techniques

Eur J Cardiothorac Surg. 2008 Nov;34(5):1085-9. doi: 10.1016/j.ejcts.2008.07.037. Epub 2008 Aug 29.


Objective: Modeling of operative risks associated with major lung resection is potentially inaccurate and inefficient because of incomplete observations for predictor variables (covariates). Missing values do not usually occur randomly, potentially introducing an important source of bias in modeling. Deletion of cases with missing data also results in loss of precision. The current study analyzes incomplete variables as potential predictors of outcomes after major lung resection using imputation techniques.

Methods: We analyzed major lung resection patients treated from 1980 to 2006 for predictors of pulmonary, cardiovascular, and overall complications, as well as mortality. Predictive variables were initially determined using classification and regression tree (CART) methods. Imputation models were developed and variables with missing values were multiply imputed. We fit a logistic regression model for each outcome using CART variables and any covariates that were of interest clinically.
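The abstract does not state how the logistic regression estimates were combined across the multiply imputed data sets. A standard choice is Rubin's rules, sketched below as an assumption rather than as the authors' documented procedure. The function name `pool_rubin` and the example numbers are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one regression coefficient across m imputed data sets.

    estimates: the coefficient estimated in each imputed data set
    variances: the squared standard error of that coefficient in each data set
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


if __name__ == "__main__":
    # Hypothetical log-odds coefficients from five imputed data sets.
    betas = [0.50, 0.60, 0.40, 0.55, 0.45]
    squared_ses = [0.04] * 5
    estimate, se = pool_rubin(betas, squared_ses)
    print(f"pooled estimate = {estimate:.3f}, pooled SE = {se:.3f}")
```

The between-imputation term widens the pooled standard error to reflect uncertainty about the missing values, which a single imputation would understate.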

Results: Among 1046 patients who underwent resection, serum albumin and diffusing capacity (DLCO%) were frequently missing (32% and 13% of values, respectively). Models included 10 covariates for pulmonary complications (p<0.05 for DLCO% and forced expiratory volume in the first second [FEV1%]), 12 covariates for cardiovascular complications (p<0.05 for FEV1%, extent of resection, year of operation, and age), 15 covariates for overall complications (p<0.05 for DLCO%, performance status, serum albumin, and FEV1/FVC ratio), and 12 covariates for death (p<0.05 for DLCO%, extent of resection, and operation year).
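To illustrate why deleting incomplete cases costs precision, the reported missingness rates can be turned into rough bounds on how many patients a complete-case analysis would retain. This is a minimal sketch that considers only serum albumin and DLCO%, uses the rounded percentages from the abstract, and assumes nothing about how the two missingness patterns overlap. The function name `complete_case_bounds` is hypothetical.

```python
def complete_case_bounds(n, missing_fractions):
    """Bound the number of complete cases given per-variable missing fractions.

    Returns (min_retained, max_retained):
    - the minimum assumes no patient is missing more than one variable;
    - the maximum assumes the missing sets are fully nested.
    """
    counts = [round(n * f) for f in missing_fractions]
    max_lost = min(n, sum(counts))  # missingness never overlaps
    min_lost = max(counts)          # smaller missing sets lie inside the largest
    return n - max_lost, n - min_lost


if __name__ == "__main__":
    # 1046 patients; about 32% missing albumin and 13% missing DLCO%.
    low, high = complete_case_bounds(1046, [0.32, 0.13])
    print(f"complete cases retained: between {low} and {high}")
```

Under these assumptions, roughly a third to nearly half of the cohort would be dropped before any model is fit, and missing values in other covariates would reduce the count further.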

Conclusions: We identified serum albumin as a strong and previously under-reported predictor of overall complications. Serum albumin also showed a marginally significant association with pulmonary and cardiovascular outcomes after major lung surgery. Use of imputation techniques for modeling surgical risks has potential value in identifying important predictive variables that might otherwise be excluded from analysis, or go unrecognized as predictors, because of incomplete observations in clinical databases.

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Aged, 80 and over
  • Biomarkers / metabolism
  • Epidemiologic Methods
  • Female
  • Forced Expiratory Volume / physiology
  • Humans
  • Lung Neoplasms / metabolism
  • Lung Neoplasms / mortality
  • Lung Neoplasms / surgery*
  • Male
  • Middle Aged
  • Pneumonectomy / methods*
  • Pneumonectomy / mortality
  • Postoperative Complications / mortality
  • Prognosis
  • Pulmonary Diffusing Capacity / physiology
  • Serum Albumin / metabolism*
  • Treatment Outcome
  • Young Adult


Substances

  • Biomarkers
  • Serum Albumin