Classification and regression tree analysis vs. multivariable linear and logistic regression methods as statistical tools for studying haemophilia

Haemophilia. 2015 Nov;21(6):715-22. doi: 10.1111/hae.12778. Epub 2015 Aug 7.


Introduction: Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regression for binary outcomes and multiple linear regression for continuous outcomes. Yet these regression models can be difficult to implement, especially for non-statisticians, and difficult to interpret.

Aims: The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research.

Materials & methods: The CART method is a non-parametric, non-linear technique that repeatedly partitions a sample into subgroups according to a splitting criterion. Breiman and colleagues introduced this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones.
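
The partitioning step described above can be illustrated with a minimal sketch. It is not taken from the paper: the function names (`gini`, `variance`, `best_split`) and the toy values are hypothetical, and the splitting criteria shown (Gini impurity for a categorical outcome, within-node variance for a continuous one) are assumed here as common choices. The sketch searches a single predictor for the one binary split that most reduces impurity; a full tree would repeat this search over every predictor and within each resulting subgroup, subject to stopping or pruning rules.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def variance(values):
    """Within-node variance of a continuous outcome (0.0 means no spread)."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def best_split(x, y, impurity):
    """Find the threshold t on predictor x that minimises the size-weighted
    impurity of the two subgroups {x <= t} and {x > t}.

    Returns (threshold, weighted_impurity); threshold is None when no
    split improves on the unsplit node.
    """
    n = len(x)
    best = (None, impurity(y))
    distinct = sorted(set(x))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Made-up toy values, for illustration only (not clinical data).
predictor = [1, 2, 3, 10, 11, 12]
binary_outcome = ["no", "no", "no", "yes", "yes", "yes"]
continuous_outcome = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]

# Classification tree step: categorical outcome, Gini impurity.
print(best_split(predictor, binary_outcome, gini))
# Regression tree step: continuous outcome, variance.
print(best_split(predictor, continuous_outcome, variance))
```

The same search routine serves both tree types; only the impurity function changes, which mirrors the distinction between CTs and RTs.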

Results: The CART methodology has become increasingly popular in the medical field, yet only a few studies applying it specifically to haemophilia have been published to date. Two previously published examples of CART analysis in this field are explained didactically and in detail.

Conclusion: There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable.

Keywords: classification and regression tree; haemophilia; multivariable analysis; non-parametric statistics; random forest; statistics.

Publication types

  • Review

MeSH terms

  • Biostatistics / methods*
  • Hemophilia A*
  • Hemophilia B*
  • Humans
  • Linear Models
  • Logistic Models
  • Multivariate Analysis