Background: Although the general statistical advice is to keep continuous exposure variables as continuous in statistical analyses, categorisation is still a common approach in medical research. In a recent paper from the Hyperglycaemia and Adverse Pregnancy Outcome (HAPO) Study, categorisation of body mass index (BMI) was used when analysing the effect of BMI on adverse pregnancy outcomes. The lowest category, labelled "underweight", was used as the reference category.
Methods: The present paper gives a summary of reasons for categorisation and methodological drawbacks of this approach. We also discuss the choice of reference category and alternative analyses. We exemplify our arguments by a reanalysis of results from the HAPO paper.
Results: Categorisation of continuous exposure data results in loss of power and other methodological challenges. An unfortunate choice of reference category can give additional lack of precision and obscure the interpretation of risk estimates. A highlighted odds ratio (OR) in the HAPO study is the OR for birth weight >90(th) percentile for women in the highest compared to the lowest BMI category ("obese class III" versus "underweight"). This estimate was OR = 4.55 and OR = 3.52, with two different multiple logistic regression models. When using the "normal weight" category as the reference, our corresponding estimates were OR = 2.03 and OR = 1.62, respectively. Moreover, our choice of reference category also gave narrower confidence intervals.
Summary: Due to several methodological drawbacks, categorisation should be avoided. Modern statistical analyses should be used to analyse continuous exposure data, and to explore non-linear relations. If continuous data are categorised, special attention must be given to the choice of reference category.