Background: In clinical trials of behavioral health interventions, outcome variables often take the form of counts, such as days using substances or episodes of unprotected sex. Classically, count data are modeled with a Poisson distribution. In practice, however, such data often show greater heterogeneity than the Poisson allows: excess zeros (zero-inflation), greater spread in the values (overdispersion, i.e., variance exceeding the mean), or both. Greater sample heterogeneity may be especially common in community-based effectiveness trials, where broad eligibility criteria are used to obtain a generalizable sample.
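As a rough illustration of the two departures from the Poisson described above, the sketch below computes two informal descriptive checks. The first is the variance-to-mean ratio, which equals 1 under a Poisson model, so a ratio well above 1 suggests overdispersion. The second compares the observed share of zeros with the share a Poisson model with the same mean implies, exp(-mean). The function name and the toy counts are hypothetical and are not data from the trial. These are quick checks, not formal tests.

```python
import math
from statistics import mean, variance


def count_diagnostics(counts):
    """Compare a sample of counts with what a Poisson model would imply.

    Under a Poisson distribution the variance equals the mean, and the
    expected share of zeros is exp(-mean).
    """
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    observed_zero_share = sum(1 for c in counts if c == 0) / len(counts)
    return {
        "mean": m,
        "variance": v,
        "dispersion_ratio": v / m,  # well above 1 suggests overdispersion
        "observed_zero_share": observed_zero_share,
        # observed share far above this suggests zero-inflation
        "poisson_zero_share": math.exp(-m),
    }


# Hypothetical toy counts (many zeros, long right tail); not trial data.
toy_counts = [0, 0, 0, 0, 0, 0, 1, 2, 5, 12]
print(count_diagnostics(toy_counts))
```

In this toy sample 60% of the values are zero, whereas a Poisson distribution with mean 2 would produce only about 14% zeros. The variance is also several times the mean, so both departures are present at once.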
Objectives: This article reviews the characteristics of the Poisson model and of related models developed to handle overdispersion (the negative binomial (NB) model), zero-inflation (the zero-inflated Poisson (ZIP) and Poisson hurdle (PH) models), or both (the zero-inflated negative binomial (ZINB) and negative binomial hurdle (NBH) models).
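The six models differ mainly in how they assign probability to zero and to positive counts. The minimal sketch below writes out each probability mass function in plain Python. It assumes intercept-only models with no covariates and the common NB2 parameterization of the negative binomial (mean `mu`, variance `mu + alpha * mu**2`). The function names and parameter values are illustrative assumptions, not the article's specification.

```python
import math


def poisson_pmf(y, mu):
    # P(Y = y) = exp(-mu) * mu**y / y!, computed on the log scale for stability
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def negbin_pmf(y, mu, alpha):
    # NB2: mean mu, variance mu + alpha * mu**2 (alpha > 0 allows overdispersion)
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def zero_inflated_pmf(y, base_pmf, pi):
    # Mixture: with probability pi a "structural" zero, otherwise a draw from
    # base_pmf. Zeros can therefore come from either source.
    if y == 0:
        return pi + (1 - pi) * base_pmf(0)
    return (1 - pi) * base_pmf(y)


def hurdle_pmf(y, base_pmf, p_zero):
    # Two-part model: every zero comes from the binary part (probability
    # p_zero); positive counts follow base_pmf truncated at zero.
    if y == 0:
        return p_zero
    return (1 - p_zero) * base_pmf(y) / (1 - base_pmf(0))


MU, ALPHA = 2.0, 1.5              # illustrative count-part mean and NB dispersion
PI_ZI, P_ZERO_HURDLE = 0.4, 0.6   # illustrative zero-process parameters

MODELS = {
    "Poisson": lambda y: poisson_pmf(y, MU),
    "NB": lambda y: negbin_pmf(y, MU, ALPHA),
    "ZIP": lambda y: zero_inflated_pmf(y, lambda k: poisson_pmf(k, MU), PI_ZI),
    "PH": lambda y: hurdle_pmf(y, lambda k: poisson_pmf(k, MU), P_ZERO_HURDLE),
    "ZINB": lambda y: zero_inflated_pmf(y, lambda k: negbin_pmf(k, MU, ALPHA), PI_ZI),
    "NBH": lambda y: hurdle_pmf(y, lambda k: negbin_pmf(k, MU, ALPHA), P_ZERO_HURDLE),
}

for name, pmf in MODELS.items():
    print(f"{name:8s} P(Y=0) = {pmf(0):.3f}")
```

The difference in design is visible in the zero terms. A zero-inflated model mixes extra zeros with the zeros the count distribution already produces. A hurdle model instead attributes every zero to its binary part and truncates the count distribution at zero.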
Methods: All six models were used to estimate the effect of an HIV-risk reduction intervention on the count of unprotected sexual occasions (USOs), using data from a previously completed clinical trial among female patients (N = 515) in community-based substance abuse treatment (Tross et al. Effectiveness of HIV/AIDS sexual risk reduction groups for women in substance abuse treatment programs: Results of NIDA Clinical Trials Network Trial. J Acquir Immune Defic Syndr 2008; 48(5):581-589). Goodness of fit and the treatment-effect estimates derived from each model were compared.
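The abstract does not name the goodness-of-fit criteria that were used. Akaike's information criterion (AIC = 2k - 2 log L, where k is the number of parameters and lower is better) is one common choice, and it is assumed here purely for illustration. The sketch fits an intercept-only Poisson model and an intercept-only Poisson hurdle model by maximum likelihood to hypothetical toy counts, using only the standard library, and then compares their AIC values. All names and data are illustrative. A real analysis would include the treatment indicator as a covariate and would normally use a statistical package.

```python
import math

# Hypothetical toy counts with many zeros and a long right tail (not trial data).
TOY_COUNTS = [0, 0, 0, 0, 0, 0, 1, 2, 5, 12]


def poisson_logpmf(y, mu):
    return -mu + y * math.log(mu) - math.lgamma(y + 1)


def fit_poisson(counts):
    # Intercept-only Poisson: the MLE of the mean is the sample mean (k = 1).
    mu = sum(counts) / len(counts)
    loglik = sum(poisson_logpmf(y, mu) for y in counts)
    return {"mu": mu, "loglik": loglik, "k": 1}


def fit_poisson_hurdle(counts):
    # Assumes at least one zero, and positive counts whose mean exceeds 1.
    n = len(counts)
    positives = [y for y in counts if y > 0]
    n_zero = n - len(positives)
    # Binary part: the MLE of P(Y = 0) is the observed share of zeros.
    p0 = n_zero / n
    # Count part: zero-truncated Poisson. Its MLE solves
    # mu / (1 - exp(-mu)) = mean of the positive counts. The left side
    # increases in mu, so bisection finds the root.
    target = sum(positives) / len(positives)
    lo, hi = 1e-12, target
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < target:
            lo = mid
        else:
            hi = mid
    mu = (lo + hi) / 2
    log_trunc = math.log(-math.expm1(-mu))  # log(1 - exp(-mu))
    loglik = (n_zero * math.log(p0) + len(positives) * math.log(1 - p0)
              + sum(poisson_logpmf(y, mu) - log_trunc for y in positives))
    return {"p0": p0, "mu": mu, "loglik": loglik, "k": 2}


def aic(fit):
    return 2 * fit["k"] - 2 * fit["loglik"]


for name, fit in [("Poisson", fit_poisson(TOY_COUNTS)),
                  ("Poisson hurdle", fit_poisson_hurdle(TOY_COUNTS))]:
    print(f"{name}: log L = {fit['loglik']:.2f}, AIC = {aic(fit):.2f}")
```

On these zero-heavy toy counts, the extra parameter of the hurdle model is more than offset by its gain in log-likelihood, so its AIC is lower. The same comparison extends to the NB-based models by replacing the Poisson log-pmf and fitting the dispersion parameter numerically.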
Results: The ZINB model provided the best fit and yielded a medium-sized intervention effect.
Conclusions and scientific significance: This article illustrates the consequences of applying models with different distributional assumptions to the same data. If the chosen model does not closely fit the shape of the data distribution, the estimated intervention effect may be biased, either overestimating or underestimating the true effect.