Zero-inflated and hurdle models of count data with extra zeros: examples from an HIV-risk reduction intervention trial

Am J Drug Alcohol Abuse. 2011 Sep;37(5):367-75. doi: 10.3109/00952990.2011.597280.

Abstract

Background: In clinical trials of behavioral health interventions, outcome variables often take the form of counts, such as days using substances or episodes of unprotected sex. Classically, count data follow a Poisson distribution; however, in practice such data often display greater heterogeneity in the form of excess zeros (zero-inflation) or greater spread in the values (overdispersion) or both. Greater sample heterogeneity may be especially common in community-based effectiveness trials, where broad eligibility criteria are implemented to achieve a generalizable sample.
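Both departures from the Poisson assumption described above can be checked directly on a sample of counts. The following is a minimal sketch, not the authors' procedure: the function name `poisson_diagnostics` and the toy counts are hypothetical, and the checks rely only on two standard Poisson facts (variance equals mean, and the probability of a zero is exp(-mean)).

```python
import math
from statistics import mean, pvariance

def poisson_diagnostics(counts):
    """Compare a sample of counts with what a Poisson model of the same mean implies."""
    m = mean(counts)
    v = pvariance(counts)
    observed_zero = sum(1 for c in counts if c == 0) / len(counts)
    expected_zero = math.exp(-m)  # P(Y = 0) under Poisson with mean m
    return {
        "mean": m,
        "variance": v,
        # Close to 1 under a Poisson model; well above 1 suggests overdispersion.
        "dispersion_ratio": v / m if m > 0 else float("nan"),
        # An observed share far above the expected share suggests zero-inflation.
        "observed_zero": observed_zero,
        "expected_zero": expected_zero,
    }

# Hypothetical toy counts, invented for illustration (not trial data).
toy_counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 5, 8, 12]
result = poisson_diagnostics(toy_counts)
print(result)
```

A dispersion ratio well above 1 together with an observed zero share well above the Poisson-implied share is the pattern that motivates the alternative models reviewed in the article.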

Objectives: This article reviews the characteristics of the Poisson model and of the related models that have been developed to handle overdispersion (negative binomial (NB) model), zero-inflation (zero-inflated Poisson (ZIP) and Poisson hurdle (PH) models), or both (zero-inflated negative binomial (ZINB) and negative binomial hurdle (NBH) models).
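For orientation, the standard textbook forms of these models can be written as follows. The notation is an assumption made here, not taken from the article: \lambda and \mu denote means, \alpha the NB dispersion parameter (NB2 parameterization), \pi the probability of a structural zero in the ZIP model, and \pi_0 the probability of a zero in the hurdle model.

```latex
\begin{align*}
\text{Poisson:}\quad & P(Y = y) = \frac{e^{-\lambda}\lambda^{y}}{y!},
  \qquad E(Y) = \operatorname{Var}(Y) = \lambda \\[4pt]
\text{NB:}\quad & E(Y) = \mu, \qquad \operatorname{Var}(Y) = \mu + \alpha\mu^{2} \\[4pt]
\text{ZIP:}\quad & P(Y = 0) = \pi + (1 - \pi)\,e^{-\lambda}, \qquad
  P(Y = y) = (1 - \pi)\,\frac{e^{-\lambda}\lambda^{y}}{y!}, \quad y \ge 1 \\[4pt]
\text{PH:}\quad & P(Y = 0) = \pi_{0}, \qquad
  P(Y = y) = (1 - \pi_{0})\,\frac{e^{-\lambda}\lambda^{y}/y!}{1 - e^{-\lambda}}, \quad y \ge 1
\end{align*}
```

In the ZIP model a zero can come from either the structural-zero component or the Poisson component, whereas in the hurdle model all zeros come from a single binary component and positive counts follow a zero-truncated distribution. The ZINB and NBH models replace the Poisson component with a negative binomial one.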

Methods: All six models were used to model the effect of an HIV-risk reduction intervention on the count of unprotected sexual occasions (USOs), using data from a previously completed clinical trial among female patients (N = 515) participating in community-based substance abuse treatment (Tross et al. Effectiveness of HIV/AIDS sexual risk reduction groups for women in substance abuse treatment programs: Results of NIDA Clinical Trials Network Trial. J Acquir Immune Defic Syndr 2008; 48(5):581-589). Goodness of fit and the estimates of treatment effect derived from each model were compared.
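The kind of fit comparison described above can be sketched on a much smaller scale. The example below is a simplified illustration under several assumptions: it fits intercept-only Poisson and ZIP models (no treatment covariate), uses a basic EM iteration for the ZIP fit, and compares the two by AIC. The abstract does not state which goodness-of-fit criterion or fitting method the authors used, and the toy counts and all function names are hypothetical.

```python
import math

def poisson_loglik(counts, lam):
    """Log-likelihood of counts under Poisson(lam)."""
    return sum(-lam + y * math.log(lam) - math.lgamma(y + 1) for y in counts)

def zip_loglik(counts, pi, lam):
    """Log-likelihood under a zero-inflated Poisson with structural-zero probability pi."""
    total = 0.0
    for y in counts:
        if y == 0:
            total += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            total += math.log(1 - pi) - lam + y * math.log(lam) - math.lgamma(y + 1)
    return total

def fit_zip_em(counts, iterations=200):
    """Intercept-only ZIP fit by EM (an illustrative choice of algorithm)."""
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    pi, lam = 0.5, max(total / n, 1e-6)  # starting values
    for _ in range(iterations):
        # E-step: probability that an observed zero is a structural zero.
        w = pi / (pi + (1 - pi) * math.exp(-lam))
        expected_structural = n_zero * w
        # M-step: update the mixing proportion and the Poisson mean.
        pi = expected_structural / n
        lam = total / (n - expected_structural)
    return pi, lam

def aic(loglik, n_params):
    """Akaike information criterion; lower values indicate better fit."""
    return 2 * n_params - 2 * loglik

# Hypothetical toy counts, invented for illustration (not trial data).
toy_counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 5, 8, 12]

lam_poisson = sum(toy_counts) / len(toy_counts)  # Poisson MLE is the sample mean
pi_hat, lam_hat = fit_zip_em(toy_counts)

aic_poisson = aic(poisson_loglik(toy_counts, lam_poisson), n_params=1)
aic_zip = aic(zip_loglik(toy_counts, pi_hat, lam_hat), n_params=2)
print(f"Poisson AIC: {aic_poisson:.1f}  ZIP AIC: {aic_zip:.1f}")
```

With covariates such as a treatment indicator, the same comparison would normally be run with an established statistical package rather than a hand-written fit; the sketch only shows how a zero-heavy sample can favor a zero-inflated model on an information criterion.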

Results: The ZINB model provided the best fit, yielding a medium-sized effect of intervention.

Conclusions and scientific significance: This article illustrates the consequences of applying models with different distributional assumptions to the same data. If the model used does not closely fit the shape of the data distribution, the estimate of the intervention effect may be biased, either over- or underestimating it.

Publication types

  • Multicenter Study
  • Randomized Controlled Trial
  • Research Support, N.I.H., Extramural

MeSH terms

  • Adult
  • Bias
  • Community Health Services
  • Female
  • Follow-Up Studies
  • HIV Infections / prevention & control*
  • Humans
  • Models, Statistical*
  • Poisson Distribution
  • Risk Reduction Behavior
  • Risk-Taking
  • Substance-Related Disorders / rehabilitation*
  • Unsafe Sex / statistics & numerical data*