Categorical data analysis in public health

Annu Rev Public Health. 1997;18:51-82. doi: 10.1146/annurev.publhealth.18.1.51.


A greater variety of categorical data methods is in use today than 15 years ago. This article surveys categorical data methods widely applied in public health research. Whereas large-sample chi-square methods, logistic regression analysis, and weighted least squares modeling of repeated measures once constituted the primary analytic tools for categorical data problems, today's methodology comprises a much broader range of tools made available by increasing computational efficiency. These include computational algorithms for exact inference with small samples and sparsely distributed data, conditional logistic regression for modeling highly stratified data, and generalized estimating equations for cluster samples. Generalized estimating equations, in particular, have found wide use in modeling the marginal probabilities of correlated count, binary, and multinomial outcomes. The various methods are illustrated with examples including a study of the prevalence of cerebral palsy in very low birthweight infants and a study of cancer screening in primary care settings.
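The contrast between large-sample chi-square methods and exact inference for sparse data can be seen on a single 2x2 table. The following is a minimal sketch, not code from the review: it computes the Pearson chi-square test (1 df, no continuity correction) and a two-sided Fisher exact test that conditions on both table margins (hypergeometric distribution). The function names `pearson_chi2_2x2` and `fisher_exact_2x2` and the cell counts are hypothetical, chosen only for illustration.

```python
from math import comb, erfc, sqrt


def pearson_chi2_2x2(a, b, c, d):
    """Large-sample Pearson chi-square (1 df, no continuity correction)
    for the table [[a, b], [c, d]]. Returns (statistic, p-value)."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("all row and column totals must be positive")
    stat = n * (a * d - b * c) ** 2 / margins
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return stat, erfc(sqrt(stat / 2))


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value, conditioning on both margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


# Hypothetical sparse table: rows = exposed / unexposed,
# columns = outcome present / absent
stat, p_chi2 = pearson_chi2_2x2(3, 1, 1, 3)
p_exact = fisher_exact_2x2(3, 1, 1, 3)
print(f"chi-square = {stat:.2f}, p = {p_chi2:.3f}; Fisher exact p = {p_exact:.3f}")
# prints: chi-square = 2.00, p = 0.157; Fisher exact p = 0.486
```

With only eight observations, the large-sample p-value (0.157) is far smaller than the exact conditional p-value (0.486). This gap is the kind of discrepancy that motivates exact methods for small or sparse tables.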

Publication types

  • Research Support, U.S. Gov't, P.H.S.
  • Review

MeSH terms

  • Algorithms
  • Chi-Square Distribution
  • Data Interpretation, Statistical*
  • Humans
  • Least-Squares Analysis
  • Logistic Models
  • Models, Statistical*
  • Odds Ratio
  • Public Health*