A statistical method for studying correlated rare events and their risk factors

Stat Methods Med Res. 2017 Jun;26(3):1416-1428. doi: 10.1177/0962280215581112. Epub 2015 Apr 8.

Abstract

Longitudinal studies of rare events that can recur, such as cervical high-grade lesions or colorectal polyps, often involve correlated binary data. Risk factors for these events cannot be reliably examined using conventional statistical methods. For example, logistic regression models that incorporate generalized estimating equations often fail to converge or provide inaccurate results when analyzing data of this type. Although exact methods have been reported, they are complex and computationally difficult. The current paper proposes a mathematically straightforward and easy-to-use two-step approach involving (i) an additive model to measure associations between a rare or uncommon correlated binary event and potential risk factors and (ii) a permutation test to estimate the statistical significance of these associations. Simulation studies showed that the proposed method reliably tests and accurately estimates the associations of exposure with correlated binary rare events. This method was then applied to a longitudinal study of human leukocyte antigen (HLA) genotype and risk of cervical high-grade squamous intraepithelial lesions (HSIL) among HIV-infected and HIV-uninfected women. Results showed statistically significant associations of two HLA alleles with HSIL among HIV-negative but not HIV-positive women, suggesting that immune status may modify the association between HLA and cervical HSIL. Overall, the proposed method avoids model nonconvergence problems and provides a computationally simple, accurate, and powerful approach for the analysis of risk factor associations with rare/uncommon correlated binary events.
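
The two-step idea described above can be illustrated with a minimal sketch. The abstract does not give the form of the additive model or the permutation scheme, so everything here is an assumption made for illustration: the association is measured as a risk difference in the per-visit event proportion (exposed minus unexposed), and significance is estimated by shuffling the exposure label across subjects so that each subject's repeated outcomes stay together. The function names `risk_difference` and `cluster_permutation_test` and the example data are hypothetical and are not taken from the paper.

```python
import numpy as np

def risk_difference(events, visits, exposed):
    """Per-visit event proportion in exposed minus unexposed subjects.

    Assumption: this simple additive contrast stands in for the paper's
    additive model, whose exact form the abstract does not specify.
    """
    exposed = np.asarray(exposed, dtype=bool)
    p_exposed = events[exposed].sum() / visits[exposed].sum()
    p_unexposed = events[~exposed].sum() / visits[~exposed].sum()
    return p_exposed - p_unexposed

def cluster_permutation_test(events, visits, exposed, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the risk difference.

    Exposure labels are permuted across subjects (clusters), never across
    visits, so the within-subject correlation of outcomes is preserved.
    """
    rng = np.random.default_rng(seed)
    events = np.asarray(events, dtype=float)
    visits = np.asarray(visits, dtype=float)
    exposed = np.asarray(exposed, dtype=bool)

    observed = risk_difference(events, visits, exposed)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(exposed)
        permuted = risk_difference(events, visits, shuffled)
        # Small tolerance guards against floating-point ties.
        if abs(permuted) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical data: 10 subjects, 5 visits each; events counted per subject.
example_events = [2, 1, 0, 1, 0, 0, 0, 0, 0, 0]
example_visits = [5] * 10
example_exposed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    rd, p = cluster_permutation_test(example_events, example_visits, example_exposed)
    print(f"risk difference = {rd:.3f}, permutation p-value = {p:.4f}")
```

Because the test statistic is a simple difference of proportions, nothing is fitted iteratively and the convergence failures mentioned above cannot arise in this sketch. It does not adjust for covariates.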

Keywords: Correlated data; exact method; generalized estimating equation; permutation; rare events.

MeSH terms

  • Female
  • HIV Infections / complications*
  • HIV Infections / genetics
  • HLA Antigens / genetics
  • Humans
  • Logistic Models
  • Longitudinal Studies
  • Papillomavirus Infections / complications
  • Papillomavirus Infections / genetics
  • Papillomavirus Infections / virology
  • Randomized Controlled Trials as Topic / methods*
  • Risk Factors
  • Sample Size
  • Uterine Cervical Dysplasia / complications*
  • Uterine Cervical Dysplasia / genetics
  • Uterine Cervical Dysplasia / virology
  • Uterine Cervical Neoplasms / complications*
  • Uterine Cervical Neoplasms / genetics
  • Uterine Cervical Neoplasms / virology

Substances

  • HLA Antigens