Efficient estimation of grouped survival models

BMC Bioinformatics. 2019 May 28;20(1):269. doi: 10.1186/s12859-019-2899-x.

Abstract

Background: Time- and dose-to-event phenotypes used in basic science and translational studies are commonly measured imprecisely or incompletely due to limitations of the experimental design or data collection schema. For example, drug-induced toxicities are not reported by the actual time or dose triggering the event, but rather are inferred from the cycle or dose to which the event is attributed. This exemplifies a prevalent type of imprecise measurement called grouped failure time, where times or doses are restricted to discrete increments. Failure to appropriately account for the grouped nature of the data, when present, may lead to biased analyses.
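
To make the notion of grouped failure time concrete, the following is a minimal sketch, not the groupedSurv implementation. It assumes equal-length treatment cycles, a single scalar covariate, and a proportional hazards model whose per-interval survival probability is exp(-exp(gamma_k + beta * x)). This is one standard formulation for grouped data, and the package's own parameterization may differ. The names group_time, grouped_loglik, gamma and beta are hypothetical and chosen for illustration only.

```python
import math


def group_time(t, cycle_length):
    """Map a continuous time to the 1-based cycle containing it.

    Assumes cycles are the half-open intervals [0, c), [c, 2c), ...
    """
    return int(math.floor(t / cycle_length)) + 1


def grouped_loglik(intervals, events, x, gamma, beta):
    """Log-likelihood for grouped failure times (illustrative sketch).

    intervals[i]: 1-based interval of the event, or the last interval
                  the subject is known to have survived (assumption)
    events[i]:    1 if the event occurred in that interval, 0 if censored
    x[i]:         scalar covariate (hypothetical, e.g. a genotype dosage)
    gamma[k-1]:   baseline parameter for interval k
    beta:         covariate effect on the log cumulative-hazard scale
    """
    total = 0.0
    for k_i, d_i, x_i in zip(intervals, events, x):
        for k in range(1, k_i + 1):
            # Cumulative hazard accrued by this subject within interval k
            h = math.exp(gamma[k - 1] + beta * x_i)
            if k == k_i and d_i == 1:
                # Event in interval k: log(1 - exp(-h)), computed stably
                total += math.log(-math.expm1(-h))
            else:
                # Survived interval k: log(exp(-h)) = -h
                total += -h
    return total
```

For example, with 21-day cycles a toxicity arising on day 30 is recorded only as belonging to cycle 2 (group_time(30, 21)), and the likelihood above uses that cycle index rather than the unobserved exact day.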

Results: We present groupedSurv, an R package which implements a statistically rigorous and computationally efficient approach for conducting genome-wide analyses based on grouped failure time phenotypes. Our approach accommodates adjustment for baseline covariates and supports analysis at the variant or gene level. We illustrate the statistical properties of the approach and the computational performance of the package by simulation. We also present the results of a reanalysis of a published genome-wide study to identify common germline variants associated with the risk of taxane-induced peripheral neuropathy in breast cancer patients.

Conclusions: groupedSurv enables fast and rigorous genome-wide analysis on the basis of grouped failure time phenotypes at the variant, gene or pathway level. The package is freely available under a public license through the Comprehensive R Archive Network.

Keywords: Discrete censoring; Efficient score; Genome-wide analysis; Grouped data; Heritability; Multiple testing; Pharmacogenomics; Score statistic.

MeSH terms

  • Benchmarking
  • Gene Frequency / genetics
  • Genome-Wide Association Study*
  • Humans
  • Likelihood Functions
  • Models, Genetic*
  • Phenotype
  • Software
  • Statistics as Topic