Efficient estimation of grouped survival models

Zhiguo Li; Jiaxing Lin; Alexander B Sibley; Tracy Truong; Katherina C Chua; Yu Jiang; Janice McCarthy; Deanna L Kroetz; Andrew Allen; Kouros Owzar

doi:10.1186/s12859-019-2899-x

BMC Bioinformatics. 2019 May 28;20(1):269. doi: 10.1186/s12859-019-2899-x.

Affiliations

¹ Department of Biostatistics and Bioinformatics, Duke University, Durham, USA. zhiguo.li@duke.edu.
² Department of Biostatistics and Bioinformatics, Duke University, Durham, USA.
³ Duke Cancer Institute, Duke University, Durham, USA.
⁴ Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, USA.

Abstract

Background: Time- and dose-to-event phenotypes used in basic science and translational studies are commonly measured imprecisely or incompletely due to limitations of the experimental design or data collection schema. For example, drug-induced toxicities are not reported by the actual time or dose triggering the event, but rather are inferred from the cycle or dose to which the event is attributed. This exemplifies a prevalent type of imprecise measurement called grouped failure time, where times or doses are restricted to discrete increments. Failure to appropriately account for the grouped nature of the data, when present, may lead to biased analyses.
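To make the notion of grouped failure time concrete, the following is a minimal sketch in plain Python, not the groupedSurv API and not the efficient score method of the paper. It assumes each subject is recorded only as a pair (interval, event): the treatment cycle to which the outcome is attributed, and whether that outcome was the event (1) or censoring after completing the cycle (0). The function names and the toy data are hypothetical, and covariates are ignored. Under these assumptions, the maximum-likelihood estimate of the discrete hazard in interval k is the number of events in k divided by the number of subjects still at risk when k begins.

```python
def discrete_hazards(intervals, events, n_intervals):
    """Estimate the per-interval hazard from grouped failure time data.

    intervals[i] is the 1-based interval attributed to subject i;
    events[i] is 1 for an event in that interval, 0 if the subject was
    censored after completing it (an assumption of this sketch).
    """
    hazards = []
    for k in range(1, n_intervals + 1):
        # A subject is at risk in interval k if they reached interval k.
        at_risk = sum(1 for g in intervals if g >= k)
        failed = sum(1 for g, e in zip(intervals, events) if g == k and e == 1)
        hazards.append(failed / at_risk if at_risk else 0.0)
    return hazards


def survival_curve(hazards):
    """Probability of remaining event-free through each interval."""
    curve, surviving = [], 1.0
    for h in hazards:
        surviving *= 1.0 - h
        curve.append(surviving)
    return curve


# Hypothetical toy data: six subjects followed over three cycles.
intervals = [1, 2, 2, 3, 3, 3]
events = [1, 1, 0, 1, 0, 0]

hazards = discrete_hazards(intervals, events, n_intervals=3)
survival = survival_curve(hazards)
```

Only the interval index enters the calculation, so nothing is assumed about when within a cycle the event occurred. This is the sense in which the data are treated as grouped rather than as exact times.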

Results: We present groupedSurv, an R package which implements a statistically rigorous and computationally efficient approach for conducting genome-wide analyses based on grouped failure time phenotypes. Our approach accommodates adjustments for baseline covariates, and analysis at the variant or gene level. We illustrate the statistical properties of the approach and computational performance of the package by simulation. We present the results of a reanalysis of a published genome-wide study to identify common germline variants associated with the risk of taxane-induced peripheral neuropathy in breast cancer patients.

Conclusions: groupedSurv enables fast and rigorous genome-wide analysis on the basis of grouped failure time phenotypes at the variant, gene or pathway level. The package is freely available under a public license through the Comprehensive R Archive Network.

Keywords: Discrete censoring; Efficient score; Genome-wide analysis; Grouped data; Heritability; Multiple testing; Pharmacogenomics; Score statistic.

MeSH terms

Benchmarking
Gene Frequency / genetics
Genome-Wide Association Study*
Humans
Likelihood Functions
Models, Genetic*
Phenotype
Software
Statistics as Topic
