The projack: a resampling approach to correct for ranking bias in high-throughput studies

Biostatistics. 2016 Jan;17(1):54-64. doi: 10.1093/biostatistics/kxv022. Epub 2015 Jun 3.

Abstract

The problem of ranked inference arises in a number of settings in which the investigator wishes to perform parameter inference after ordering a set of [Formula: see text] statistics. In contrast to inference for a single hypothesis, the ranking procedure introduces considerable bias, a problem known as the "winner's curse" in genetic association. We introduce the projack (for Prediction by Re-Ordered Jackknife and Cross-Validation, [Formula: see text]-fold). The projack is a resampling-based procedure that provides low-bias estimates of the expected ranked effect size parameter for a set of possibly correlated [Formula: see text] statistics. The approach is flexible and has wide applicability to high-dimensional datasets, including those arising from genomics platforms. Although initially motivated by the setting where the original data are available for resampling, the projack can be extended to the situation where only the vector of [Formula: see text] values is available. We illustrate the projack for correction of the winner's curse in genetic association, although it can be used much more generally.
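
The ranking bias described above can be illustrated with a small simulation. The sketch below is not the projack as published, since the abstract does not give the algorithm's details; it is a generic cross-validated re-ordering, assuming independent samples and using per-feature sample means as the statistics. The function names, the fold count k=5, and the simulated all-null dataset are hypothetical choices made only for illustration.

```python
import numpy as np

def naive_ranked_estimates(x):
    """Per-feature sample means sorted in decreasing order.

    The same data are used to rank and to estimate, so the top
    entries are biased upward (the winner's curse).
    """
    return np.sort(x.mean(axis=0))[::-1]

def cv_ranked_estimates(x, k=5, seed=0):
    """Generic k-fold sketch (an assumption, not the published projack):
    rank features on k-1 folds, then estimate the effect at each rank
    from the held-out fold, and average over folds.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    folds = np.array_split(rng.permutation(n), k)
    total = np.zeros(x.shape[1])
    for held in folds:
        train = np.setdiff1d(np.arange(n), held)
        # Ranking uses the training rows only.
        order = np.argsort(x[train].mean(axis=0))[::-1]
        # Held-out means, arranged in the training rank order.
        total += x[held].mean(axis=0)[order]
    return total / k

# Hypothetical null setting: every true effect is exactly 0.
rng = np.random.default_rng(1)
n, m = 200, 1000
x = rng.normal(loc=0.0, scale=1.0, size=(n, m))

naive_top = naive_ranked_estimates(x)[0]
cv_top = cv_ranked_estimates(x)[0]
print(f"naive top-ranked estimate:  {naive_top:.3f}")
print(f"cross-validated estimate:   {cv_top:.3f}")
```

In this simulated setting the naive top-ranked estimate sits well above the true value of 0, while the held-out estimate is centered near 0 because the data used for ranking are independent of the data used for estimation. The price is extra variance from the smaller held-out folds.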

Keywords: Bias correction; Effect size estimation; Winner's curse.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bias
  • Data Interpretation, Statistical*
  • Genome-Wide Association Study / methods*
  • Humans
  • Psoriasis / genetics