Power and sample size calculations for high-throughput sequencing-based experiments

Brief Bioinform. 2018 Nov 27;19(6):1247-1255. doi: 10.1093/bib/bbx061.

Abstract

Power and sample size analysis (hereafter, power analysis) estimates the probability of detecting a statistically significant effect in a data set. The importance of power analysis to proper experimental design is increasingly recognized. Power analysis is complex, yet it is necessary for the success of large studies, which must be designed to produce statistically accurate and reliable results. Power computation methods are well established for both microarray-based gene expression studies and genotyping microarray-based genome-wide association studies. High-throughput sequencing (HTS) has greatly enhanced our ability to conduct biomedical studies at the highest possible resolution (per nucleotide). However, power computations are considerably more complex for sequencing data than for the simpler genotyping array data. Research on methods of power computation for HTS-based studies has recently been conducted, but these methods are not yet well known or widely used. In this article, we describe the power computation methods currently available for a range of HTS-based studies, including DNA sequencing, RNA sequencing, microbiome sequencing and chromatin immunoprecipitation sequencing. Most importantly, we review the methods of power analysis for several types of sequencing data and guide the reader to the relevant methods for each data type.
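To make concrete what a power or sample size computation produces, the sketch below uses the textbook normal approximation for a two-sided, two-sample comparison of means. It is not one of the HTS-specific methods reviewed in the article, which must also account for factors such as count distributions and sequencing depth. The function names, the standardized effect size parameter and the default significance level and target power are illustrative assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Generic illustration only: normal approximation for a two-sided,
# two-sample comparison of means with equal group sizes.
# All names and defaults here are assumptions, not taken from the article.


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power to detect a standardized mean difference."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)           # two-sided critical value
    shift = effect_size * sqrt(n_per_group / 2)  # mean of the test statistic
    # Probability that the statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def samples_per_group(effect_size, power=0.8, alpha=0.05):
    """Smallest per-group sample size reaching the target power (approximate)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    z_pow = z.inv_cdf(power)
    return ceil(2 * ((z_crit + z_pow) / effect_size) ** 2)


if __name__ == "__main__":
    n = samples_per_group(0.5)
    print(n, round(two_sample_power(0.5, n), 3))
```

Under this approximation, power rises with both the effect size and the number of samples per group, and with no true effect it reduces to the significance level.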

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Chromatin Immunoprecipitation
  • Genome-Wide Association Study
  • Heterozygote
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Microbiota
  • Mutation
  • Poisson Distribution
  • Sequence Analysis, RNA / methods