Don't split your data

Eur J Epidemiol. 2010 May;25(5):283-4. doi: 10.1007/s10654-010-9447-3. Epub 2010 Mar 26.

Abstract

False positive findings are a common problem in genome-wide association studies. In this commentary we show that nothing is gained by randomly splitting a data sample into two equal-sized subsets, where the first subset is used for exploration and the second is used to confirm the findings from the first. We compare this random splitting procedure with analysing the full data sample, using a Bayesian perspective that takes into account the prior probability of a false positive finding.
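The comparison summarised above can be illustrated with a minimal Python sketch. This is not the authors' calculation; it rests on assumptions of our own. It uses a one-sided z-test with a known standardized effect size. In the split design, two independent halves must each reach significance at `alpha_stage`. In the full design, a single test on all the data runs at the same overall false positive rate (`alpha_stage**2`). The function names are hypothetical. Bayes' theorem then gives the probability that a significant finding is real, given a prior probability `prior` that the association is true.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def power_one_sided(n, effect, alpha):
    """Power of a one-sided z-test with n observations and standardized effect size `effect`."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_crit - effect * n ** 0.5)


def posterior_true(prior, power, alpha):
    """Bayes' theorem: P(association is true | declared significant)."""
    return prior * power / (prior * power + (1 - prior) * alpha)


def compare(n, effect, alpha_stage, prior):
    """Return (power, posterior) for the split-sample and full-sample designs (assumed setup)."""
    # Split design: each half (n/2) must be significant at alpha_stage.
    # The halves are independent, so the overall false positive rate is alpha_stage**2.
    alpha_split = alpha_stage ** 2
    power_split = power_one_sided(n / 2, effect, alpha_stage) ** 2
    # Full design: one test on all n observations at the same overall false positive rate.
    alpha_full = alpha_split
    power_full = power_one_sided(n, effect, alpha_full)
    return {
        "split": (power_split, posterior_true(prior, power_split, alpha_split)),
        "full": (power_full, posterior_true(prior, power_full, alpha_full)),
    }
```

For example, `compare(n=2000, effect=0.1, alpha_stage=1e-4, prior=1e-4)` returns a (power, posterior) pair for each design. At a matched false positive rate, the full-sample test is the more powerful of the two: under these assumptions, the test on the pooled statistic is the most powerful test by the Neyman-Pearson lemma. Because the posterior increases with power when the prior and false positive rate are held fixed, the full-sample analysis also yields the higher posterior probability.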

MeSH terms

  • Bayes Theorem
  • Bias*
  • Genome-Wide Association Study / methods*
  • Humans