Reports of significant allele flipping, in which associations with the same disease are found at opposite alleles of the same bi-allelic locus, are increasing. But when is a significant allele flip genuine? We address the statistical issues involved in claiming and observing genuine allele flips in actual samples. We show that, unless an allele flip is genuine, the probability of observing a significant allele flip in samples ascertained similarly from a common population is negligible. We derive expressions for the expected values of commonly used measures of association. These confirm previous findings that the underlying mechanism of a genuine allele flip is variation in haplotype frequencies, and they show further how this variation interacts with variation in genetic effects to affect allele flipping. We show that for association testing at proxy SNPs, which is common in genome-wide association studies, variation in haplotype frequencies must coincide with a reversal in the sign of linkage disequilibrium (LD) to trigger a genuine allele flip. Using HapMap data and r, rather than r², to highlight previously unobserved effects, we show that unless genetic effects are large, variation in LD is unlikely to cause genuine allele flips in samples drawn from the same population. However, as populations diverge, variation in LD becomes an increasingly viable cause of a genuine allele flip, given sufficiently large genetic effects and/or sample sizes. We conclude that evidence of variation in local patterns of LD, in the ancestral composition of study samples, and in environmental exposures between study populations can provide compelling practical evidence in defense of a genuine allele flip.
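
The role of the sign of LD at a proxy SNP can be illustrated with a minimal Python sketch. It is not the derivation used in this work. It assumes a causal locus with alleles A/a and a proxy locus with alleles B/b, and it assumes the proxy has no direct effect, so that the haplotype distribution conditional on the causal allele is the same in cases and controls. The function names and the haplotype frequencies are hypothetical and chosen only for illustration; they are not HapMap values.

```python
import math


def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, r, r_squared) for two bi-allelic loci from haplotype frequencies."""
    total = p_AB + p_Ab + p_aB + p_ab
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at the causal locus
    p_B = p_AB + p_aB  # frequency of allele B at the proxy locus
    D = p_AB - p_A * p_B  # signed LD coefficient
    r = D / math.sqrt(p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, r, r * r


def proxy_freq_difference(haps, causal_diff):
    """Case-minus-control frequency difference of allele B at the proxy.

    Assumes (illustrative assumption) that the proxy has no direct effect, so
    P(B | A) and P(B | a) are identical in cases and controls. `causal_diff`
    is the case-minus-control frequency difference of allele A.
    """
    p_AB, p_Ab, p_aB, p_ab = haps
    p_A = p_AB + p_Ab
    p_B_given_A = p_AB / p_A
    p_B_given_a = p_aB / (1 - p_A)
    # Equals D / (p_A * (1 - p_A)) * causal_diff, so it carries the sign of D.
    return (p_B_given_A - p_B_given_a) * causal_diff


# Two hypothetical populations with the same strength of LD but opposite sign.
pop1 = (0.4, 0.1, 0.1, 0.4)  # order: AB, Ab, aB, ab
pop2 = (0.1, 0.4, 0.4, 0.1)

for name, haps in (("pop1", pop1), ("pop2", pop2)):
    D, r, r2 = ld_stats(*haps)
    diff = proxy_freq_difference(haps, causal_diff=0.05)
    print(f"{name}: D={D:+.3f} r={r:+.3f} r2={r2:.3f} proxy_diff={diff:+.4f}")
```

In this sketch the two populations have the same r² up to rounding, yet r has opposite signs, and the expected direction of association at the proxy reverses accordingly even though the effect at the causal locus is unchanged. This is why r² alone cannot reveal the reversal that r makes visible.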