The Relationship Between Haplotype-Based FST and Haplotype Length

Genetics. 2019 Sep;213(1):281-295. doi: 10.1534/genetics.119.302430. Epub 2019 Jul 8.

Abstract

The population-genetic statistic FST is used widely to describe allele frequency distributions in subdivided populations. The increasing availability of DNA sequence data has recently enabled computations of FST from sequence-based "haplotype loci." At the same time, theoretical work has revealed that FST has a strong dependence on the underlying genetic diversity of a locus from which it is computed, with high diversity constraining values of FST to be low. In the case of haplotype loci, for which two haplotypes that are distinct over a specified length along a chromosome are treated as distinct alleles, genetic diversity is influenced by haplotype length: longer haplotype loci have the potential for greater genetic diversity. Here, we study the dependence of FST on haplotype length. Using a model in which a haplotype locus is sequentially incremented by one biallelic locus at a time, we show that increasing the length of the haplotype locus can either increase or decrease the value of FST, and usually decreases it. We compute FST on haplotype loci in human populations, finding a close correspondence between the observed values and our theoretical predictions. We conclude that effects of haplotype length are valuable to consider when interpreting FST calculated on haplotypic data.
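The effect of haplotype length on FST can be illustrated with a small sketch. It is not the authors' model or code: it assumes a Nei-style definition, FST = (HT - HS) / HT, with equally weighted populations, and it treats the first `length` sites of each haplotype string as a single allele. The function name `haplotype_fst` and the toy haplotypes are hypothetical and chosen only for illustration.

```python
from collections import Counter


def haplotype_fst(pops, length):
    """Nei-style FST = (H_T - H_S) / H_T for a haplotype locus.

    pops   : list of populations, each a list of equal-length haplotype strings
    length : number of leading biallelic sites that define one haplotype allele
    Assumption: populations are weighted equally, whatever their sample sizes.
    """
    # Per-population frequencies of the truncated haplotypes.
    freqs = []
    for haps in pops:
        counts = Counter(h[:length] for h in haps)
        n = len(haps)
        freqs.append({allele: c / n for allele, c in counts.items()})

    k = len(freqs)
    alleles = set().union(*freqs)

    # H_S: mean within-population heterozygosity.
    h_s = sum(1 - sum(p * p for p in f.values()) for f in freqs) / k
    # H_T: heterozygosity of the pooled (mean) allele frequencies.
    h_t = 1 - sum((sum(f.get(a, 0.0) for f in freqs) / k) ** 2 for a in alleles)

    if h_t == 0:
        return float("nan")  # FST is undefined when the locus is monomorphic
    return (h_t - h_s) / h_t


# Toy data (hypothetical): two populations, four haplotypes each, two sites.
decreasing = [["00", "00", "01", "01"], ["11", "11", "10", "10"]]
increasing = [["00", "00", "10", "10"], ["01", "01", "11", "11"]]

for name, pops in [("decreasing", decreasing), ("increasing", increasing)]:
    values = [haplotype_fst(pops, length) for length in (1, 2)]
    print(name, values)
```

In the first toy data set, adding the second site splits each population's single fixed allele into two haplotypes, so within-population diversity rises and FST falls from 1 to 1/3. In the second, the first site carries no differentiation and the added site does, so FST rises from 0 to 1/3. Both directions are possible, consistent with the abstract's statement that lengthening a haplotype locus can either increase or decrease FST.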

Keywords: SNPs; haplotypes; linkage disequilibrium; population structure.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Gene Frequency*
  • Genome-Wide Association Study / methods
  • Haplotypes*
  • Humans
  • Linkage Disequilibrium
  • Models, Genetic
  • Polymorphism, Single Nucleotide*