Detecting linkage disequilibrium in bacterial populations

Genetics. 1998 Dec;150(4):1341-8. doi: 10.1093/genetics/150.4.1341.

Abstract

The distribution of the number of pairwise differences calculated from comparisons between n haploid genomes has frequently been used as a starting point for testing the hypothesis of linkage equilibrium. For this purpose the variance of the pairwise differences, VD, is used as a test statistic to evaluate the null hypothesis that all loci are in linkage equilibrium. The problem is to determine the critical value of the distribution of VD. This critical value can be estimated either by Monte Carlo simulation or by assuming that VD is distributed normally and calculating a one-tailed 95% critical value for VD, L, as L = E(VD) + 1.645 sqrt(Var(VD)), where E(VD) is the expectation of VD and Var(VD) is the variance of VD. If VD (observed) > L, the null hypothesis of linkage equilibrium is rejected. Using Monte Carlo simulation we show that the formula currently available for Var(VD) is incorrect, especially for genetically highly diverse data. This has implications for hypothesis testing in bacterial populations, which are often genetically highly diverse. For this reason we derive a new, exact formula for Var(VD). The distribution of VD is examined and shown to approach normality as the sample size increases. This makes the new formula a useful tool in the investigation of large data sets, where testing for linkage using Monte Carlo simulation can be very time consuming. Application of the new formula, in conjunction with Monte Carlo simulation, to populations of Bradyrhizobium japonicum, Rhizobium leguminosarum, and Bacillus subtilis reveals linkage disequilibrium where linkage equilibrium has previously been reported.
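
The test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it does not reproduce their exact formula for Var(VD), which the abstract does not state. Several things here are assumptions made for illustration: the function names (pairwise_differences, variance_of_differences, shuffle_loci, linkage_test) are hypothetical; VD is computed as a population variance (dividing by the number of pairs); the null distribution is generated by permuting alleles within each locus independently across genomes, which keeps allele frequencies fixed; and E(VD) and Var(VD) in the normal approximation are replaced by the mean and variance of the simulated VD values rather than by analytical expressions.

```python
import itertools
import random
import statistics


def pairwise_differences(genomes):
    """Count the loci at which each pair of haploid genomes differs."""
    return [
        sum(a != b for a, b in zip(g1, g2))
        for g1, g2 in itertools.combinations(genomes, 2)
    ]


def variance_of_differences(genomes):
    """VD: variance of the pairwise differences.

    Assumption: population variance (divide by the number of pairs).
    """
    return statistics.pvariance(pairwise_differences(genomes))


def shuffle_loci(genomes, rng):
    """Permute alleles within each locus independently.

    This breaks associations between loci while keeping the allele
    frequencies at every locus unchanged.
    """
    columns = [list(col) for col in zip(*genomes)]
    for col in columns:
        rng.shuffle(col)
    return [tuple(row) for row in zip(*columns)]


def linkage_test(genomes, n_resamples=1000, seed=0):
    """Compare observed VD with two one-tailed 95% critical values."""
    rng = random.Random(seed)
    observed = variance_of_differences(genomes)

    # Simulated null distribution of VD under linkage equilibrium.
    null = sorted(
        variance_of_differences(shuffle_loci(genomes, rng))
        for _ in range(n_resamples)
    )

    # Monte Carlo critical value: empirical 95th percentile of the null.
    index = min(int(0.95 * n_resamples), n_resamples - 1)
    mc_critical = null[index]

    # Normal approximation L = E(VD) + 1.645 sqrt(Var(VD)), with both
    # moments estimated from the simulation (an assumption of this sketch).
    normal_critical = statistics.mean(null) + 1.645 * statistics.stdev(null)

    return {
        "observed": observed,
        "mc_critical": mc_critical,
        "normal_critical": normal_critical,
        "reject_mc": observed > mc_critical,
        "reject_normal": observed > normal_critical,
    }


if __name__ == "__main__":
    # Toy data: two clonal lineages, each locus perfectly associated
    # with every other locus.
    clonal = [(0,) * 6] * 10 + [(1,) * 6] * 10
    print(linkage_test(clonal))
```

In this sketch each genome is a tuple of allele labels, one per locus. For large data sets the resampling loop dominates the run time, which is the cost that an analytical expression for Var(VD) is meant to avoid.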

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Analysis of Variance
  • Bacillus subtilis / classification
  • Bacillus subtilis / genetics*
  • Bradyrhizobium / classification
  • Bradyrhizobium / genetics
  • Escherichia coli / classification
  • Escherichia coli / genetics
  • Genetic Variation
  • Gram-Negative Bacteria / classification
  • Gram-Negative Bacteria / genetics*
  • Linkage Disequilibrium*
  • Mathematics
  • Monte Carlo Method
  • Neisseria gonorrhoeae / classification
  • Neisseria gonorrhoeae / genetics
  • Rhizobium leguminosarum / classification
  • Rhizobium leguminosarum / genetics