An assessment of linkage disequilibrium in Holstein cattle using a Bayesian network

J Anim Breed Genet. 2012 Dec;129(6):474-87. doi: 10.1111/jbg.12002. Epub 2012 Sep 13.


Linkage disequilibrium (LD) is defined as a non-random association of the distributions of alleles at different loci within a population. This association between loci is valuable in the prediction of quantitative traits in animals and plants and in genome-wide association studies. A question that arises is whether standard metrics such as D' and r² properly reflect complex associations in a genetic system. It seems reasonable to take the view that loci associate and interact together as a system or network, rather than in a simple pairwise manner. We used a Bayesian network (BN) as a representation of choice for an LD network. A BN is a graphical depiction of a probability distribution and can represent sets of conditional independencies. Moreover, it provides a visual display of the joint distribution of the set of random variables in question. The usefulness of BN for linkage disequilibrium was explored and illustrated using genetic marker loci found to have the strongest effects on milk protein in Holstein cattle based on three strategies for ranking marker effect estimates: posterior means, standardized posterior means and additive genetic variance. Two different algorithms, Tabu search (a local score-based algorithm) and incremental association Markov blanket (a constraint-based algorithm), coupled with the chi-square test, were used for learning the structure of the BN and were compared with the reference r² metric represented as an LD heat map. The BN captured several genetic markers associated as clusters, implying that markers are inter-related in a complicated manner. Further, the BN detected conditionally dependent markers. The results confirm that LD relationships are of a multivariate nature and that r² gives an incomplete description and understanding of LD. Use of an LD Bayesian network enables inferring associations between loci in a systems framework and provides a more accurate picture of LD than that resulting from the use of pairwise metrics.
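
For readers unfamiliar with the pairwise metrics that the abstract uses as a reference, the following is a minimal sketch of how D, D' and r² are commonly computed for a single pair of loci. It is not the authors' code: it assumes phased, biallelic haplotypes with alleles coded 0/1, and the function name pairwise_ld and the toy data are hypothetical. It does not attempt the Bayesian network structure learning described above.

```python
def pairwise_ld(haplotypes):
    """Return (D, |D'|, r^2) for two biallelic loci.

    haplotypes: list of (a, b) tuples, one per phased haplotype, where a and b
    are the alleles (coded 0 or 1) at the first and second locus.
    This input format is an assumption made for illustration.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")

    # Frequencies of allele 1 at each locus and of the 1-1 haplotype.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    # D: deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("at least one locus is monomorphic; LD is undefined")

    # r^2: squared correlation between the allele indicators at the two loci.
    r2 = d * d / denom

    # D': D scaled by its maximum attainable magnitude given the allele
    # frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max

    return d, d_prime, r2


# Toy data (hypothetical): two loci in complete LD, then two independent loci.
complete = [(1, 1)] * 4 + [(0, 0)] * 4
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]

d_c, dprime_c, r2_c = pairwise_ld(complete)
d_i, dprime_i, r2_i = pairwise_ld(independent)
```

Each call summarizes exactly one pair of loci, which is the limitation the abstract points to: a heat map of such values cannot express whether two markers remain associated once a third marker is taken into account.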

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Animals
  • Bayes Theorem
  • Cattle / genetics*
  • Cattle / metabolism
  • Genetic Loci / genetics
  • Linkage Disequilibrium*
  • Milk Proteins / metabolism
  • Polymorphism, Single Nucleotide / genetics
  • Regression Analysis


Substances

  • Milk Proteins