Integration of SNP genotyping confidence scores in IBD inference

Bioinformatics. 2011 Oct 15;27(20):2880-7. doi: 10.1093/bioinformatics/btr486. Epub 2011 Aug 23.

Abstract

Motivation: High-throughput single nucleotide polymorphism (SNP) arrays have become the standard platform for linkage and association analyses. The high SNP density of these platforms allows high-resolution identification of ancestral recombination events, even for distant relatives many generations apart. However, such inference is sensitive to marker mistyping, and current error detection methods rely on the genotyping of additional close relatives. Genotyping algorithms provide a confidence score for each marker call, but existing methods do not use it. There is a need for a model that incorporates this prior information within standard identity-by-descent (IBD) and association analyses.

Results: We propose a novel model that incorporates marker confidence scores within IBD methods based on the Lander-Green Hidden Markov Model. The novel parameter of this model is the joint distribution of confidence scores and error status per array. We estimate this probability distribution by applying a modified expectation-maximization (EM) procedure to data from nuclear families genotyped with Affymetrix 250K SNP arrays. The converged tables from two different genotyping algorithms are shown for a wide range of error rates. We demonstrate the efficacy of our method in refining the detection of IBD signals using nuclear pedigrees and distant relatives.
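To illustrate the idea of per-marker error probabilities, the following is a minimal sketch and not the Lander-Green model or the Plinke implementation. It assumes a toy two-state HMM (IBD vs. non-IBD) over a pair of haplotypes, where the only observation at each marker is whether the two alleles match. The joint table JOINT, its score bins ("high", "low"), all numeric values, and the function names are hypothetical and chosen purely for illustration. The per-marker error probability is taken as P(error | score bin), derived from a joint distribution of score bin and error status.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical joint distribution: score bin -> (P(bin, correct), P(bin, error)).
# The values are illustrative only, not estimates from any data set.
JOINT: Dict[str, Tuple[float, float]] = {
    "high": (0.899, 0.001),
    "low": (0.080, 0.020),
}


def error_given_score(joint: Dict[str, Tuple[float, float]], score_bin: str) -> float:
    """P(error | score bin), obtained by conditioning the joint table."""
    p_ok, p_err = joint[score_bin]
    return p_err / (p_ok + p_err)


def emission(match: bool, ibd: bool, freq: float, err: float) -> float:
    """P(observed match status | state) under a per-marker error probability."""
    # Probability that the two true alleles agree at a biallelic marker.
    true_match = 1.0 if ibd else freq ** 2 + (1.0 - freq) ** 2
    # Simplifying assumption: a miscall flips the observed match status.
    obs_match = true_match * (1.0 - err) + (1.0 - true_match) * err
    return obs_match if match else 1.0 - obs_match


def ibd_posterior(
    matches: Sequence[bool],
    freqs: Sequence[float],
    errs: Sequence[float],
    switch: float = 0.01,
    prior_ibd: float = 0.5,
) -> List[float]:
    """Posterior P(IBD) at each marker via a scaled forward-backward pass."""
    n = len(matches)
    trans = [[1.0 - switch, switch], [switch, 1.0 - switch]]
    # em[m][s]: emission at marker m for state s (0 = non-IBD, 1 = IBD).
    em = [
        [emission(matches[m], s == 1, freqs[m], errs[m]) for s in (0, 1)]
        for m in range(n)
    ]
    fwd: List[List[float]] = []
    for m in range(n):
        if m == 0:
            row = [(1.0 - prior_ibd) * em[0][0], prior_ibd * em[0][1]]
        else:
            prev = fwd[-1]
            row = [em[m][s] * sum(prev[r] * trans[r][s] for r in (0, 1)) for s in (0, 1)]
        total = sum(row)
        fwd.append([x / total for x in row])
    bwd = [[1.0, 1.0] for _ in range(n)]
    for m in range(n - 2, -1, -1):
        row = [
            sum(trans[r][s] * em[m + 1][s] * bwd[m + 1][s] for s in (0, 1))
            for r in (0, 1)
        ]
        total = sum(row)
        bwd[m] = [x / total for x in row]
    post = []
    for m in range(n):
        ibd, non_ibd = fwd[m][1] * bwd[m][1], fwd[m][0] * bwd[m][0]
        post.append(ibd / (ibd + non_ibd))
    return post


# Twenty matching markers with a single mismatch at index 10.
matches = [True] * 20
matches[10] = False
freqs = [0.5] * 20

# Case 1: one uniform error rate for every marker.
post_uniform = ibd_posterior(matches, freqs, [0.001] * 20)

# Case 2: the mismatching call has a low confidence score.
bins = ["high"] * 20
bins[10] = "low"
errs = [error_given_score(JOINT, b) for b in bins]
post_low_conf = ibd_posterior(matches, freqs, errs)
```

In this toy setting, the mismatch at a low-confidence marker is more readily explained as a genotyping error, so the IBD posterior at that marker is higher than under the uniform error rate. A pedigree-based model would replace the two states with inheritance vectors and the match indicator with full genotype likelihoods.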

Availability: Plinke, a new version of Plink with an extended pairwise IBD inference model that allows per-marker error probabilities, is freely available at: http://bioinfo.bgu.ac.il/bsu/software/plinke.

Contact: obirk@bgu.ac.il; markusb@bgu.ac.il

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Genotype
  • Genotyping Techniques*
  • Humans
  • Markov Chains
  • Models, Statistical
  • Pedigree
  • Polymorphism, Single Nucleotide*