Evaluation of developed low-density genotype panels for imputation to higher density in independent dairy and beef cattle populations

J Anim Sci. 2016 Mar;94(3):949-62. doi: 10.2527/jas.2015-0044.

Abstract

The objective of this study was to develop, using alternative algorithms, low-density SNP genotyping panels (384 to 12,000 SNP) that can be accurately imputed to higher-density panels across independent cattle populations. Single nucleotide polymorphisms were selected based on genomic characteristics (i.e., linkage disequilibrium [LD], minor allele frequency [MAF], and genomic distance) in a population of 1,267 Holstein-Friesian animals genotyped on the Illumina Bovine50 Beadchip (54,001 SNP). Single nucleotide polymorphism selection methods included 1) random; 2) equidistant location; 3) a combination of SNP MAF and LD structure while maintaining relatively equal genomic distance between adjacent SNP; 4) a combination of high MAF, genomic distance between selected and candidate SNP, and correlation between genotypes of selected and candidate SNP; and 5) a machine learning algorithm. The panels were validated separately in 1) a population of 750 Holstein-Friesian animals with genotypes masked to reflect the lower-density panels under investigation (1,249 animals with complete genotypes included in the reference population) and 2) a population of 359 Limousin and Charolais cattle with high-density (777,962 SNP) genotypes (1,918 animals with complete genotypes included in the reference population). Irrespective of SNP selection method, imputation accuracy in both populations improved at a diminishing rate as the number of SNP included in the lower-density genotype panel increased. Additionally, the variability in mean imputation accuracy per individual decreased as the panel density increased. The SNP selection method had a major impact on the mean allele concordance rate, although its impact diminished as the panel density increased. Imputation accuracy for SNP selected using a combination of high SNP MAF, LD structure, and relatively equal genomic distance between SNP outperformed all other selection methods at densities < 12,000 SNP. Using this method of SNP selection, the correlation between the imputed and actual genotypes for the 3,000 SNP panel was 0.90 and 0.96 when applied to the beef and dairy populations, respectively; the respective correlations for the 6,000 SNP panel were 0.95 and 0.98. It is necessary to include between 3,000 and 6,000 SNP in a low-density panel to achieve adequate imputation accuracy to either medium density (approximately 50,000 SNP in the dairy population) or high density (approximately 700,000 SNP in the beef population) across diverse and independent populations.
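
The abstract names an equidistant selection method and two accuracy measures (allele concordance rate and the correlation between imputed and actual genotypes) without defining them. The following Python sketch illustrates one plausible reading and is not the authors' implementation. It assumes genotypes are coded as 0, 1, or 2 copies of one allele, that concordance is the proportion of alleles called correctly, and that the correlation is a Pearson correlation; the function names and the toy data are hypothetical.

```python
from bisect import bisect_left
from math import sqrt


def select_equidistant(positions, n_select):
    """Return indices of up to n_select SNP spread evenly along one chromosome.

    positions: sorted base-pair positions of the candidate SNP (assumed input).
    """
    if n_select <= 0:
        return []
    if n_select >= len(positions):
        return list(range(len(positions)))
    if n_select == 1:
        return [len(positions) // 2]
    start, end = positions[0], positions[-1]
    step = (end - start) / (n_select - 1)
    chosen = []
    for k in range(n_select):
        target = start + k * step
        i = bisect_left(positions, target)
        # Compare the candidate SNP on either side of the target position.
        neighbours = [j for j in (i - 1, i) if 0 <= j < len(positions)]
        best = min(neighbours, key=lambda j: abs(positions[j] - target))
        if best not in chosen:  # a sparse region can map two targets to one SNP
            chosen.append(best)
    return chosen


def allele_concordance(true_gt, imputed_gt):
    """Proportion of alleles imputed correctly (assumed definition).

    With 0/1/2 coding, |true - imputed| counts the wrong alleles per genotype,
    and each genotype carries two alleles.
    """
    pairs = list(zip(true_gt, imputed_gt))
    wrong_alleles = sum(abs(t - i) for t, i in pairs)
    return 1 - wrong_alleles / (2 * len(pairs))


def genotype_correlation(true_gt, imputed_gt):
    """Pearson correlation between actual and imputed genotype codes."""
    n = len(true_gt)
    mean_t = sum(true_gt) / n
    mean_i = sum(imputed_gt) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_gt, imputed_gt))
    var_t = sum((t - mean_t) ** 2 for t in true_gt)
    var_i = sum((i - mean_i) ** 2 for i in imputed_gt)
    return cov / sqrt(var_t * var_i)


if __name__ == "__main__":
    # Toy data only; these values do not come from the study.
    positions = [100, 200, 300, 400, 500, 600, 700, 800, 900]
    actual = [0, 1, 2, 2, 1, 0]
    imputed = [0, 1, 2, 1, 1, 0]
    print(select_equidistant(positions, 3))
    print(allele_concordance(actual, imputed))
    print(genotype_correlation(actual, imputed))
```

Concordance and correlation can rank panels differently: concordance weights every miscalled allele equally, whereas the correlation penalizes errors at low-MAF SNP more heavily because those SNP have little genotype variance.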

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Alleles
  • Animals
  • Cattle / genetics*
  • Cattle / physiology*
  • Gene Frequency
  • Genomics / methods
  • Genotype*
  • Linkage Disequilibrium
  • Polymorphism, Single Nucleotide