The use of sequence data in genomic prediction models is a topic of high interest, given the decreasing prices of current 'next'-generation sequencing technologies (NGS) and the theoretical possibility of directly interrogating the genomes for all causal mutations. Here, we compare by simulation how well genetic relationships (G) could be estimated using either NGS or ascertained SNP arrays. DNA sequences were simulated using the coalescent according to two scenarios: a 'cattle' scenario that consisted of a bottleneck followed by a split into two breeds without migration, and a 'pig' scenario where Chinese introgression into international pig breeds was simulated. We found that introgression results in a large amount of variability across the genome and between individuals, both in differentiation and in diversity. In general, NGS data allowed the most accurate estimates of G, provided sequencing depth was sufficient: shallow NGS (4×) may result in highly distorted estimates of G elements, especially if they are not standardized by allele frequency. However, high-density genotyping can also result in accurate estimates of G. Given that array genotypes are much less noisy than NGS data, we suggest that specific high-density arrays (~3M SNPs) that minimize the effects of ascertainment could be developed for the population of interest by sequencing its most influential animals, and that genomic selection could then be implemented using those arrays.
Keywords: Coalescence; SNP ascertainment; genomic selection; molecular relationship matrix; next-generation sequencing.
© 2014 Blackwell Verlag GmbH.
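
As an illustration of what estimating G with and without standardization by allele frequency can mean in practice, the following is a minimal sketch, not the method used in the study. It assumes genotypes coded as 0/1/2 allele counts, allele frequencies estimated from the sample itself, and two commonly used marker-based estimators: one that divides the centered cross-products by the summed heterozygosity across SNPs, and one that scales each SNP by its own variance before averaging. The function names `allele_frequencies` and `relationship_matrix` are hypothetical.

```python
def allele_frequencies(genotypes):
    """Frequency of the counted allele at each SNP.

    genotypes: list of individuals, each a list of 0/1/2 allele counts.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]


def relationship_matrix(genotypes, standardize=False):
    """Marker-based relationship matrix G (illustrative assumption).

    standardize=False: G = ZZ' / sum_j 2 p_j (1 - p_j), Z centered by 2 p_j.
    standardize=True:  each SNP is also divided by sqrt(2 p_j (1 - p_j)),
                       and G is the average over SNPs.
    """
    p = allele_frequencies(genotypes)
    # Monomorphic SNPs carry no information and would divide by zero.
    keep = [j for j, pj in enumerate(p) if 0.0 < pj < 1.0]
    if not keep:
        raise ValueError("no polymorphic SNPs in the sample")

    if standardize:
        z = [[(row[j] - 2.0 * p[j]) / (2.0 * p[j] * (1.0 - p[j])) ** 0.5
              for j in keep] for row in genotypes]
        denom = float(len(keep))
    else:
        z = [[row[j] - 2.0 * p[j] for j in keep] for row in genotypes]
        denom = sum(2.0 * p[j] * (1.0 - p[j]) for j in keep)

    n = len(genotypes)
    return [[sum(a * b for a, b in zip(z[i], z[k])) / denom
             for k in range(n)] for i in range(n)]


# Toy data: 4 individuals, 3 SNPs (made-up values for illustration only).
toy = [[0, 1, 2],
       [1, 1, 0],
       [2, 1, 1],
       [1, 2, 1]]
G_raw = relationship_matrix(toy)
G_std = relationship_matrix(toy, standardize=True)
```

In the standardized version every SNP contributes equally, so rare variants weigh as much as common ones; in the unstandardized version, SNPs with intermediate frequencies dominate. Because frequencies are taken from the sample here, each row of either matrix sums to zero.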