Sci Rep. 2017 Nov 15;7(1):15680.
doi: 10.1038/s41598-017-15947-9.

An Exome Sequencing Based Approach for Genome-Wide Association Studies in the Dog

Free PMC article


Bart J G Broeckx et al. Sci Rep.

Abstract

Genome-wide association studies (GWAS) are widely used to identify loci associated with phenotypic traits in the domestic dog, which has emerged as a model for Mendelian and complex traits. However, a disadvantage of GWAS is that it always requires subsequent fine-mapping or sequencing to pinpoint causal mutations. Here, we performed whole exome sequencing (WES) and canine high-density (cHD) SNP genotyping of 28 dogs from 3 breeds to compare the SNP and linkage disequilibrium characteristics, as well as the power and mapping precision, of exome-guided GWAS (EG-GWAS) versus cHD-based GWAS. Using simulated phenotypes, we showed that EG-GWAS has higher power than cHD-based GWAS to detect associations within target regions and lower power outside target regions, with power further influenced by sample size and SNP density. We analyzed two real phenotypes (hair length and furnishing) that are fixed in certain breeds, to characterize the mapping precision for the known causal mutations. EG-GWAS identified the associated exonic and 3'UTR variants within the FGF5 and RSPO2 genes, respectively, with only a few samples per breed. In conclusion, we demonstrated that EG-GWAS can identify loci associated with Mendelian phenotypes both within and across breeds.
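A GWAS tests each SNP for association with the phenotype. As a hedged illustration only (not the authors' pipeline, whose software and test statistic are described in their Methods), the sketch below applies a Pearson chi-square test with 1 degree of freedom to a 2×2 table of allele counts in cases and controls. The function name `allelic_chisq` is hypothetical.

```python
import math


def allelic_chisq(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Rows are cases / controls; columns are alternative / reference allele.
    Returns (statistic, p_value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of allele and phenotype.
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For a hypothetical trait fixed across breeds, with 14 affected dogs homozygous for the alternative allele (28 alleles) and 14 unaffected dogs homozygous for the reference allele, `allelic_chisq(28, 0, 0, 28)` gives a statistic of 56.0 and a p-value far below common genome-wide thresholds. This shows why fixed Mendelian phenotypes can be mapped with only a few samples per breed.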

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1
Distribution of the distance between subsequent SNPs for the exome-1.0 design and the canine high-density array (chromosome 1). Only SNPs that passed the filters for linkage disequilibrium calculations (sufficiently polymorphic and with a sufficient call rate; see Methods) were used. Distances are expressed in bp.
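The distances in Figure 1 are gaps between consecutive SNPs that survive filtering. The following is a minimal sketch that assumes genotypes coded as alternative-allele dosages (0/1/2, with `None` for a failed call). It also assumes hypothetical thresholds `min_maf=0.05` and `min_call_rate=0.9`; the study's actual filter values are given in its Methods, not here.

```python
def passes_filters(genotypes, min_maf=0.05, min_call_rate=0.9):
    """Keep a SNP only if it is sufficiently polymorphic and well called.

    genotypes: list of alt-allele dosages (0, 1, 2) or None for missing calls.
    Thresholds are illustrative assumptions, not the study's values.
    """
    called = [g for g in genotypes if g is not None]
    if not called or len(called) / len(genotypes) < min_call_rate:
        return False
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)  # minor allele frequency
    return maf >= min_maf


def inter_snp_distances(snps, **filter_kwargs):
    """snps: iterable of (position_bp, genotypes).

    Returns the gaps in bp between consecutive SNPs that pass the filters.
    """
    kept = sorted(pos for pos, geno in snps if passes_filters(geno, **filter_kwargs))
    return [b - a for a, b in zip(kept, kept[1:])]
```

Loosening a filter retains more SNPs and shortens the gaps. This is one reason the distance distributions of the two platforms depend on the filtering step as well as on the SNP design.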
Figure 2
Relation between linkage disequilibrium, gene annotation, SNP density and distance between tagSNPs on chromosome 1. (a) Whole exome sequencing (WES)- and canine high-density array (cHD)-specific informative SNP count per bin (bin size: 1 Mb) relative to position. (b) Density of the RefSeq Genes track (blue) and the Ensembl Gene Predictions track (brown) relative to position. (c) WES- and cHD-specific linkage disequilibrium (measured as r²) relative to position. (d) Relation between r² and the distance between subsequent SNPs. In each graph, lines were obtained with LOWESS (locally weighted scatterplot smoothing).
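Figure 2d relates r² to the distance between subsequent SNPs. One common way to estimate r² from unphased genotypes is the squared Pearson correlation of the two SNPs' allele dosages. The sketch below uses that approximation, which is an assumption here and may differ from the estimator used in the study; the function names are hypothetical.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two SNPs' alt-allele dosages (0/1/2).

    Samples missing at either SNP (None) are skipped.
    Returns NaN if fewer than two samples remain or a SNP is monomorphic.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)


def adjacent_r2(positions, dosages):
    """Pair each subsequent-SNP distance (bp) with the r2 between those SNPs."""
    order = sorted(range(len(positions)), key=positions.__getitem__)
    return [
        (positions[j] - positions[i], genotype_r2(dosages[i], dosages[j]))
        for i, j in zip(order, order[1:])
    ]
```

The (distance, r²) pairs returned by `adjacent_r2` are the kind of points to which a LOWESS curve, as in panel (d), would be fitted.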
Figure 3
The effect of subsampling SNPs on r² relative to position. Starting from the original 9541 SNPs on chromosome 1, random subsampling reduced the number of SNPs from 9000 to 1500 in decrements of 1500 (6 subset sizes). For each subset size, 10 subsets were randomly sampled (without replacement). The number of polymorphic SNPs is shown in the graph. At 4500 SNPs, WES and cHD had a comparable number of informative tagSNPs (WES: 3310 SNPs, cHD: 3365 SNPs). Lines were obtained with LOWESS (locally weighted scatterplot smoothing).
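The subsampling design of Figure 3 (six subset sizes from 9000 down to 1500 SNPs, with 10 random subsets per size) can be sketched as follows. The seed, the function names and the `is_informative` callback are hypothetical. "Without replacement" is interpreted here as meaning that no SNP appears twice within a subset.

```python
import random


def subsample_snps(snp_ids, sizes=(9000, 7500, 6000, 4500, 3000, 1500),
                   replicates=10, seed=1):
    """For each target size, draw `replicates` random subsets without replacement.

    snp_ids must be a sequence (e.g. a list) with at least max(sizes) entries.
    """
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    return {size: [rng.sample(snp_ids, size) for _ in range(replicates)]
            for size in sizes}


def informative_counts(subsets, is_informative):
    """Count the informative (e.g. polymorphic) SNPs in every subset."""
    return {size: [sum(1 for snp in sub if is_informative(snp)) for sub in subs]
            for size, subs in subsets.items()}
```

Applied to the 9541 chromosome 1 SNPs of each platform, `informative_counts` would give the per-subset numbers of polymorphic SNPs that the legend refers to. Comparing the platforms at a fixed subset size (such as 4500) controls for raw SNP number.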
Figure 4
Power and distance between causal and tagSNPs for the exome-1.0 design and the canine high-density 170k array. (a) Boxplots showing the power to detect the association when the signal is located inside or outside the target regions. (b) Boxplots showing the distance between the most significant SNP and the causal SNP when the signal is located inside or outside the target regions. (c,d) Boxplots showing the power and distance for detecting a non-exonic signal inside WES bins with a high informative SNP density (85th percentile or higher; threshold: ≥48 SNPs/Mb) or a low informative SNP density (15th percentile or lower; threshold: ≤4 SNPs/Mb). (e) Boxplots showing the power to detect a signal in long intergenic non-coding RNAs (lincRNAs). (f) Effect of sample size reduction on the power to detect a monogenic recessive trait. Subsampling was performed stepwise from 14 down to 6 samples; at each step, at least 20% of all possible permutations of samples were evaluated. The bottom and top of each box represent the first (Q1) and third (Q3) quartiles, and the horizontal line inside the box represents the median. Whiskers represent 1.5 times the interquartile range (Q3-Q1).
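The boxplot elements described in the Figure 4 legend can be computed as shown below. This sketch assumes the common Tukey convention, in which whiskers extend to the most extreme observation within 1.5 × IQR of the box. It also uses linear-interpolation quartiles (`statistics.quantiles(..., method="inclusive")`); the plotting software used for the figure may compute quartiles slightly differently. The function name is hypothetical.

```python
import statistics


def boxplot_stats(values):
    """Quartiles, median and Tukey-style whisker ends for a list of values.

    Whiskers end at the furthest observations lying within 1.5 * IQR of the box;
    anything beyond those fences is reported as an outlier.
    Requires at least two values.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range, Q3 - Q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_lo": min(v for v in data if v >= lo_fence),
        "whisker_hi": max(v for v in data if v <= hi_fence),
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }
```

For example, `boxplot_stats([1, 2, 3, 4, 5, 100])` places the box between 2.25 and 4.75 with a median of 3.5. The whiskers end at 1 and 5, and 100 is flagged as an outlier.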

