Variance explained by whole genome sequence variants in coding and regulatory genome annotations for six dairy traits

BMC Genomics. 2018 Apr 5;19(1):237. doi: 10.1186/s12864-018-4617-x.


Background: Whole genome sequencing discovers an exceedingly large number of sequence variants in most populations, including cattle. Deciphering which of these affect complex traits is a major challenge. In this study we hypothesize that variants in some functional classes, such as splice site regions, coding regions, DNA methylated regions and long noncoding RNA, will explain more variance in complex traits than others. Two variance component approaches were used to test this hypothesis: the first determines whether variants in a functional class capture a greater proportion of the variance than expected by chance; the second uses the proportion of variance explained when variants in all annotations are fitted simultaneously.
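The abstract does not say exactly how "expected by chance" was assessed. One common way to frame it, assumed here purely for illustration, is to compare the proportion of variance captured by an annotation class with the proportions captured by random sets containing the same number of variants. The sketch below (Python with NumPy) covers only the bookkeeping around that comparison: drawing size-matched random variant sets and turning the resulting null distribution into a one-sided empirical p-value. The variance component (REML) fits that would produce each proportion are not shown, and the function names random_matched_sets and empirical_p are hypothetical.

```python
import numpy as np

def random_matched_sets(n_total: int, n_class: int, n_sets: int, seed: int = 0) -> list:
    """Draw n_sets random subsets of variant indices, each the same size as the annotation class.

    In a real analysis each subset would be used to build a relationship matrix
    and refit the variance component model (not shown here).
    """
    rng = np.random.default_rng(seed)
    return [rng.choice(n_total, size=n_class, replace=False) for _ in range(n_sets)]

def empirical_p(observed: float, null_values) -> float:
    """One-sided empirical p-value that the class captures more variance than random sets.

    Uses the (r + 1) / (n + 1) correction so the estimate is never exactly zero.
    """
    null_values = np.asarray(null_values, dtype=float)
    r = int(np.sum(null_values >= observed))  # random sets doing at least as well
    return (r + 1) / (null_values.size + 1)

# Hypothetical numbers: the annotation class captured 0.30 of the variance,
# and four size-matched random sets captured the values below.
print(empirical_p(0.30, [0.05, 0.08, 0.10, 0.12]))
```

This prints 0.2. With only four random sets the smallest attainable p-value is 1/5 = 0.2, so a real test of this kind would need many more random sets.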

Results: Our data set consisted of 28.3 million imputed whole genome sequence variants in 16,581 dairy cattle with records for 6 complex trait phenotypes, including production and fertility. We found that sequence variants in the splice site region and synonymous classes captured the greatest proportion of the variance, explaining up to 50% of the variance across all traits. We also found that sequence variants in target sites for DNA methylation (genomic regions found to be highly methylated in bovine placentas) captured a significant proportion of the variance. Per sequence variant, splice site variants explained the highest proportion of variance in this study. The proportion of variance captured by the missense predicted deleterious (from SIFT) and missense tolerated classes was relatively small.
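The per-variant comparison can be expressed as a fold enrichment: the share of genetic variance captured by a class divided by that class's share of all variants. This is a minimal sketch assuming that definition (the abstract does not give the exact formula used); the function names and the toy inputs are hypothetical and are not results from the study.

```python
from typing import Dict, List, Tuple

def fold_enrichment(var_share: float, n_class: int, n_total: int) -> float:
    """Share of genetic variance captured by a class divided by its share of variants.

    A value above 1 means the average variant in the class explains more
    variance than the average variant genome-wide.
    """
    if not 0 < n_class <= n_total:
        raise ValueError("n_class must be between 1 and n_total")
    return var_share / (n_class / n_total)

def rank_by_enrichment(classes: Dict[str, Tuple[float, int]], n_total: int) -> List[Tuple[str, float]]:
    """Rank annotation classes by fold enrichment, highest first.

    classes maps a class name to (share of genetic variance, number of variants).
    """
    scores = [(name, fold_enrichment(share, n, n_total)) for name, (share, n) in classes.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Toy, made-up inputs: a small class with 20% of the variance vs a large class with 30%.
toy = {"small_class": (0.20, 100), "large_class": (0.30, 900)}
for name, score in rank_by_enrichment(toy, n_total=1000):
    print(f"{name}: {score:.2f}")
```

This prints "small_class: 2.00" followed by "large_class: 0.33": the larger class captures more variance in total, but the smaller class explains far more per variant, which is the sense in which a small class such as splice site variants can rank highest per sequence variant.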

Conclusion: The results demonstrate that using functional annotations to filter whole genome sequence variants into more informative subsets could be useful for prioritizing the variants that are more likely to be associated with complex traits. In addition to variants in splice sites and protein coding genes, regulatory variants and those in DNA methylated regions explained considerable variation in milk production and fertility traits. In our analysis, synonymous variants captured a significant proportion of the variance. This raises three possible explanations: synonymous mutations might have some effects; more likely, these variants are mis-annotated; or the results reflect imperfect imputation of the actual causative variants.

Keywords: DNA methylated regions; Enrichment or depletion analysis; Functional genomics; Regulatory genome; Splice sites; Variance component analysis.

MeSH terms

  • Animals
  • Cattle
  • Female
  • Fertility
  • Gene Frequency
  • Gene Regulatory Networks*
  • Genetic Variation*
  • Molecular Sequence Annotation
  • Pregnancy
  • Quantitative Trait Loci*
  • RNA Splice Sites
  • Whole Genome Sequencing / veterinary*
