A Burden of Rare Variants Associated with Extremes of Gene Expression in Human Peripheral Blood

Am J Hum Genet. 2016 Feb 4;98(2):299-309. doi: 10.1016/j.ajhg.2015.12.023.

Abstract

In order to evaluate whether rare regulatory variants in the vicinity of promoters are likely to impact gene expression, we conducted a novel burden test for enrichment of rare variants at the extremes of expression. After sequencing 2-kb promoter regions of 472 genes in 410 healthy adults, we performed a quadratic regression of rare variant count on bins of peripheral blood transcript abundance from microarrays, summing over ranks of all genes. After adjusting for common eQTLs and the major axes of gene expression covariance, a highly significant excess of variants with minor allele frequency less than 0.05 at both high and low extremes across individuals was observed. Further enrichment was seen in sites annotated as potentially regulatory by RegulomeDB, but a deficit of effects was associated with known metabolic disease genes. The main result replicates in an independent sample of 75 individuals with RNA-seq and whole-genome sequence information. Three of four predicted large-effect sites were validated by CRISPR/Cas9 knockdown in K562 cells, but simulations indicate that effect sizes need not be unusually large to produce the observed burden. Unusually divergent low-frequency promoter haplotypes were observed at 31 loci, at least 9 of which appear to be derived from Neandertal admixture, but these were not associated with divergent gene expression in blood. The overall burden test results are consistent with rare and private regulatory variants driving high or low transcription at specific loci, potentially contributing to disease.
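The core of the burden test described above (ranking individuals by expression within each gene, binning the ranks, summing rare variant counts per bin over all genes, and fitting a quadratic) can be illustrated with a minimal sketch. This is not the authors' code: the function name `burden_quadratic_fit`, the choice of 10 bins, and the simulated data (a 5% carrier rate and a two-standard-deviation expression shift) are assumptions made purely for illustration. The sketch also omits the adjustment for common eQTLs and expression covariance axes mentioned in the abstract. A positive quadratic coefficient indicates an excess of rare variants at both expression extremes.

```python
import numpy as np

def burden_quadratic_fit(expr, rare, n_bins=10):
    """Hypothetical helper: quadratic fit of rare-variant count on expression-rank bins.

    expr: (n_individuals, n_genes) expression values.
    rare: same shape, rare-variant count per individual and gene.
    Returns (centered bin index, summed counts per bin, polyfit coefficients [a, b, c]).
    """
    expr = np.asarray(expr, dtype=float)
    rare = np.asarray(rare, dtype=float)
    n_ind, _ = expr.shape
    # Rank individuals within each gene (0 = lowest expression).
    ranks = expr.argsort(axis=0).argsort(axis=0)
    # Map ranks onto n_bins rank bins, labelled 0 .. n_bins - 1.
    bins = (ranks * n_bins) // n_ind
    # Sum rare-variant counts within each bin across all genes.
    counts = np.bincount(bins.ravel(), weights=rare.ravel(), minlength=n_bins)
    # Center the bin index so the quadratic term captures symmetric excess.
    x = np.arange(n_bins) - (n_bins - 1) / 2.0
    coeffs = np.polyfit(x, counts, 2)
    return x, counts, coeffs

# Simulated toy data (all values are illustrative assumptions, not study data).
rng = np.random.default_rng(0)
n_ind, n_genes = 400, 50
rare = rng.random((n_ind, n_genes)) < 0.05       # carrier status per gene
sign = rng.choice([-1.0, 1.0], size=rare.shape)  # direction of each effect
expr = rng.normal(size=rare.shape) + 2.0 * rare * sign

x, counts, coeffs = burden_quadratic_fit(expr, rare)
print("rare variants per bin:", counts)
print("quadratic coefficient:", coeffs[0])
```

In practice, the significance of the quadratic term would need to be assessed against a null distribution, for example by permuting carrier status within genes; that step is left out of this sketch.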

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Female
  • Gene Expression Profiling
  • Gene Expression*
  • Gene Frequency
  • Genetic Loci
  • Genomics
  • Genotyping Techniques
  • Haplotypes
  • Humans
  • Linear Models
  • Male
  • Middle Aged
  • Polymorphism, Single Nucleotide*
  • Promoter Regions, Genetic
  • Reproducibility of Results
  • Sequence Analysis, DNA
  • Young Adult