Tandem repeat sequence variation as causative cis-eQTLs for protein-coding gene expression variation: the case of CSTB

Hum Mutat. 2012 Aug;33(8):1302-9. doi: 10.1002/humu.22115. Epub 2012 Jun 15.


Association studies have revealed expression quantitative trait loci (eQTLs) for a large number of genes. However, the causative variants that regulate gene expression levels are generally unknown. We hypothesized that copy-number variation of sequence repeats contributes to the expression variation of some genes. Our laboratory previously showed that the rare expansion of a repeat c.-174CGGGGCGGGGCG in the promoter region of the CSTB gene silences the gene, resulting in progressive myoclonus epilepsy. Here, we genotyped the repeat length and quantified CSTB expression by quantitative real-time polymerase chain reaction in 173 lymphoblastoid cell lines (LCLs) and fibroblast samples from the GenCord collection. The majority of alleles contain either two or three copies of this repeat. Independent analysis revealed that the c.-174CGGGGCGGGGCG repeat length is strongly associated with CSTB expression (P = 3.14 × 10⁻¹¹) in LCLs only. Examination of both genotyped and imputed single-nucleotide polymorphisms (SNPs) within 2 Mb of CSTB revealed that the dodecamer repeat represents the strongest cis-eQTL for CSTB in LCLs. We conclude that the common two- or three-copy variation is likely the causative cis-eQTL for CSTB expression variation. More broadly, we propose that polymorphic tandem repeats may represent the causative variation of a fraction of cis-eQTLs in the genome.
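As an illustration only, the kind of association described above (repeat copy number versus expression level) can be sketched as a correlation between an additive repeat dosage and an expression value, with a permutation test for significance. This is not the authors' analysis: the abstract does not state which statistical test was used, the permutation test is a swapped-in technique, the data below are simulated, and the direction and size of the simulated effect are arbitrary assumptions. The names `additive_dosage`, `pearson_r`, and `permutation_p` are hypothetical helpers.

```python
import random
from math import sqrt


def additive_dosage(allele_a, allele_b):
    """Total repeat copies across both alleles (e.g. 2 and 3 copies -> 5)."""
    return allele_a + allele_b


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def permutation_p(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation of xs with ys."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-expression link
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# SIMULATED data (assumption): 60 samples, each allele carrying 2 or 3 copies.
rng = random.Random(42)
genotypes = [(rng.choice([2, 3]), rng.choice([2, 3])) for _ in range(60)]
dosage = [additive_dosage(a, b) for a, b in genotypes]

# ARBITRARY effect (assumption): expression rises with dosage, plus noise.
# The abstract does not report the direction or magnitude of the real effect.
expression = [1.0 + 0.5 * d + rng.gauss(0, 0.3) for d in dosage]

r = pearson_r(dosage, expression)
p = permutation_p(dosage, expression)
print(f"r = {r:.2f}, permutation p = {p:.4f}")
```

A permutation test is used here only because it needs no distributional assumptions and no external libraries; its smallest attainable p-value is limited by the number of permutations, so it could not resolve a value as small as the one reported in the abstract without a parametric test or far more permutations.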

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cell Line
  • Cystatin B / genetics
  • Gene Expression / genetics
  • Humans
  • Polymorphism, Single Nucleotide
  • Quantitative Trait Loci / genetics*
  • Real-Time Polymerase Chain Reaction
  • Tandem Repeat Sequences / genetics*


Substances

  • CSTB protein, human
  • Cystatin B