Protein-coding repeat polymorphisms strongly shape diverse human phenotypes

Science. 2021 Sep 24;373(6562):1499-1505. doi: 10.1126/science.abg8289. Epub 2021 Sep 23.


Many human proteins contain domains that vary in size or copy number because of variable numbers of tandem repeats (VNTRs) in protein-coding exons. However, the relationships of VNTRs to most phenotypes are unknown because of difficulties in measuring such repetitive elements. We developed methods to estimate VNTR lengths from whole-exome sequencing data and impute VNTR alleles into single-nucleotide polymorphism haplotypes. Analyzing 118 protein-altering VNTRs in 415,280 UK Biobank participants for association with 786 phenotypes identified some of the strongest associations of common variants with human phenotypes, including height, hair morphology, and biomarkers of health. Accounting for large-effect VNTRs further enabled fine-mapping of associations to many more protein-coding mutations in the same genes. These results point to cryptic effects of highly polymorphic common structural variants that have eluded molecular analyses to date.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aggrecans / genetics
  • Antigens / genetics
  • Black People
  • Body Height / genetics
  • Exome Sequencing
  • Genetic Association Studies
  • Genome, Human*
  • Hair
  • Haplotypes
  • Humans
  • Intermediate Filament Proteins / genetics
  • Kidney / physiology
  • Lipoprotein(a) / blood
  • Lipoprotein(a) / genetics
  • Minisatellite Repeats / genetics*
  • Mucin-1 / genetics
  • Phenotype*
  • Polymorphism, Genetic*
  • Polymorphism, Single Nucleotide
  • Polynucleotide Adenylyltransferase / genetics
  • White People / genetics

Substances

  • ACAN protein, human
  • Aggrecans
  • Antigens
  • Intermediate Filament Proteins
  • Lipoprotein(a)
  • MUC1 protein, human
  • Mucin-1
  • TCHH protein, human
  • Polynucleotide Adenylyltransferase
  • TENT5A protein, human