Genetic identification of a common collagen disease in Puerto Ricans via identity-by-descent mapping in a health system

eLife. 2017 Sep 12;6:e25060. doi: 10.7554/eLife.25060.


Achieving confidence in the causality of a disease locus is a complex task that often requires supporting data from both statistical genetics and clinical genomics. Here we describe a combined approach to identify and characterize a genetic disorder that leverages distantly related patients in a health system and population-scale mapping. We utilize genomic data to uncover components of distant pedigrees, in the absence of recorded pedigree information, in the multi-ethnic BioMe biobank in New York City. By linking to medical records, we discover a locus associated with both elevated genetic relatedness and extreme short stature. We link the gene, COL27A1, with a little-known genetic disease, previously thought to be rare and recessive. We demonstrate that disease manifests in both heterozygotes and homozygotes, indicating a common collagen disorder impacting up to 2% of individuals of Puerto Rican ancestry, leading to a better understanding of the continuum of complex and Mendelian disease.
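To make the "up to 2%" figure concrete, the following minimal sketch converts a population fraction of variant carriers (heterozygotes plus homozygotes) into an implied allele frequency. It assumes Hardy-Weinberg proportions, which the abstract does not state, and it treats the 2% figure as the combined carrier fraction, which is our reading rather than the authors' stated calculation. The function names are hypothetical and exist only for illustration.

```python
import math


def allele_frequency_from_carrier_fraction(carrier_fraction: float) -> float:
    """Return the allele frequency q implied by a carrier fraction f.

    Assumption (not from the source): Hardy-Weinberg proportions, so
    f = 2q(1 - q) + q^2 = 1 - (1 - q)^2, which gives q = 1 - sqrt(1 - f).
    """
    if not 0.0 <= carrier_fraction <= 1.0:
        raise ValueError("carrier_fraction must lie in [0, 1]")
    return 1.0 - math.sqrt(1.0 - carrier_fraction)


def genotype_fractions(q: float) -> dict:
    """Split carriers into heterozygotes (2pq) and homozygotes (q^2)."""
    p = 1.0 - q
    return {"heterozygote": 2.0 * p * q, "homozygote": q * q}


# Illustrative input: the upper bound quoted in the abstract (2%).
q = allele_frequency_from_carrier_fraction(0.02)
fractions = genotype_fractions(q)

print(f"implied allele frequency: {q:.4f}")
print(f"heterozygote fraction:    {fractions['heterozygote']:.6f}")
print(f"homozygote fraction:      {fractions['homozygote']:.6f}")
```

Under these assumptions the implied allele frequency is about 1%, and nearly all carriers are heterozygotes. This illustrates why a disease that manifests in heterozygotes affects far more people than a strictly recessive one at the same allele frequency.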

Keywords: Electronic Health Records; GWAS; collagen disorder; evolutionary biology; genomics; human; human biology; medical genetics; medicine; population genetics.

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Child
  • Collagen Diseases / epidemiology*
  • Collagen Diseases / genetics*
  • Female
  • Fibrillar Collagens / genetics*
  • Genotype
  • Heterozygote
  • Hispanic or Latino
  • Homozygote
  • Humans
  • Male
  • Middle Aged
  • Molecular Epidemiology*
  • Multigene Family
  • Musculoskeletal Diseases / epidemiology
  • Musculoskeletal Diseases / genetics
  • New York City / epidemiology
  • New York City / ethnology
  • Pedigree*
  • Whole Genome Sequencing
  • Young Adult

Substances


  • COL27A1 protein, human
  • Fibrillar Collagens