ClinGen Allele Registry links information about genetic variants

Hum Mutat. 2018 Nov;39(11):1690-1701. doi: 10.1002/humu.23637.


Effective exchange of information about genetic variants is currently hampered by the lack of readily available, globally unique variant identifiers that would enable aggregation of information from different sources. The ClinGen Allele Registry addresses this problem by providing (1) globally unique "canonical" variant identifiers (CAids) on demand, either individually or in large batches; (2) access to variant-identifying information in a searchable Registry; (3) links to allele-related records in many commonly used databases; and (4) services for adding links to information about registered variants in external sources. A core element of the Registry is a canonicalization service, implemented using an in-memory, sequence alignment-based index, which groups variant identifiers denoting the same nucleotide variant and assigns unique and dereferenceable CAids. More than 650 million distinct variants are currently registered, including those from gnomAD, ExAC, dbSNP, and ClinVar, as well as a small number of variants registered by Registry users. The Registry is accessible both via a web interface and programmatically via well-documented Hypertext Transfer Protocol (HTTP) Representational State Transfer (REST) Application Programming Interfaces (APIs). For programmatic interoperability, the Registry content is accessible in the JavaScript Object Notation for Linked Data (JSON-LD) format. We present several use cases and demonstrate how the linked information may provide raw material for reasoning about a variant's pathogenicity.
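
The idea behind canonicalization, grouping different descriptions of the same nucleotide variant under one identifier, can be illustrated with a minimal sketch. This is not the Registry's alignment-based index or its API: the class ToyRegistry, its register method, the "TOY" identifier format, and the short reference sequence are all hypothetical, and the sketch swaps in a much simpler technique (comparing the full edited sequence) that is only practical for a tiny reference.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRegistry:
    """Hypothetical toy stand-in for a canonicalization service."""

    reference: str
    _ids: dict = field(default_factory=dict)

    def register(self, pos: int, ref: str, alt: str) -> str:
        """Return a toy ID for replacing `ref` at 0-based `pos` with `alt`."""
        if self.reference[pos:pos + len(ref)] != ref:
            raise ValueError("ref allele does not match the reference sequence")
        # Equivalent descriptions produce the same edited sequence,
        # so the edited sequence itself serves as the grouping key.
        edited = self.reference[:pos] + alt + self.reference[pos + len(ref):]
        if edited not in self._ids:
            self._ids[edited] = f"TOY{len(self._ids) + 1:06d}"
        return self._ids[edited]


registry = ToyRegistry(reference="GCACACAT")
# Deleting either "CA" repeat unit describes the same allele.
id_a = registry.register(pos=1, ref="CA", alt="")
id_b = registry.register(pos=3, ref="CA", alt="")
# A substitution elsewhere is a distinct allele and gets a new ID.
id_c = registry.register(pos=0, ref="G", alt="T")
```

Here id_a and id_b are identical because both deletions yield the same sequence, while id_c differs. A production service cannot store whole edited genomes as keys, which is one reason an index over normalized, aligned representations is used instead.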

Keywords: HGVS representation; linked data; pathogenicity of genetic variants; variant centric resources; variant identifiers.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Alleles
  • Databases, Genetic*
  • Genetic Variation / genetics*
  • Humans
  • Registries
  • Software