HGVbase: a human sequence variation database emphasizing data quality and a broad spectrum of data sources

Nucleic Acids Res. 2002 Jan 1;30(1):387-91. doi: 10.1093/nar/30.1.387.


HGVbase (Human Genome Variation database; http://hgvbase.cgb.ki.se, formerly known as HGBASE) is an academic effort to provide a high-quality, non-redundant database of available genomic variation data of all types, mostly comprising single nucleotide polymorphisms (SNPs). Records include neutral polymorphisms as well as disease-related mutations. Online search tools facilitate data interrogation by sequence similarity and keyword queries, and searching by genome coordinates is now being implemented. Downloads are freely available in XML, FASTA, SRS, SQL and tagged-text file formats. Each entry is presented in the context of its surrounding sequence and many records are related to neighboring human genes and affected features therein. Population allele frequencies are included wherever available. Thorough semi-automated data checking ensures internal consistency and addresses common errors in the source information. To keep pace with recent growth in the field, we have developed tools for fully automated annotation. All variants have been uniquely mapped to the draft genome sequence and are referenced to positions in EMBL/GenBank files. Data utility is enhanced by provision of genotyping assays and functional predictions. Recent data structure extensions allow the capture of haplotype and genotype information, and a new initiative (along with BiSC and HUGO-MDI) aims to create a central repository for the broad collection of clinical mutations and associated disease phenotypes of interest.
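The abstract does not describe how non-redundancy is enforced. As a rough illustration only, one way to detect duplicate submissions of a variant presented in its surrounding sequence is to compare a strand-independent key built from the flanks and alleles. The `Variant` record, `canonical_key` and `deduplicate` below are hypothetical names invented for this sketch and are not HGVbase's actual schema or pipeline; the sketch also assumes both submissions report flanks of the same length.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Translation table for complementing DNA bases (assumes plain A/C/G/T input).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Variant:
    """Hypothetical record: a variant in the context of its flanking sequence."""
    left_flank: str
    alleles: Tuple[str, ...]
    right_flank: str


def canonical_key(v: Variant) -> Tuple[str, Tuple[str, ...], str]:
    """Build a key that is identical for forward- and reverse-strand reports."""
    forward = (
        v.left_flank.upper(),
        tuple(sorted(a.upper() for a in v.alleles)),
        v.right_flank.upper(),
    )
    # On the opposite strand the flanks swap sides and every base is complemented.
    reverse = (
        reverse_complement(v.right_flank).upper(),
        tuple(sorted(reverse_complement(a).upper() for a in v.alleles)),
        reverse_complement(v.left_flank).upper(),
    )
    return min(forward, reverse)


def deduplicate(variants: Iterable[Variant]) -> List[Variant]:
    """Keep the first record seen for each canonical key."""
    seen = {}
    for v in variants:
        seen.setdefault(canonical_key(v), v)
    return list(seen.values())
```

A real curation step would also need to reconcile flanks of different lengths and verify the mapped genome position, which this sketch does not attempt.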

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Chromosome Mapping
  • Database Management Systems
  • Databases, Nucleic Acid*
  • Gene Frequency
  • Genetic Diseases, Inborn / genetics
  • Genetic Variation*
  • Genome, Human*
  • Humans
  • Information Storage and Retrieval
  • Internet
  • Polymorphism, Single Nucleotide*
  • Quality Control
  • Systems Integration