Annotation of functional variation in personal genomes using RegulomeDB

Genome Res. 2012 Sep;22(9):1790-7. doi: 10.1101/gr.137323.112.

Abstract

As the sequencing of healthy and disease genomes becomes more commonplace, detailed annotation provides interpretation for individual variation responsible for normal and disease phenotypes. Current approaches focus on direct changes in protein-coding genes, particularly nonsynonymous mutations that directly affect the gene product. However, most individual variation occurs outside of genes and, indeed, most markers generated from genome-wide association studies (GWAS) identify variants outside of coding segments. Identification of potential regulatory changes that perturb these sites will lead to a better localization of truly functional variants and interpretation of their effects. We have developed a novel approach and database, RegulomeDB, which guides interpretation of regulatory variants in the human genome. RegulomeDB includes high-throughput, experimental data sets from ENCODE and other sources, as well as computational predictions and manual annotations, to identify putative regulatory potential and functional variants. These data sources are combined into a powerful tool that scores variants to help separate functional variants from a large pool and provides a small set of putative sites with testable hypotheses as to their function. We demonstrate the applicability of this tool to the annotation of noncoding variants from 69 fully sequenced genomes as well as that of a personal genome, where thousands of functionally associated variants were identified. Moreover, we demonstrate a GWAS in which the database quickly identifies the known associated functional variant and provides a hypothesis as to its function. Overall, we expect this approach and resource to be valuable for the annotation of human genome sequences.
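
The idea of combining several evidence sources into a score that narrows a large pool of variants can be illustrated with a minimal Python sketch. This is not RegulomeDB's actual scoring scheme, which the abstract does not describe. The evidence labels, the `Variant` class, the functions `evidence_score` and `rank_variants`, and the variant identifiers are all hypothetical, and counting distinct evidence types is an assumed stand-in for whatever weighting the real tool uses.

```python
from dataclasses import dataclass, field

# Hypothetical evidence labels. The abstract names only broad source classes
# (experimental data sets, computational predictions, manual annotations).
EVIDENCE_TYPES = ("experimental", "computational", "manual")


@dataclass
class Variant:
    variant_id: str                      # placeholder identifier, not a real SNP
    evidence: set = field(default_factory=set)


def evidence_score(variant):
    """Count the distinct recognized evidence types supporting a variant."""
    return len(variant.evidence & set(EVIDENCE_TYPES))


def rank_variants(variants, min_score=1):
    """Drop variants below min_score, then sort the rest by score, best first."""
    kept = [v for v in variants if evidence_score(v) >= min_score]
    return sorted(kept, key=evidence_score, reverse=True)


# Toy input: three made-up variants with differing amounts of support.
variants = [
    Variant("var_b", {"computational"}),
    Variant("var_a", {"experimental", "computational", "manual"}),
    Variant("var_c", set()),
]
ranked_ids = [v.variant_id for v in rank_variants(variants)]
```

In this toy run the unsupported variant is filtered out and the variant with the most kinds of support is listed first, which mirrors the stated goal of reducing a large pool to a small set of candidate sites.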

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • DNA-Binding Proteins / genetics
  • Databases, Genetic*
  • Genetic Variation*
  • Genome, Human*
  • Genome-Wide Association Study
  • Genotype
  • Humans
  • Internet
  • Intracellular Signaling Peptides and Proteins / genetics
  • Lupus Erythematosus, Systemic / genetics
  • Molecular Sequence Annotation*
  • Nuclear Proteins / genetics
  • Open Reading Frames
  • Polymorphism, Single Nucleotide
  • Regulatory Sequences, Nucleic Acid
  • Tumor Necrosis Factor alpha-Induced Protein 3

Substances

  • DNA-Binding Proteins
  • Intracellular Signaling Peptides and Proteins
  • Nuclear Proteins
  • TNFAIP3 protein, human
  • Tumor Necrosis Factor alpha-Induced Protein 3