GREEN-DB: a framework for the annotation and prioritization of non-coding regulatory variants from whole-genome sequencing data

Edoardo Giacopuzzi; Niko Popitsch; Jenny C Taylor

doi:10.1093/nar/gkac130

Nucleic Acids Res. 2022 Mar 21;50(5):2522-2535. doi: 10.1093/nar/gkac130.

Authors

Edoardo Giacopuzzi¹,², Niko Popitsch¹,³, Jenny C Taylor¹,²

Affiliations

¹ Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK.
² National Institute for Health Research Oxford Biomedical Research Centre, Oxford OX4 2PG, UK.
³ Max Perutz Labs, University of Vienna, Dr. Bohr-Gasse 9, 1030 Vienna, Austria.

Abstract

Non-coding variants have long been recognized as important contributors to common disease risks, but with the expansion of clinical whole genome sequencing, examples of rare, high-impact non-coding variants are also accumulating. Despite recent advances in the study of regulatory elements and the availability of specialized data collections, the systematic annotation of non-coding variants from genome sequencing remains challenging. Here, we propose a new framework for the prioritization of non-coding regulatory variants that integrates information about regulatory regions with prediction scores and HPO-based prioritization. Firstly, we created a comprehensive collection of annotations for regulatory regions including a database of 2.4 million regulatory elements (GREEN-DB) annotated with controlled gene(s), tissue(s) and associated phenotype(s) where available. Secondly, we calculated a variation constraint metric and showed that constrained regulatory regions associate with disease-associated genes and essential genes from mouse knock-outs. Thirdly, we compared 19 non-coding impact prediction scores providing suggestions for variant prioritization. Finally, we developed a VCF annotation tool (GREEN-VARAN) that can integrate all these elements to annotate variants for their potential regulatory impact. In our evaluation, we show that GREEN-DB can capture previously published disease-associated non-coding variants as well as identify additional candidate disease genes in trio analyses.
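
The core annotation step described above, matching variants to the regulatory regions they fall in, can be illustrated with a minimal Python sketch. This is not the GREEN-VARAN implementation: the `Region` record, its field names, and the functions `build_index` and `overlapping_regions` are hypothetical, and the sketch assumes BED-style 0-based half-open region coordinates and 1-based VCF positions.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical regulatory-region record (not the GREEN-DB schema)."""
    chrom: str
    start: int        # 0-based, inclusive (BED-style, assumed)
    end: int          # 0-based, exclusive
    region_id: str
    genes: tuple      # controlled gene symbols, possibly empty


def build_index(regions):
    """Group regions by chromosome and sort them by start coordinate."""
    by_chrom = defaultdict(list)
    for region in regions:
        by_chrom[region.chrom].append(region)
    index = {}
    for chrom, regs in by_chrom.items():
        regs.sort(key=lambda r: r.start)
        index[chrom] = ([r.start for r in regs], regs)
    return index


def overlapping_regions(index, chrom, pos):
    """Return regions containing a 1-based VCF position on `chrom`."""
    if chrom not in index:
        return []
    starts, regs = index[chrom]
    p = pos - 1                       # convert to 0-based
    hi = bisect_right(starts, p)      # regions whose start <= p
    # Linear scan of the candidates; adequate for a sketch, while a
    # production tool would more likely use an interval tree or tabix.
    return [r for r in regs[:hi] if r.end > p]
```

With two overlapping regions on the same chromosome, a call such as `overlapping_regions(index, "chr1", 160)` returns every region spanning that position, so the controlled genes of each can then be attached to the variant record.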

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Animals
Base Sequence
Mice
Molecular Sequence Annotation*
Whole Genome Sequencing
