Am J Hum Genet. 2016 Sep 1;99(3):595-606.
doi: 10.1016/j.ajhg.2016.07.005. Epub 2016 Aug 25.

A Whole-Genome Analysis Framework for Effective Identification of Pathogenic Regulatory Variants in Mendelian Disease

Free PMC article

Damian Smedley et al. Am J Hum Genet.

Abstract

The interpretation of non-coding variants still constitutes a major challenge in the application of whole-genome sequencing (WGS) in Mendelian disease, especially for single-nucleotide and other small non-coding variants. Here we present Genomiser, an analysis framework that is able not only to score the relevance of variation in the non-coding genome but also to associate regulatory variants with specific Mendelian diseases. Genomiser scores variants through either existing methods such as CADD or a bespoke machine learning method, the Regulatory Mendelian Mutation (ReMM) score, and combines these scores with allele frequency, regulatory sequences, chromosomal topological domains, and phenotypic relevance to discover variants associated with specific Mendelian disorders. Overall, Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes, allowing effective detection and discovery of regulatory variants in Mendelian disease.
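
The abstract says Genomiser uses chromosomal topological domains when linking regulatory variants to diseases. One way to picture that step is to limit a non-coding variant's candidate target genes to those whose transcription start site (TSS) lies in the same topologically associating domain (TAD). The sketch below is a minimal illustration under stated assumptions: TADs are non-overlapping half-open intervals, each gene is reduced to a single TSS, and candidates are ordered by TSS distance. The names `Interval`, `Gene`, and `genes_in_same_tad` are hypothetical, and this is not Genomiser's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A TAD on one chromosome, as a half-open interval [start, end)."""
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class Gene:
    """A gene reduced to its transcription start site (TSS)."""
    symbol: str
    chrom: str
    tss: int


def genes_in_same_tad(chrom, pos, tads, genes):
    """Return genes whose TSS shares a TAD with the variant at (chrom, pos),
    nearest TSS first. A variant outside every TAD gets no candidates.
    Assumes TADs do not overlap, so the first match is the only match."""
    for tad in tads:
        if tad.chrom == chrom and tad.start <= pos < tad.end:
            in_tad = [g for g in genes
                      if g.chrom == chrom and tad.start <= g.tss < tad.end]
            return sorted(in_tad, key=lambda g: abs(g.tss - pos))
    return []


# Hypothetical coordinates: GENE_A is only 150 kb away but sits in a different TAD.
tads = [Interval("chr1", 0, 1_000_000), Interval("chr1", 1_000_000, 2_000_000)]
genes = [Gene("GENE_A", "chr1", 950_000),
         Gene("GENE_B", "chr1", 1_050_000),
         Gene("GENE_C", "chr1", 1_900_000)]
print([g.symbol for g in genes_in_same_tad("chr1", 1_100_000, tads, genes)])
# prints ['GENE_B', 'GENE_C']
```

The example shows why a domain boundary can matter more than raw distance: GENE_A is closer than GENE_C to the variant, yet it is excluded because it lies across a TAD boundary.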

Figures

Figure 1
Genomic Attributes of Regulatory Mendelian Mutations. (A) Mean-centered and scaled genomic attributes of Mendelian non-coding mutations compared with the derived non-deleterious positions. Five highly informative attributes from different attribute groups are shown. The information content of single attributes was computed with a univariate logistic regression model (Table S3). (B) Principal-component analysis plot showing the first two principal components, which together account for 32% of the total variability.
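
Panel A plots attributes after centering and scaling. A minimal sketch of that transformation for a single attribute follows, assuming z-score standardization over the pooled mutation and non-deleterious positions; the caption does not state the reference set or the exact scaling used, so treat both as assumptions. The function names `standardize` and `class_means_after_scaling` are hypothetical.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mu = fmean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0] * len(values)  # a constant attribute carries no information
    return [(v - mu) / sd for v in values]


def class_means_after_scaling(positives, negatives):
    """Standardize one attribute over the pooled positions, then return the
    mean scaled value for the mutation set and for the non-deleterious set."""
    scaled = standardize(list(positives) + list(negatives))
    n_pos = len(positives)
    return fmean(scaled[:n_pos]), fmean(scaled[n_pos:])


# Hypothetical attribute values: mutations score higher than controls.
print(class_means_after_scaling([3, 3], [1, 1]))  # prints (1.0, -1.0)
```

Putting every attribute on a common scale is what lets attributes with very different raw units be compared side by side in one panel.
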
Figure 2
Regulatory Mendelian Mutation-Deleteriousness Score. (A) Summary of the algorithm for deriving the ReMM score. (B and C) Performance comparison between ReMM and other state-of-the-art genome-wide deleteriousness scores. (B) Receiver operating characteristic curves. (C) Precision-recall curves.
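
ROC and precision-recall curves such as those in panels B and C are often summarized by a single number per score. The sketch below computes two common summaries, the ROC area under the curve (via the Mann-Whitney pairwise formulation) and average precision, on hypothetical scores. Whether the paper reports exactly these statistics is not stated in this caption, and the function names are illustrative only.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation; ties count as half a win).
    Quadratic in input size, which is fine for an illustration."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def average_precision(scores, labels):
    """Summarize the precision-recall curve as the mean of the precision
    values at the rank of each true positive (tied scores keep input order)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(1 for is_pos in labels if is_pos)
    hits, total = 0, 0.0
    for k, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
            total += hits / k
    return total / n_pos


# Hypothetical scores: two pathogenic (1) and two benign (0) positions.
scores = [0.9, 0.7, 0.6, 0.2]
labels = [1, 0, 1, 0]
print(roc_auc([0.9, 0.6], [0.7, 0.2]))    # prints 0.75
print(average_precision(scores, labels))  # about 0.833
```

Precision-recall summaries are worth reporting alongside ROC AUC when positives are rare, as pathogenic regulatory positions are among all non-coding positions, because ROC AUC alone can look optimistic under heavy class imbalance.
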
Figure 3
The Genomiser Analysis Framework. Genomiser takes as input a whole-genome variant call format (VCF) file, a list of Human Phenotype Ontology (HPO) terms representing the clinical signs and symptoms observed in the individual being investigated by WGS, and optional user parameters that control the filtering and prioritization steps. See text for details of the prioritization procedure.
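
To make the filter-then-prioritize shape of the framework concrete, here is a minimal sketch under loudly stated assumptions: each candidate already carries an allele frequency, a variant score rescaled to 0..1, and a phenotype-match score against the patient's HPO terms; a hypothetical `max_allele_freq` user parameter drops common variants; and the two scores are combined by a plain product. That product is a placeholder chosen for illustration, not Genomiser's actual scoring formula, and `Candidate` and `prioritize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    variant_id: str
    gene: str
    allele_freq: float      # population allele frequency (0..1)
    variant_score: float    # deleteriousness, rescaled to 0..1 (e.g. a ReMM-like score)
    phenotype_score: float  # match between the gene's phenotypes and the patient's HPO terms (0..1)


def prioritize(candidates, max_allele_freq=0.001):
    """Filter out common variants, then rank the rest by a combined score.

    The product of the two scores is a placeholder combination, not Genomiser's.
    """
    kept = [c for c in candidates if c.allele_freq <= max_allele_freq]
    return sorted(kept,
                  key=lambda c: c.variant_score * c.phenotype_score,
                  reverse=True)


# Hypothetical candidates from one genome.
calls = [
    Candidate("var1", "GENE_A", 0.0001, 0.90, 0.80),
    Candidate("var2", "GENE_B", 0.05, 0.99, 0.99),  # too common, filtered out
    Candidate("var3", "GENE_C", 0.0, 0.60, 0.95),
]
print([c.variant_id for c in prioritize(calls)])  # prints ['var1', 'var3']
```

Filtering before ranking keeps the ranking step small; note that var2 would top the list if the frequency threshold were relaxed, which is why such thresholds are exposed as user parameters.
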
Figure 4
Performance Evaluation of Genomiser. The curated Mendelian regulatory mutations were added one at a time to unaffected genomes from the 1000 Genomes Project to generate 10,419 simulated disease genomes (see Material and Methods). As an additional test, the same simulations were performed using the CADD score instead of the ReMM score. For comparison, the genomes were also run through Phen-Gen under the same frequency, inheritance, and phenotype conditions. Bars show the percentage of genomes in which the true variant was prioritized as the top hit, across all genomes and within the subcategories of promoter, UTR, enhancer, RNA gene, microRNA gene (miRNA), and imprinting control region (ICR) variants.
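
The metric plotted in Figure 4 is the share of simulated genomes whose spiked-in causal variant is ranked first, reported overall and per variant category. A minimal sketch of that tally follows, assuming each simulation run has already been reduced to a (category, rank) pair; the function name `top_hit_rates` and the ranks in the example are hypothetical and are not results from the paper.

```python
from collections import defaultdict


def top_hit_rates(results):
    """results: (category, rank) pairs, one per simulated genome, where rank is
    the position of the spiked-in causal variant (1 = top candidate).

    Returns the overall percentage ranked first and the same per category."""
    total = top = 0
    cat_total = defaultdict(int)
    cat_top = defaultdict(int)
    for category, rank in results:
        hit = int(rank == 1)
        total += 1
        top += hit
        cat_total[category] += 1
        cat_top[category] += hit
    overall = 100.0 * top / total
    per_category = {c: 100.0 * cat_top[c] / cat_total[c] for c in cat_total}
    return overall, per_category


# Hypothetical ranks for four simulated genomes.
overall, by_cat = top_hit_rates(
    [("promoter", 1), ("promoter", 3), ("enhancer", 1), ("UTR", 1)])
print(overall)  # prints 75.0
print(by_cat)   # prints {'promoter': 50.0, 'enhancer': 100.0, 'UTR': 100.0}
```

Keeping the overall rate separate from the per-category dictionary avoids any clash between a category label and the "all genomes" bucket.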
