Hum Mutat. 2014 May;35(5):537-47.
doi: 10.1002/humu.22520. Epub 2014 Mar 6.

Prioritizing Disease-Linked Variants, Genes, and Pathways With an Interactive Whole-Genome Analysis Pipeline

Free PMC article

In-Hee Lee et al. Hum Mutat.

Abstract

Whole-genome sequencing (WGS) studies are uncovering disease-associated variants in both rare and nonrare diseases. Using next-generation sequencing for WGS requires a series of computational methods for alignment, variant detection, and annotation, and accurate, reproducible annotation is essential for clinical implementation. However, annotating WGS data with up-to-date genomic information remains challenging for biomedical researchers. Here, we present gNOME, a highly scalable annotation, filtering, and analysis pipeline that is one of the fastest available, to prioritize phenotype-associated variants while minimizing false-positive findings. The intuitive graphical user interface of gNOME facilitates the selection of phenotype-associated variants, and result summaries are provided at the variant, gene, and genome levels. Moreover, enrichment results for specific variants, genes, and gene sets, computed either between two groups or against the population-scale WGS datasets already integrated into the pipeline, can aid interpretation. We found a small number of discordant results between annotation software tools, partly due to different reporting strategies for variants with complex impacts. Using two published whole-exome datasets of uveal melanoma and bladder cancer, we demonstrated the accuracy of gNOME's variant annotation and the enrichment of loss-of-function variants in known cancer pathways. The gNOME Web server and source code are freely available to the academic community (http://gnome.tchlab.org).

Keywords: analysis pipeline; disease gene discovery; variant annotation; whole-genome sequences.

Conflict of interest statement

Conflicts of Interest: Dr. Kohane is a member of the scientific advisory board of SynapDx (Lexington, MA). All other authors declare no conflict of interest.

Figures

Figure 1. A schematic overview of gNOME
The analysis of a whole-genome or whole-exome dataset starts with creating a project and uploading the data files according to project type (Step 1). The uploaded files are annotated with 60 annotation tracks (Step 2), and annotation-based variant filtering can then be performed interactively (Step 3). gNOME supports variant-, gene-, and gene set-level association tests between two groups: cases vs. ethnicity-matched population data from the 1000 Genomes Project, or cases vs. controls (Step 4). Filtering and analysis results are reported dynamically in the web-based interface (Step 5). Steps 3 to 5 can be repeated iteratively with different variant-filtering criteria.
Figure 2. Discovering somatic mutations in tumor-blood paired whole exome sequences
(A) A screenshot comparing variants from tumor tissue (as ‘case’) and a blood sample (as ‘control’), both from a single patient (‘MM56’) (see Finding somatic mutations in uveal melanoma and Materials and Methods for details). For both the tumor tissue and the blood sample, (1) allele frequencies were estimated using European-ancestry populations, and (2) rare or novel (3) loss-of-function variants at (4) highly conserved loci were selected. Low-quality variants were excluded by setting (5) ‘Variant call score ≥ 20’. Potential somatic mutations were then selected by choosing variants present in the tumor sample but not in the blood sample (6). (B) The result of the comparison shown in (A). The table can be searched by gene symbol or sorted by any column. A total of 11 genes, including BAP1 (displayed), met the criteria. gNOME performs a gene set enrichment analysis over 5 gene set categories using the genes that passed the filtering criteria.
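The six selection steps in panel (A) can be sketched as a filter followed by a set difference. This is a minimal illustration, not gNOME's code: the `Variant` fields, the boolean `highly_conserved` flag, and the rare-variant cutoff `RARE_AF_CUTOFF = 0.01` are all assumptions (the caption does not give the exact "rare" threshold or conservation score); only the call-score threshold of 20 comes from the caption. Variants are matched between tumor and blood on (chromosome, position, ref, alt).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Assumed cutoff for "rare"; the caption does not state the exact threshold.
RARE_AF_CUTOFF = 0.01
MIN_CALL_SCORE = 20.0  # criterion (5) from the caption


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str
    eur_af: Optional[float]  # European-ancestry allele frequency; None means novel
    loss_of_function: bool
    highly_conserved: bool   # hypothetical boolean stand-in for a conservation score
    call_score: float

    def key(self) -> Tuple[str, int, str, str]:
        """Identity used to match the same variant across samples."""
        return (self.chrom, self.pos, self.ref, self.alt)


def passes_filters(v: Variant) -> bool:
    """Criteria (1)-(5): rare or novel in Europeans, LoF, conserved, good call."""
    rare_or_novel = v.eur_af is None or v.eur_af < RARE_AF_CUTOFF
    return (rare_or_novel and v.loss_of_function and v.highly_conserved
            and v.call_score >= MIN_CALL_SCORE)


def somatic_candidates(tumor: Iterable[Variant],
                       blood: Iterable[Variant]) -> List[Variant]:
    """Criterion (6): filtered tumor variants absent from the filtered blood variants."""
    blood_keys = {v.key() for v in blood if passes_filters(v)}
    return [v for v in tumor if passes_filters(v) and v.key() not in blood_keys]


if __name__ == "__main__":
    # Toy, hypothetical data: gene names and coordinates are placeholders.
    tumor = [
        Variant("chrA", 100, "C", "T", "GENE_A", None, True, True, 45.0),   # kept
        Variant("chrA", 200, "G", "A", "GENE_B", 0.20, True, True, 50.0),   # too common
        Variant("chrB", 300, "A", "G", "GENE_C", 0.001, True, True, 30.0),  # also in blood
    ]
    blood = [Variant("chrB", 300, "A", "G", "GENE_C", 0.001, True, True, 30.0)]
    print([v.gene for v in somatic_candidates(tumor, blood)])  # ['GENE_A']
```

Filtering the blood calls before subtracting mirrors the order described in the caption; subtracting against all blood calls instead would be a more conservative choice, since a low-quality germline call would then still remove the matching tumor variant.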
Figure 3. Association tests for variants, genes, and gene sets between two groups
The small number of variants that remain after annotation-based filtering can be tested for association with a phenotype in three ways. First, we can test whether a specific variant occurs more frequently in cases than in controls or in an ethnicity-matched population (A). Second, an association test can be performed at the gene level, which accommodates case individuals carrying different variants in the same gene (B). Third, gene-level aggregation can be extended to the gene set level to find gene sets over-represented with variants of interest among cases (C). Rows marked with a red x denote “hypervariable” variants, genes, or gene sets that frequently carry variants in both cases and controls (see Materials and Methods for details).
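One way to picture the three levels in panels (A) to (C) is a carrier-count 2×2 table per unit (a variant, a gene, or a gene set), tested with a one-sided Fisher's exact test. This is an illustrative sketch only: the caption does not name the statistical test, so Fisher's exact test is our assumption, and `HYPERVARIABLE_FRACTION` is a hypothetical stand-in for the hypervariability definition given in Materials and Methods. Each sample is reduced to the set of identifiers (variant IDs or gene symbols) hit by its filtered variants, so a sample counts once per unit no matter how many different variants it carries there.

```python
from math import comb
from typing import Dict, Set, Tuple

# Hypothetical threshold for flagging "hypervariable" units; the real
# definition is in the paper's Materials and Methods.
HYPERVARIABLE_FRACTION = 0.5


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test on [[a, b], [c, d]]
    (rows: cases, controls; columns: carriers, non-carriers).
    Returns P(X >= a), where X is the hypergeometric carrier count among cases."""
    n_cases, n_carriers, total = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_carriers)
    tail = sum(comb(n_cases, k) * comb(total - n_cases, n_carriers - k)
               for k in range(a, upper + 1))
    return tail / comb(total, n_carriers)


def count_carriers(samples: Dict[str, Set[str]], unit: Set[str]) -> int:
    """Number of samples with at least one qualifying hit inside `unit`."""
    return sum(1 for hits in samples.values() if hits & unit)


def association_test(cases: Dict[str, Set[str]],
                     controls: Dict[str, Set[str]],
                     unit: Set[str]) -> Tuple[float, bool]:
    """Return (one-sided p-value, hypervariable flag) for one unit.
    `unit` is {variant_id}, {gene}, or a whole gene set; the samples' hit
    sets must use matching identifiers."""
    a = count_carriers(cases, unit)
    c = count_carriers(controls, unit)
    p = fisher_greater(a, len(cases) - a, c, len(controls) - c)
    hypervariable = (a / len(cases) >= HYPERVARIABLE_FRACTION
                     and c / len(controls) >= HYPERVARIABLE_FRACTION)
    return p, hypervariable


if __name__ == "__main__":
    # Toy, hypothetical data: genes hit by filtered variants in each sample.
    cases = {"case1": {"G1"}, "case2": {"G1", "G2"}, "case3": {"G1"}}
    controls = {"ctrl1": set(), "ctrl2": {"G2"}, "ctrl3": set()}
    print(association_test(cases, controls, {"G1"}))        # (0.05, False)
    print(association_test(cases, controls, {"G1", "G2"}))  # (0.2, False)
```

Comparing against an ethnicity-matched population would use the same table, with population carrier counts in the second row.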
Figure 4. Comparison of annotation results from 4 software tools
For splice site disruption (A), nonsense (B), frameshift insertion and deletion (C), and nonstop (D) variants, we compared the annotation results from 4 different software tools by matching the genomic coordinate, alternative allele, and reported functional impact of each variant. The numbers next to the tool names give the total number of annotated variants in that category, and the 4-way Venn diagrams show concordant and discordant annotation results. Overall, the tools produce comparable annotations; however, splice site disruption has the most discordant results (A). ANNOVAR reports frameshift-causing variants as frameshift even when they fall in canonical splice sites, whereas gNOME and SeattleSeq report them as splice site disrupting variants. Supp. Table S3 lists the details of the discordant results.
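The concordance comparison above can be sketched by keying each call on (coordinate, alternative allele, reported impact) and grouping calls by the exact set of tools that report them; each group is one region of the Venn diagram. This is a minimal illustration rather than the authors' script: the `Call` tuple layout, the impact labels, and the toy coordinates are hypothetical, and only three of the four tools are named in this caption, so the example uses those three.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# (chromosome, position, alternative allele, reported functional impact)
Call = Tuple[str, int, str, str]


def venn_regions(calls_by_tool: Dict[str, Set[Call]]) -> Counter:
    """Count calls per Venn region, keyed by the exact set of tools reporting them.
    Because the impact is part of the key, one site annotated with different
    impacts by different tools falls into separate regions (a discordant result)."""
    membership: Dict[Call, Set[str]] = {}
    for tool, calls in calls_by_tool.items():
        for call in calls:
            membership.setdefault(call, set()).add(tool)
    return Counter(frozenset(tools) for tools in membership.values())


if __name__ == "__main__":
    # Toy, hypothetical calls: one site where ANNOVAR disagrees on the impact,
    # and one nonsense call that all three tools agree on.
    calls = {
        "ANNOVAR": {("chr1", 1000, "TG", "frameshift"), ("chr2", 500, "A", "nonsense")},
        "gNOME": {("chr1", 1000, "TG", "splice_site"), ("chr2", 500, "A", "nonsense")},
        "SeattleSeq": {("chr1", 1000, "TG", "splice_site"), ("chr2", 500, "A", "nonsense")},
    }
    for tools, n in sorted(venn_regions(calls).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(tools), n)
```

Per-tool totals (the numbers printed next to each tool name) are simply `len(calls[tool])` within one impact category.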

Similar articles

  • WEP: a high-performance analysis pipeline for whole-exome data.
    D'Antonio M, D'Onorio De Meo P, Paoletti D, Elmi B, Pallocca M, Sanna N, Picardi E, Pesole G, Castrignanò T. D'Antonio M, et al. BMC Bioinformatics. 2013;14 Suppl 7(Suppl 7):S11. doi: 10.1186/1471-2105-14-S7-S11. Epub 2013 Apr 22. BMC Bioinformatics. 2013. PMID: 23815231 Free PMC article.
  • A community-based resource for automatic exome variant-calling and annotation in Mendelian disorders.
    Mutarelli M, Marwah V, Rispoli R, Carrella D, Dharmalingam G, Oliva G, di Bernardo D. Mutarelli M, et al. BMC Genomics. 2014;15 Suppl 3(Suppl 3):S5. doi: 10.1186/1471-2164-15-S3-S5. Epub 2014 May 6. BMC Genomics. 2014. PMID: 25078076 Free PMC article.
  • Current trend of annotating single nucleotide variation in humans--A case study on SNVrap.
    Li MJ, Wang J. Li MJ, et al. Methods. 2015 Jun;79-80:32-40. doi: 10.1016/j.ymeth.2014.10.003. Epub 2014 Oct 13. Methods. 2015. PMID: 25308971 Review.
  • Mitochondrial Disease Sequence Data Resource (MSeqDR): a global grass-roots consortium to facilitate deposition, curation, annotation, and integrated analysis of genomic data for the mitochondrial disease clinical and research communities.
    Falk MJ, Shen L, Gonzalez M, Leipzig J, Lott MT, Stassen AP, Diroma MA, Navarro-Gomez D, Yeske P, Bai R, Boles RG, Brilhante V, Ralph D, DaRe JT, Shelton R, Terry SF, Zhang Z, Copeland WC, van Oven M, Prokisch H, Wallace DC, Attimonelli M, Krotoski D, Zuchner S, Gai X; MSeqDR Consortium Participants; MSeqDR Consortium participants: Sherri Bale, Jirair Bedoyan, Doron Behar, Penelope Bonnen, Lisa Brooks, Claudia Calabrese, Sarah Calvo, Patrick Chinnery, John Christodoulou, Deanna Church,; Rosanna Clima, Bruce H. Cohen, Richard G. Cotton, IFM de Coo, Olga Derbenevoa, Johan T. den Dunnen, David Dimmock, Gregory Enns, Giuseppe Gasparre,; Amy Goldstein, Iris Gonzalez, Katrina Gwinn, Sihoun Hahn, Richard H. Haas, Hakon Hakonarson, Michio Hirano, Douglas Kerr, Dong Li, Maria Lvova, Finley Macrae, Donna Maglott, Elizabeth McCormick, Grant Mitchell, Vamsi K. Mootha, Yasushi Okazaki,; Aurora Pujol, Melissa Parisi, Juan Carlos Perin, Eric A. Pierce, Vincent Procaccio, Shamima Rahman, Honey Reddi, Heidi Rehm, Erin Riggs, Richard Rodenburg, Yaffa Rubinstein, Russell Saneto, Mariangela Santorsola, Curt Scharfe,; Claire Sheldon, Eric A. Shoubridge, Domenico Simone, Bert Smeets, Jan A. Smeitink, Christine Stanley, Anu Suomalainen, Mark Tarnopolsky, Isabelle Thiffault, David R. Thorburn, Johan Van Hove, Lynne Wolfe, and Lee-Jun Wong. Falk MJ, et al. Mol Genet Metab. 2015 Mar;114(3):388-96. doi: 10.1016/j.ymgme.2014.11.016. Epub 2014 Dec 4. Mol Genet Metab. 2015. PMID: 25542617 Free PMC article. Review.
  • Genome analysis and knowledge-driven variant interpretation with TGex.
    Dahary D, Golan Y, Mazor Y, Zelig O, Barshir R, Twik M, Iny Stein T, Rosner G, Kariv R, Chen F, Zhang Q, Shen Y, Safran M, Lancet D, Fishilevich S. Dahary D, et al. BMC Med Genomics. 2019 Dec 30;12(1):200. doi: 10.1186/s12920-019-0647-8. BMC Med Genomics. 2019. PMID: 31888639 Free PMC article.

Cited by 15 articles
