Background: High-throughput genetic testing is increasingly applied in clinics. However, the analysis of next-generation sequencing (NGS) data remains a great challenge. Interpreting the pathogenicity of single variants or combinations of variants is crucial for providing accurate diagnostic information and guiding therapies.
Methods: To facilitate the interpretation of variants and the selection of candidate non-synonymous single nucleotide polymorphisms (nsSNPs) for further clinical studies, we developed BALL-SNP. Starting from genetic variants in variant call format (VCF) files or tabular input, our tool first automatically visualizes the three-dimensional (3D) structure of the respective proteins from the Protein Data Bank (PDB) and highlights the mutated residues. Second, it performs a hierarchical bottom-up clustering of the nsSNPs within the 3D structure to identify nsSNPs that are spatially close to each other. The modular and flexible implementation allows for straightforward integration of different databases of pathogenic and benign variants as well as of pathogenicity prediction tools. The collected background information on all variants is presented below the 3D structure in an easily interpretable table format.
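The clustering step can be illustrated with a minimal Python sketch of bottom-up (agglomerative) clustering on residue coordinates. It is an illustration under stated assumptions, not the BALL-SNP implementation: the abstract does not specify the linkage criterion or the distance cutoff, so single linkage and a cutoff of 5 Å are assumed here, and the function name `cluster_residues`, the residue labels, and the coordinates are hypothetical.

```python
import math


def cluster_residues(coords, cutoff):
    """Bottom-up single-linkage clustering of residues in 3D space.

    coords: dict mapping a residue label to an (x, y, z) tuple in angstroms.
    cutoff: merging stops once the two closest clusters are farther apart
            than this distance (an assumed parameter, not from the paper).
    Returns a sorted list of clusters, each a sorted list of labels.
    """
    # Start with every mutated residue in its own cluster.
    clusters = [[label] for label in sorted(coords)]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(coords[p], coords[q]) for p in a for q in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > cutoff:
            break  # No remaining pair is close enough to merge.
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return sorted(clusters)


# Hypothetical coordinates for four mutated residues (not real PDB data).
example_coords = {
    "A10": (0.0, 0.0, 0.0),
    "A12": (3.0, 0.0, 0.0),
    "A15": (3.0, 4.0, 0.0),
    "B80": (30.0, 30.0, 30.0),
}
example_clusters = cluster_residues(example_coords, cutoff=5.0)
```

With these made-up coordinates, the three residues on chain A end up in one cluster and the distant residue `B80` remains a singleton. A different linkage criterion (for example, average or complete linkage) would change which residues are grouped together.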
Results: First, we integrated different data resources into BALL-SNP: databases containing information on genetic variants, such as ClinVar or HUMSAVAR; third-party tools that predict stability or pathogenicity in silico, such as I-Mutant2.0; and additional information derived from the 3D structure, such as a prediction of binding pockets. We then explored the applicability of BALL-SNP using the example of patients suffering from cardiomyopathies. Here, the analysis highlighted an accumulation of variations in the genes JUP, VCL, and SMYD2.
Conclusion: Software solutions for analyzing high-throughput genomics data are important to support diagnosis and therapy selection. Our tool BALL-SNP, which is freely available at http://www.ccb.uni-saarland.de/BALL-SNP, combines genetic information with an easily interpretable, interactive graphical representation of amino acid changes in proteins, and presents relevant information from databases and computational tools alongside it. Beyond this, it can reveal proximity to functional sites or accumulations of mutations with a potential collective effect.