SVFX: a machine learning framework to quantify the pathogenicity of structural variants

Genome Biol. 2020 Nov 9;21(1):274. doi: 10.1186/s13059-020-02178-x.

Abstract

Approaches for identifying pathogenic genomic structural variants (SVs) are lacking, although SVs play a crucial role in many diseases. We present a mechanism-agnostic machine learning-based workflow, called SVFX, to assign pathogenicity scores to somatic and germline SVs. In particular, we generate somatic and germline training models, which include genomic, epigenomic, and conservation-based features, for SV call sets in diseased and healthy individuals. We then apply SVFX to SVs in cancer and other diseases; SVFX achieves high accuracy in identifying pathogenic SVs. Predicted pathogenic SVs in cancer cohorts are enriched among known cancer genes and many cancer-related pathways.
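
The general idea of scoring SVs with a model trained on variants from diseased versus healthy individuals can be illustrated with a minimal sketch. This is not the SVFX implementation: the abstract does not name the classifier or the exact features, so the example substitutes a plain logistic regression and three synthetic feature columns standing in for genomic, epigenomic, and conservation-based signals. All function names, feature values, and parameters below are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, epochs=500, lr=0.1):
    # Batch gradient descent on the logistic loss.
    # Assumption: a stand-in classifier, not the model used by SVFX.
    n_features = len(features[0])
    n = len(features)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = sigmoid(z) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def pathogenicity_score(weights, bias, x):
    # Score in [0, 1]; higher means "more like the disease-cohort SVs".
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def make_synthetic_svs(n, mean, rng):
    # Each synthetic SV is a vector of three hypothetical features:
    # [genomic, epigenomic, conservation]. Values are made up.
    return [[rng.gauss(mean, 1.0) for _ in range(3)] for _ in range(n)]


rng = random.Random(0)
disease_svs = make_synthetic_svs(100, 1.0, rng)   # label 1
healthy_svs = make_synthetic_svs(100, -1.0, rng)  # label 0
features = disease_svs + healthy_svs
labels = [1] * len(disease_svs) + [0] * len(healthy_svs)

weights, bias = train_logistic(features, labels)

# Score two unseen, hypothetical SVs.
high = pathogenicity_score(weights, bias, [2.0, 2.0, 2.0])
low = pathogenicity_score(weights, bias, [-2.0, -2.0, -2.0])
```

In this toy setup, an SV whose features resemble the disease cohort receives a score above 0.5 and one resembling the healthy cohort a score below 0.5. Following the abstract, separate models would be trained for somatic and germline call sets; the sketch shows only a single model.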

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Epigenomics
  • Genome, Human
  • Genomic Structural Variation*
  • Genomics
  • Humans
  • Machine Learning*
  • Oncogenes / genetics
  • Virulence