Review: High-performance computing to detect epistasis in genome scale data sets

Brief Bioinform. 2016 May;17(3):368-79. doi: 10.1093/bib/bbv058. Epub 2015 Aug 13.


It is becoming clear that most human diseases have a complex etiology that cannot be explained by individual single nucleotide polymorphisms (SNPs) or by simple additive combinations of them; the general consensus is that they are caused by combinations of multiple genetic variations. The limited success of some genome-wide association studies is partly a result of their focus on single genetic markers. A more promising approach is to take epistasis into account by considering the association between disease and interactions among multiple SNPs. However, as genomic data continue to grow in resolution, and genome and exome sequencing become more established, the number of combinations of variants to consider increases rapidly. Two potential solutions should be considered: high-performance computing, which allows a larger number of variables to be considered, and heuristics that make the problem more tractable, which are essential in the case of genome sequencing. In this review, we look at different computational methods to analyse epistatic interactions within disease-related genetic data sets created by microarray technology. We also review efforts to use epistatic analysis results to produce biomarkers for diagnostic tests, and we give our views on future directions in this field in light of advances in sequencing technology and the study of variants in non-coding regions.
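
To make the abstract's remark about the rapidly increasing number of variant combinations concrete, the following minimal sketch counts the unordered SNP sets of a given interaction order that an exhaustive search would have to test, which is the binomial coefficient C(n, k). The function name `n_combinations` and the panel sizes in the loop are illustrative assumptions, not figures taken from the review.

```python
from math import comb


def n_combinations(n_snps: int, order: int) -> int:
    """Number of unordered SNP sets of size `order` among `n_snps` markers.

    This is the binomial coefficient C(n_snps, order), i.e. the number of
    candidate interactions an exhaustive epistasis scan would evaluate.
    """
    return comb(n_snps, order)


if __name__ == "__main__":
    # Hypothetical panel sizes, chosen only to show the growth rate.
    for n_snps in (1_000, 100_000, 1_000_000):
        for order in (2, 3):
            count = n_combinations(n_snps, order)
            print(f"{n_snps:>9,} SNPs, order {order}: {count:.3e} combinations")
```

Under these assumptions, the count grows roughly as n^k / k!, so moving from pairwise to three-way interactions multiplies the work by about n / 3. This is one way to see why both parallel hardware and search-space-reducing heuristics are discussed.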

Keywords: SNP-interactions; biomarker; disease marker; epistasis; genome sequencing; genotyping; high-performance computing.

Publication types

  • Review

MeSH terms

  • Algorithms
  • Epistasis, Genetic
  • Genome*
  • Genome-Wide Association Study
  • Humans
  • Polymorphism, Single Nucleotide