Review
Front Genet. 2015 Sep 10;6:285.
doi: 10.3389/fgene.2015.00285. eCollection 2015.

A Survey About Methods Dedicated to Epistasis Detection

Free PMC article

Clément Niel et al. Front Genet. 2015.

Abstract

During the past decade, findings of genome-wide association studies (GWAS) improved our knowledge and understanding of disease genetics. To date, thousands of SNPs have been associated with diseases and other complex traits. Statistical analysis typically looks for association between a phenotype and a SNP taken individually via single-locus tests. However, geneticists admit this is an oversimplified approach to tackle the complexity of underlying biological mechanisms. Interaction between SNPs, namely epistasis, must be considered. Unfortunately, epistasis detection gives rise to analytic challenges since analyzing every SNP combination is at present impractical at a genome-wide scale. In this review, we will present the main strategies recently proposed to detect epistatic interactions, along with their operating principle. Some of these methods are exhaustive, such as multifactor dimensionality reduction, likelihood ratio-based tests or receiver operating characteristic curve analysis; some are non-exhaustive, such as machine learning techniques (random forests, Bayesian networks) or combinatorial optimization approaches (ant colony optimization, computational evolution system).

Keywords: biological data mining; complex disease; epistasis detection; feature selection; genome-wide association study.

Figures

Figure 1
Toy example of epistasis. (A) Neither SNP 1 nor SNP 2 presents a marginal effect. (B) In gray cells, allele combinations between SNP 1 and SNP 2 induce a statistically significant epistatic effect on the phenotype distribution.
Figure 2
Real example of epistasis: S. cerevisiae sporulation is regulated by epistatic effects among three SNPs. State of SNP 1 modulates the production rate of RME1. State of SNP 2 influences the binding specificity of RME1. State of SNP 3 conditions the binding specificity of IME1-kinase.
Figure 3
Representations of GWAS data. (A) Classical representation: cell (i, j) corresponds to the status of SNP i for individual j. (B) Binary representation: cell (i, j) corresponds to the true (1) or false (0) assertion that SNP i has a specific value (0, 1, or 2) for individual j. For ease of comprehension, the link between these two representations is highlighted in gray.
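The conversion between the two representations in Figure 3 can be sketched in a few lines of Python. This is a minimal illustration, not code from the article: the function name `to_binary_representation` and the toy matrix are hypothetical, and the layout (one row per SNP, one column per individual, three binary rows per SNP) is an assumption based on the caption.

```python
def to_binary_representation(genotypes):
    """Expand a classical SNP-by-individual matrix into binary rows.

    genotypes: list of rows, one per SNP; each entry is 0, 1, or 2.
    Returns three rows per SNP, one for each possible value, where a
    cell is 1 if that individual carries that value and 0 otherwise.
    """
    binary = []
    for snp_row in genotypes:
        for value in (0, 1, 2):
            binary.append([1 if g == value else 0 for g in snp_row])
    return binary


# Toy data (assumed for illustration): 2 SNPs, 4 individuals.
classical = [
    [0, 1, 2, 1],
    [2, 2, 0, 1],
]
binary = to_binary_representation(classical)
for row in binary:
    print(row)
```

Each individual has exactly one 1 among the three rows belonging to a given SNP, which is what makes the binary form convenient for bitwise counting.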
Figure 4
Steps of multifactor dimensionality reduction (MDR) algorithm: example of 2-way interaction model. Description of one iteration of the cross-validation process. In (A), a SNP combination is drawn among all potential SNP combinations. In (B), numbers in red denote counts for cases whereas numbers in black denote counts for controls. In (C), each cell displays the ratio of cases over controls. In (D), the prediction error is estimated over the 10 iterations.
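One cross-validation iteration of the MDR steps in Figure 4 might look like the sketch below. It is a simplified illustration under stated assumptions, not the reference implementation: the function name `mdr_fold_error`, the data layout (a list of `(genotypes, is_case)` pairs), and the high-risk threshold of 1.0 (reasonable only for balanced case/control data) are all hypothetical choices.

```python
from collections import Counter


def mdr_fold_error(train, test, snp_a, snp_b, threshold=1.0):
    """Prediction error of one 2-way MDR model on one fold.

    train, test: non-empty lists of (genotypes, is_case) pairs, where
    genotypes is a tuple of 0/1/2 values and is_case is a bool.
    """
    # Step (B): count cases and controls in each genotype cell.
    cases, controls = Counter(), Counter()
    for genotypes, is_case in train:
        cell = (genotypes[snp_a], genotypes[snp_b])
        if is_case:
            cases[cell] += 1
        else:
            controls[cell] += 1

    # Step (C): a cell is high-risk if its case/control ratio
    # exceeds the threshold (assumed 1.0 here).
    def high_risk(cell):
        n_case, n_ctrl = cases[cell], controls[cell]
        if n_ctrl == 0:
            return n_case > 0
        return n_case / n_ctrl > threshold

    # Step (D), for this fold only: misclassification rate on test data.
    errors = 0
    for genotypes, is_case in test:
        cell = (genotypes[snp_a], genotypes[snp_b])
        if high_risk(cell) != is_case:
            errors += 1
    return errors / len(test)


# Toy XOR-like data (assumed): cases occur when exactly one SNP is 1.
train = [((0, 0), False), ((0, 1), True), ((1, 0), True), ((1, 1), False)] * 2
test = [((0, 0), False), ((0, 1), True), ((1, 0), True), ((1, 1), False)]
print(mdr_fold_error(train, test, 0, 1))
```

In a full run, this error would be averaged over the 10 folds and compared across all candidate SNP combinations.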
Figure 5
ReliefF algorithm.
Figure 6
Random forest algorithm. (A) Algorithm of a random forest procedure. (B) An example of the three steps needed to grow one tree.
Figure 7
(A) Markov blanket of a phenotype, in gray area. It is made of the parents (SNP 2, SNP 3, and SNP k), of the children (Effect 1 and Effect 2) and of the spouses (Common cause of Effect 1 with respect to the Phenotype). (B) Two stages of Markov blanket learning. For ease of reading, the Markov blanket is reduced to parents from (A).
Figure 8
Ant colony optimization procedure. For each ant, multiple SNPs are drawn. The probability distribution function (PDF) gives the probability of each SNP being drawn. Once an ant is filled with a SNP set, the joint association of this SNP set with the phenotype is evaluated with a χ² test. For each ant, the PDF is updated according to the p-values resulting from the χ² tests, such that SNPs that classify individuals efficiently have a higher probability of being drawn in the next iteration.
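The loop described in Figure 8 can be sketched as follows. This is a deliberately simplified illustration, not the procedure of any specific published tool: it rewards SNPs with the raw χ² statistic rather than a p-value (to stay within the standard library), it stores genotypes with one row per individual, and every name and parameter (`ant_colony_search`, `n_ants`, `evaporation`, and so on) is an assumption.

```python
import random
from collections import Counter


def chi_square_statistic(genotypes, phenotypes, snp_set):
    """Pearson chi-square statistic of genotype combination vs. phenotype."""
    n = len(phenotypes)
    cell_counts, combo_totals, status_totals = Counter(), Counter(), Counter()
    for row, status in zip(genotypes, phenotypes):
        combo = tuple(row[s] for s in snp_set)
        cell_counts[(combo, status)] += 1
        combo_totals[combo] += 1
        status_totals[status] += 1
    stat = 0.0
    for combo, combo_n in combo_totals.items():
        for status, status_n in status_totals.items():
            expected = combo_n * status_n / n
            stat += (cell_counts[(combo, status)] - expected) ** 2 / expected
    return stat


def draw_snp_set(rng, weights, size):
    """Draw `size` distinct SNP indices with probability proportional to weight."""
    candidates = list(range(len(weights)))
    chosen = []
    for _ in range(size):
        pick = rng.choices(candidates, weights=[weights[c] for c in candidates])[0]
        chosen.append(pick)
        candidates.remove(pick)
    return tuple(sorted(chosen))


def ant_colony_search(genotypes, phenotypes, n_ants=20, n_iterations=10,
                      set_size=2, evaporation=0.1, seed=0):
    rng = random.Random(seed)
    pheromone = [1.0] * len(genotypes[0])  # uniform starting weights
    best_set, best_score = None, -1.0
    for _ in range(n_iterations):
        deposits = [0.0] * len(pheromone)
        for _ in range(n_ants):
            snp_set = draw_snp_set(rng, pheromone, set_size)
            score = chi_square_statistic(genotypes, phenotypes, snp_set)
            for s in snp_set:
                deposits[s] += score  # stronger association, larger reward
            if score > best_score:
                best_set, best_score = snp_set, score
        # Evaporate old pheromone, then add the average deposit per ant.
        pheromone = [(1 - evaporation) * p + d / n_ants
                     for p, d in zip(pheromone, deposits)]
    total = sum(pheromone)
    return [p / total for p in pheromone], best_set


# Toy data (assumed): SNPs 0 and 1 interact XOR-style; SNP 2 is noise.
genotypes = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
phenotypes = [a ^ b for a, b, _ in genotypes]
pdf, best_set = ant_colony_search(genotypes, phenotypes)
print(pdf, best_set)
```

Replacing the statistic with a p-value from a χ² distribution would bring the sketch closer to the caption, at the cost of an extra dependency.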
