Drug-induced liver injury (DILI) is a major cause of drug development failure and drug withdrawal from the market after approval. The identification of human risk factors associated with susceptibility to DILI is of paramount importance. Increasing evidence suggests that genetic variants may lead to inter-individual differences in drug response; however, individual single-nucleotide polymorphisms (SNPs) usually have limited power to predict human phenotypes such as DILI. In this study, we aim to identify appropriate statistical methods to investigate gene-gene and/or gene-environment interactions that impact DILI susceptibility. Three machine learning approaches, including Multivariate Adaptive Regression Splines (MARS), Multifactor Dimensionality Reduction (MDR), and logistic regression, were used. The simulation study suggested that all three methods were robust and could identify the known SNP-SNP interaction when up to 4% of genotypes were randomly permutated. When applied to a real-life DILI chronicity dataset, both MARS and MDR, but not logistic regression, identified combined genetic variants having better associations with DILI chronicity in comparison to the use of individual SNPs. Furthermore, a simple decision tree model using the SNPs identified by MARS and MDR was developed to predict DILI chronicity, with fair performance. Our study suggests that machine learning approaches may help identify gene-gene interactions as potential risk factors for better assessing complicated diseases such as DILI chronicity.
Keywords: SNP; chronicity; drug-induced liver injury; epistasis; gene–gene interactions; genotype; machine learning; splines.