Integration of single nucleotide variants and whole-genome DNA methylation profiles for classification of rheumatoid arthritis cases from controls

Heredity (Edinb). 2020 May;124(5):658-674. doi: 10.1038/s41437-020-0301-4. Epub 2020 Mar 3.


This study evaluated the use of multiomics data for classifying rheumatoid arthritis (RA) cases and controls. Three approaches were compared in terms of prediction accuracy: (1) whole-genome prediction (WGP) using SNP marker information only, (2) whole-methylome prediction (WMP) using methylation profiles only, and (3) whole-genome/methylome prediction (WGMP) combining both omics layers. The number of SNPs and methylation sites varied across scenarios, with 1, 10, or 50% of them preselected by one of four approaches: randomly, evenly spaced, lowest p value (from a genome-wide or epigenome-wide association study), or estimated effect size from a Bayesian ridge regression (BRR) model. To remove the effects of high pairwise linkage disequilibrium (LD), SNPs were also preselected with an LD-pruning method. Five Bayesian regression models were studied for classification: BRR, Bayes-A, Bayes-B, Bayes-C, and the Bayesian LASSO. Adjusting methylation profiles for cellular heterogeneity within whole blood samples had a detrimental effect on the classification ability of the models. Overall, WGMP using the Bayes-B model had the best performance. In particular, the best method for predicting disease risk, with a classification accuracy of 0.975, combined SNPs selected by LD pruning with 1% of the methylation sites selected based on BRR and fitted the most significant SNP as a fixed effect. Our results showed that multiomics data can be used to effectively predict the risk of RA and to identify cases in early stages, so that disease progression can be prevented or altered via appropriate interventions.
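
As an illustration of the preselection step described above, the following minimal sketch picks a fixed fraction of markers at random, evenly spaced, or by lowest p value, and scores predictions with a simple accuracy measure. The function names `preselect` and `classification_accuracy`, the index-based notion of "evenly spaced", and the 0.5 decision threshold are assumptions made for illustration; the abstract does not specify how the study implemented these steps or how classification accuracy was computed.

```python
import random


def preselect(n_markers, fraction, method, p_values=None, seed=0):
    """Return sorted indices of a preselected marker subset (hypothetical helper)."""
    # Keep at least one marker, e.g. fraction=0.01 keeps 1% of the markers.
    k = max(1, round(n_markers * fraction))
    if method == "random":
        rng = random.Random(seed)
        return sorted(rng.sample(range(n_markers), k))
    if method == "even":
        # Assumption: "evenly spaced" means equal steps along the marker order,
        # not equal physical distance along the genome.
        step = n_markers / k
        return [int(i * step) for i in range(k)]
    if method == "pvalue":
        if p_values is None:
            raise ValueError("p_values are required for method='pvalue'")
        # Rank markers by association p value and keep the k smallest.
        order = sorted(range(n_markers), key=lambda i: p_values[i])
        return sorted(order[:k])
    raise ValueError(f"unknown method: {method}")


def classification_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the true label."""
    # Assumption: a predicted probability at or above the threshold is a case (1).
    preds = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(int(p == t) for p, t in zip(preds, y_true))
    return correct / len(y_true)
```

For example, `preselect(1000, 0.01, "random")` returns 10 marker indices, which could then be used to subset a genotype or methylation matrix before fitting any of the regression models named above.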