Bioinformatics, 32 (12), 1832-9

Gene Expression Inference With Deep Learning

Yifei Chen et al. Bioinformatics.

Abstract

Motivation: Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiles has been dropping steadily, generating a compendium of expression profiles over thousands of samples is still very expensive. Recognizing that gene expressions are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy of profiling only ∼1000 carefully selected landmark genes and relying on computational methods to infer the expression of the remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression (LR), which limits its accuracy because it does not capture complex nonlinear relationships between the expression levels of genes.
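The LR baseline described above can be illustrated with a minimal sketch: one ordinary least-squares model per target gene, each taking the landmark-gene expression values as inputs. This is not the LINCS or D-GEX code. The function names (`fit_linear`, `predict`), the toy data, and the use of the normal equations are assumptions made for illustration only.

```python
def fit_linear(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations.

    X: list of samples, each a list of landmark-gene expression values.
    y: list of expression values of one target gene (one per sample).
    Returns [intercept, w_1, ..., w_d].
    """
    rows = [[1.0] + list(x) for x in X]  # prepend a constant 1 for the intercept
    d = len(rows[0])
    # Build A = R^T R and b = R^T y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    # Forward elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            factor = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * d
    for i in range(d - 1, -1, -1):
        s = sum(A[i][j] * w[j] for j in range(i + 1, d))
        w[i] = (b[i] - s) / A[i][i]
    return w


def predict(w, x):
    """Predict one target gene's expression from landmark values x."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Toy data (hypothetical): 5 samples, 2 landmark genes, 2 target genes.
landmarks = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [0.0, 1.0]]
targets = {
    "target_a": [0.5 + 2.0 * a - 1.0 * b for a, b in landmarks],  # exactly linear
    "target_b": [a * b for a, b in landmarks],                    # nonlinear
}

# One independent linear model per target gene.
models = {name: fit_linear(landmarks, y) for name, y in targets.items()}
pred_a = predict(models["target_a"], [2.0, 2.0])
```

In this toy setup the linear target is recovered essentially exactly, while the product-shaped target cannot be fit without residual error, which is the kind of nonlinear relationship the abstract says LR misses.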

Results: We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based Gene Expression Omnibus dataset, consisting of 111K expression profiles, to train our model and compare its performance with that of other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms LR with 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than LR for 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2921 expression profiles. Deep learning still outperforms LR with 6.57% relative improvement, and achieves lower error for 81.31% of the target genes.
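The three quantities reported above (overall mean absolute error, relative improvement, and the share of target genes with lower error) can be computed as in the following sketch. The abstract does not spell out the formulas, so the definition of relative improvement as (baseline error minus model error) divided by baseline error is an assumption, as are the function names and the toy numbers.

```python
def mae_per_gene(y_true, y_pred):
    """Mean absolute error of each gene (column), averaged over samples (rows)."""
    n_samples = len(y_true)
    n_genes = len(y_true[0])
    return [
        sum(abs(y_true[s][g] - y_pred[s][g]) for s in range(n_samples)) / n_samples
        for g in range(n_genes)
    ]


def overall_error(per_gene):
    """Overall error: per-gene MAE averaged across all target genes."""
    return sum(per_gene) / len(per_gene)


def relative_improvement(baseline_error, model_error):
    """Percentage reduction in error relative to the baseline (assumed definition)."""
    return (baseline_error - model_error) / baseline_error * 100.0


def fraction_better(baseline_per_gene, model_per_gene):
    """Share of genes for which the model has strictly lower MAE than the baseline."""
    wins = sum(m < b for b, m in zip(baseline_per_gene, model_per_gene))
    return wins / len(model_per_gene)


# Toy data (hypothetical): 2 samples x 3 target genes.
y_true = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
y_baseline = [[1.5, 2.5, 3.0], [2.5, 3.5, 6.0]]  # stands in for LR predictions
y_model = [[1.25, 2.0, 3.5], [2.25, 4.0, 6.5]]   # stands in for D-GEX predictions

baseline_mae = mae_per_gene(y_true, y_baseline)
model_mae = mae_per_gene(y_true, y_model)
improvement = relative_improvement(overall_error(baseline_mae), overall_error(model_mae))
share = fraction_better(baseline_mae, model_mae)
```

The per-gene comparison in `fraction_better` corresponds to counting the dots above the diagonal in the scatter plots of Figs 3 and 4, where each dot is one target gene.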

Availability and implementation: D-GEX is available at https://github.com/uci-cbcl/D-GEX

Contact: xhx@ics.uci.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
The overall errors of D-GEX-10% with different architectures on GEO-te. The performance of LR is also included for comparison
Fig. 2.
The density plots of the predictive errors of all the target genes by LR, KNN-GE and GEX-10%-9000 × 3 on GEO-te
Fig. 3.
The predictive errors of each target gene by GEX-10%-9000 × 3 compared with LR and KNN-GE on GEO-te. Each dot represents one of the 9520 target genes. The x-axis is the MAE of each target gene by D-GEX, and the y-axis is the MAE of each target gene by the other method. Dots above the diagonal mean that D-GEX achieves lower error than the other method. (a) D-GEX versus LR; (b) D-GEX versus KNN-GE
Fig. 4.
The predictive errors of each target gene by GEX-25%-9000 × 2 compared with LR and KNN-GE on GTEx-te. Each dot represents one of the 9520 target genes. The x-axis is the MAE of each target gene by D-GEX, and the y-axis is the MAE of each target gene by the other method. Dots above the diagonal mean that D-GEX achieves lower error than the other method. (a) D-GEX versus LR; (b) D-GEX versus KNN-GE
Fig. 5.
The overall error decreasing curves of D-GEX-9000 × 2 on GTEx-te with different dropout rates. The x-axis is the training epoch and the y-axis is the overall error. The overall error of LR is also included for comparison
