Blood-based multi-tissue gene expression inference with Bayesian ridge regression

Bioinformatics. 2020 Jun 1;36(12):3788-3794. doi: 10.1093/bioinformatics/btaa239.

Abstract

Motivation: Gene expression profiling is widely used in basic and cancer research but is still not feasible in many clinical applications because some tissues, such as brain, are difficult or unethical to collect. Gene expression in uncollected tissues can be computationally inferred using genotypes and expression quantitative trait loci (eQTLs). However, no existing method can infer the unmeasured gene expression of multiple tissues from a single-tissue gene expression profile.

Results: Here, we present B-GEX, a Bayesian ridge regression-based method that infers gene expression profiles of multiple tissues from a blood gene expression profile. For each gene in a tissue, a low-dimensional feature vector was extracted from the whole blood gene expression profile by feature selection. We used GTEx RNA-seq data from 16 tissues to train inference models that capture the cross-tissue expression correlations between each target gene in a tissue and its preselected feature genes in peripheral blood. We compared B-GEX with least-squares regression, LASSO regression and ridge regression. B-GEX outperforms the other three models in most tissues in terms of mean absolute error, Pearson correlation coefficient and root-mean-squared error. Moreover, B-GEX infers the expression levels of tissue-specific genes as well as it infers those of non-tissue-specific genes in all tissues. Unlike previous methods, which require genomic features or gene expression profiles of multiple tissues, our model requires only a whole blood expression profile as input. B-GEX helps gain insights into gene expression in uncollected tissues from more accessible blood data.
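The per-gene scheme summarized above (select a few blood feature genes for one target gene, fit a Bayesian ridge model, score it by MAE, Pearson correlation and RMSE) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the B-GEX code: the feature-selection rule (top-k blood genes by absolute Pearson correlation), the hypothetical helper names `select_features`, `fit_bayesian_ridge` and `evaluate`, the evidence-maximization updates for the hyperparameters (MacKay's scheme, the same idea behind scikit-learn's `BayesianRidge`) and the synthetic data are all assumptions.

```python
import numpy as np


def select_features(blood, target, k):
    """Hypothetical selection rule: keep the k blood genes whose expression has
    the largest absolute Pearson correlation with the target tissue gene."""
    bc = blood - blood.mean(axis=0)          # (n_samples, n_blood_genes)
    tc = target - target.mean()              # (n_samples,)
    denom = np.sqrt((bc ** 2).sum(axis=0) * (tc ** 2).sum())
    denom[denom == 0] = np.inf               # constant genes get correlation 0
    r = (bc * tc[:, None]).sum(axis=0) / denom
    return np.argsort(-np.abs(r))[:k]


def fit_bayesian_ridge(X, y, n_iter=300, tol=1e-6):
    """Bayesian ridge regression: prior w ~ N(0, I/alpha), noise precision beta,
    both re-estimated by maximizing the marginal likelihood (evidence)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centering handles the intercept
    n, d = Xc.shape
    XtX, Xty = Xc.T @ Xc, Xc.T @ yc
    eig = np.linalg.eigvalsh(XtX)            # eigenvalues of X^T X
    alpha, beta = 1.0, 1.0 / (np.var(yc) + 1e-12)
    for _ in range(n_iter):
        # posterior mean m = beta * (alpha I + beta X^T X)^-1 X^T y
        m = beta * np.linalg.solve(alpha * np.eye(d) + beta * XtX, Xty)
        lam = beta * eig
        gamma = np.sum(lam / (lam + alpha))  # effective number of parameters
        resid = yc - Xc @ m
        alpha_new = gamma / (m @ m + 1e-12)
        beta_new = (n - gamma) / (resid @ resid + 1e-12)
        done = (abs(alpha_new - alpha) < tol * alpha
                and abs(beta_new - beta) < tol * beta)
        alpha, beta = alpha_new, beta_new
        if done:
            break
    m = beta * np.linalg.solve(alpha * np.eye(d) + beta * XtX, Xty)
    return m, y_mean - x_mean @ m            # weights, intercept


def evaluate(y_true, y_pred):
    """Return (MAE, Pearson correlation, RMSE) of the predictions."""
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    pcc = np.corrcoef(y_true, y_pred)[0, 1]
    rmse = np.sqrt(np.mean(err ** 2))
    return mae, pcc, rmse


# Synthetic example: one tissue gene driven by blood genes 3 and 7 plus noise.
rng = np.random.default_rng(0)
n_train, n_test, n_blood = 200, 50, 100
blood = rng.normal(size=(n_train + n_test, n_blood))
tissue = (2.0 * blood[:, 3] - 1.5 * blood[:, 7]
          + rng.normal(scale=0.5, size=n_train + n_test))
tr, te = slice(0, n_train), slice(n_train, None)

feats = select_features(blood[tr], tissue[tr], k=10)
w, b = fit_bayesian_ridge(blood[tr][:, feats], tissue[tr])
pred = blood[te][:, feats] @ w + b
mae, pcc, rmse = evaluate(tissue[te], pred)
print(f"MAE={mae:.3f} PCC={pcc:.3f} RMSE={rmse:.3f}")
```

In a multi-tissue setting this loop would simply be repeated for every target gene in every tissue, each with its own selected blood features; the low-dimensional feature vector keeps each regression small, and the Bayesian treatment lets the data set the ridge penalty instead of tuning it by cross-validation.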

Availability and implementation: B-GEX is available at https://github.com/xuwenjian85/B-GEX.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Gene Expression Profiling*
  • Genomics
  • Quantitative Trait Loci*
  • Transcriptome