Building an optimal predictive model for imputing tissue-specific gene expression by combining genotype and whole-blood transcriptome data

HGG Adv. 2023 Jul 11;4(4):100223. doi: 10.1016/j.xhgg.2023.100223. eCollection 2023 Oct 12.

Abstract

Accurate imputation of tissue-specific gene expression can be a powerful tool for understanding the biological mechanisms underlying human complex traits. Existing imputation methods fall into two categories according to the type of predictor used: the first uses genotype data, while the second uses whole-blood expression data. Both data types can be easily collected from blood, avoiding invasive tissue biopsies. In this study, we attempted to build an optimal predictive model for imputing tissue-specific gene expression by combining genotype and whole-blood expression data. We first evaluated the imputation performance of the two standalone models, one using genotype data [GEN model] and one using whole-blood expression data [WBE model], across 47 human tissues. The WBE model outperformed the GEN model in most tissues by a large margin. We then developed several combined models that leverage both types of predictors to further improve imputation performance. We tried various strategies, including fitting a single model on a merged dataset of the two data types (MERGED models) and integrating the imputation outcomes of the two standalone models (inverse variance-weighted [IVW] models). We found that one of the MERGED models noticeably outperformed the standalone models. This model fixed the ratio between the regularization penalty factors of the two predictor types so that the contribution of the whole-blood transcriptome was upweighted relative to that of the genotype. Our study suggests that combining genotype and whole-blood expression can improve the imputation of tissue-specific gene expression, but that the improvement can depend largely on the combination strategy chosen.
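The two combination strategies described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes centered data, swaps in plain ridge regression (a closed-form stand-in for whatever regularized regression the study used) and simulated data, and all function names (`ridge_with_penalty_factors`, `fit_merged`, `ivw_combine`) are hypothetical. The per-feature penalty factors are conceptually similar to the `penalty.factor` argument of R's glmnet; a `ratio` greater than 1 penalizes genotype predictors more heavily, which upweights whole-blood expression as in the MERGED model described above.

```python
import numpy as np


def ridge_with_penalty_factors(X, y, lam, penalty_factors):
    """Minimize ||y - X b||^2 + lam * sum_j penalty_factors[j] * b_j^2.

    Assumes X and y are already centered, so no intercept is fitted.
    Ridge is used here only as a simple stand-in regularizer.
    """
    P = np.diag(penalty_factors)
    # Setting the gradient to zero gives (X^T X + lam * P) b = X^T y.
    return np.linalg.solve(X.T @ X + lam * P, X.T @ y)


def fit_merged(G, E, y, lam, ratio):
    """Hypothetical MERGED model: one regression on [genotype | expression].

    `ratio` = genotype penalty factor / expression penalty factor.
    ratio > 1 shrinks genotype effects harder, upweighting expression.
    """
    X = np.hstack([G, E])
    factors = np.concatenate([np.full(G.shape[1], ratio), np.ones(E.shape[1])])
    return ridge_with_penalty_factors(X, y, lam, factors)


def ivw_combine(pred_a, var_a, pred_b, var_b):
    """Inverse-variance-weighted average of two models' predictions.

    var_a / var_b are assumed error variances (e.g., from cross-validation).
    """
    w_a, w_b = 1.0 / var_a, 1.0 / var_b
    return (w_a * pred_a + w_b * pred_b) / (w_a + w_b)


# Simulated example (not study data): 20 SNP dosages, 30 blood genes.
rng = np.random.default_rng(0)
n = 200
G = rng.integers(0, 3, size=(n, 20)).astype(float)  # dosages 0/1/2
E = rng.normal(size=(n, 30))                          # whole-blood expression
y = 0.3 * G[:, 0] + 1.0 * E[:, 0] + rng.normal(scale=0.5, size=n)

# Center predictors and response so the no-intercept fit is valid.
G -= G.mean(axis=0)
E -= E.mean(axis=0)
y -= y.mean()

beta = fit_merged(G, E, y, lam=10.0, ratio=4.0)
print(beta.shape)  # (50,): 20 genotype + 30 expression coefficients
```

Raising `ratio` can only shrink (never grow) the norm of the genotype coefficients, which is the lever the fixed-ratio MERGED model pulls; the IVW approach instead fits the two models separately and blends their outputs, trusting whichever has the smaller error variance.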

Keywords: GTEx; TWAS; Transcriptome-wide association studies; eQTLs; expression quantitative trait loci; regularized linear regression; transcriptome imputation; whole blood expression.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genome-Wide Association Study* / methods
  • Genotype
  • Humans
  • Phenotype
  • Polymorphism, Single Nucleotide
  • Quantitative Trait Loci
  • Transcriptome* / genetics