Bayesian method to predict individual SNP genotypes from gene expression data

Nat Genet. 2012 May;44(5):603-8. doi: 10.1038/ng.2248.


RNA profiling can be used to capture the expression patterns of many genes that are associated with expression quantitative trait loci (eQTLs). Employing published putative cis eQTLs, we developed a Bayesian approach to predict SNP genotypes that is based only on RNA expression data. We show that predicted genotypes can accurately and uniquely identify individuals in large populations. When inferring genotypes from an expression data set using eQTLs of the same tissue type (but from an independent cohort), we were able to resolve 99% of the identities of individuals in the cohort at adjusted P ≤ 1 × 10⁻⁵. When eQTLs derived from one tissue were used to predict genotypes using expression data from a different tissue, the identities of 90% of the study subjects could be resolved at adjusted P ≤ 1 × 10⁻⁵. We discuss the implications of deriving genotypic information from RNA data deposited in the public domain.
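
The general idea of inferring a genotype from expression with Bayes' theorem can be illustrated with a minimal sketch. This is not the authors' model: it assumes, purely for illustration, that expression of an eQTL gene is normally distributed around a genotype-specific mean with a shared standard deviation, and that genotype priors follow Hardy-Weinberg proportions. The function names, the minor allele frequency, and all numeric values are hypothetical.

```python
import math

def hardy_weinberg_prior(maf):
    """Prior probabilities for carrying 0, 1, or 2 copies of the minor allele."""
    p = 1.0 - maf
    return [p * p, 2 * p * maf, maf * maf]

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def genotype_posterior(expression, means, sd, maf):
    """Posterior P(genotype | expression) for one SNP via Bayes' theorem.

    means: assumed mean expression for genotypes 0, 1, 2 (hypothetical values).
    """
    prior = hardy_weinberg_prior(maf)
    joint = [pr * normal_pdf(expression, m, sd) for pr, m in zip(prior, means)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical cis eQTL: each minor allele raises expression by one unit.
posterior = genotype_posterior(expression=2.2, means=[0.0, 1.0, 2.0], sd=0.5, maf=0.3)

# Most probable genotype (index = number of minor alleles).
predicted = max(range(3), key=lambda g: posterior[g])
```

In this toy setting a single SNP is only weakly informative. Identifying an individual, as described in the abstract, relies on combining predictions across many eQTLs.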

MeSH terms

  • Adipose Tissue / metabolism*
  • Bayes Theorem*
  • Cohort Studies
  • Computer Simulation
  • Gene Expression Profiling*
  • Genotype
  • Humans
  • Liver / metabolism*
  • Polymorphism, Single Nucleotide / genetics*
  • Quantitative Trait Loci*