Tradict enables accurate prediction of eukaryotic transcriptional states from 100 marker genes

Nat Commun. 2017 May 5;8:15309. doi: 10.1038/ncomms15309.

Abstract

Transcript levels are a critical determinant of the proteome and hence cellular function. Because the transcriptome is an outcome of the interactions between genes and their products, it may be accurately represented by a subset of transcript abundances. We develop a method, Tradict (transcriptome predict), capable of learning and using the expression measurements of a small subset of 100 marker genes to predict transcriptome-wide gene abundances and the expression of a comprehensive but interpretable list of transcriptional programs that represent the major biological processes and pathways of the cell. By analyzing over 23,000 publicly available RNA-Seq data sets, we show that Tradict is robust to noise and accurate. Coupled with targeted RNA sequencing, Tradict may therefore enable simultaneous transcriptome-wide screening and mechanistic investigation at large scales.
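
The general idea of predicting a full expression profile from a few marker genes can be illustrated with a small sketch. This is not Tradict's statistical model, which the abstract does not specify. It is a stand-in that assumes simulated log-expression data driven by a few latent factors, picks markers by variance, and fits a ridge regression from markers to all genes. The function names (`select_markers`, `fit_ridge`, `predict`, `simulate`), the data sizes, and the noise level are all hypothetical.

```python
import numpy as np

def select_markers(log_expr, n_markers):
    """Pick the n_markers highest-variance genes (a stand-in heuristic,
    not Tradict's marker selection procedure)."""
    variances = log_expr.var(axis=0)
    return np.argsort(variances)[::-1][:n_markers]

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression with an intercept: Y ~ X @ W + b."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    W = np.linalg.solve(gram, Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b

def predict(X, W, b):
    """Predict all genes from marker measurements."""
    return X @ W + b

# Simulated data (assumption): expression is driven by a few shared
# latent factors, so a small gene subset carries most of the signal.
rng = np.random.default_rng(0)
n_genes, n_factors = 500, 5
loadings = rng.normal(size=(n_factors, n_genes))

def simulate(n_samples):
    """Draw samples x genes log-expression values with small noise."""
    factors = rng.normal(size=(n_samples, n_factors))
    noise = 0.1 * rng.normal(size=(n_samples, n_genes))
    return factors @ loadings + noise

train, test = simulate(200), simulate(50)

# Learn on the training set, then predict held-out samples
# using only the marker columns.
markers = select_markers(train, n_markers=20)
W, b = fit_ridge(train[:, markers], train)
pred = predict(test[:, markers], W, b)

# Agreement between predicted and observed held-out expression.
r = np.corrcoef(pred.ravel(), test.ravel())[0, 1]
print(f"markers used: {len(markers)}, held-out correlation: {r:.3f}")
```

The sketch works on the simulated data only because the simulation builds in low-dimensional structure; how well a marker subset predicts real transcriptomes is the empirical question the paper addresses.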

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Arabidopsis / genetics
  • Arabidopsis / immunology
  • Computational Biology / methods*
  • Eukaryota / genetics*
  • Humans
  • Immunity, Innate / genetics
  • Signal Transduction
  • Transcription, Genetic*
  • Transcriptome / genetics