Protein structure-based gene expression signatures

Proc Natl Acad Sci U S A. 2021 May 11;118(19):e2014866118. doi: 10.1073/pnas.2014866118.


Gene expression signatures (GES) connect phenotypes to differential messenger RNA (mRNA) expression of genes, providing a powerful approach to define cellular identity, function, and the effects of perturbations. The use of GES has suffered from vague assessment criteria and limited reproducibility. Because the structure of proteins defines the functional capability of genes, we hypothesized that enrichment of structural features could be a generalizable representation of gene sets. We derive structural gene expression signatures (sGES) using features from multiple levels of protein structure (e.g., domain and fold) encoded by the mRNAs in GES. Comprehensive analyses of data from the Genotype-Tissue Expression Project (GTEx), the all RNA-seq and ChIP-seq sample and signature search (ARCHS4) database, and mRNA expression of drug effects on cardiomyocytes show that sGES are useful for characterizing biological phenomena. sGES enable phenotypic characterization across experimental platforms, facilitate interoperability of expression datasets, and describe drug action on cells.
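
As an illustration of the idea of representing a gene set by the enrichment of structural features, here is a minimal sketch. The abstract does not specify the enrichment statistic, so this uses a one-sided hypergeometric test as an assumption, not the authors' method. The gene labels (G1 to G6), the feature labels, and the function names `hypergeom_sf` and `structural_signature` are all hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def structural_signature(gene_set, gene_to_features, background):
    """Map each structural feature to an enrichment p-value for gene_set.

    Assumption: enrichment is scored with a one-sided hypergeometric test
    against the given background; the source does not state the statistic.
    """
    background = set(background)
    genes = set(gene_set) & background
    N, n = len(background), len(genes)
    features = {f for g in background for f in gene_to_features.get(g, ())}
    signature = {}
    for f in sorted(features):
        K = sum(1 for g in background if f in gene_to_features.get(g, ()))
        k = sum(1 for g in genes if f in gene_to_features.get(g, ()))
        signature[f] = hypergeom_sf(k, N, K, n)
    return signature


# Toy, made-up annotations: gene -> structural features of the encoded protein.
gene_to_features = {
    "G1": {"kinase_domain"},
    "G2": {"kinase_domain"},
    "G3": {"kinase_domain", "SH2"},
    "G4": {"SH2"},
    "G5": {"zinc_finger"},
    "G6": {"zinc_finger"},
}
background = list(gene_to_features)
ges = ["G1", "G2", "G3"]  # hypothetical differentially expressed genes

sig = structural_signature(ges, gene_to_features, background)
for feature, p in sig.items():
    print(f"{feature}\t{p:.3f}")
```

In this sketch the resulting vector of per-feature p-values plays the role of the structural signature. Because it is indexed by structural features rather than by gene identifiers, two gene sets measured on different platforms could in principle be compared feature by feature.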

Keywords: gene expression signatures; reproducibility; structural bioinformatics.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Cell Line
  • Chromatin Immunoprecipitation Sequencing
  • Computational Biology
  • Gene Expression
  • Gene Expression Profiling
  • Humans
  • Myocytes, Cardiac
  • Protein Conformation*
  • Proteins / chemistry*
  • Proteins / genetics*
  • RNA, Messenger
  • RNA-Seq
  • Reproducibility of Results
  • Transcriptome*


Substances

  • Proteins
  • RNA, Messenger