High-throughput discovery of functional disordered regions: investigation of transactivation domains

Mol Syst Biol. 2018 May 14;14(5):e8190. doi: 10.15252/msb.20188190.

Abstract

Over 40% of the proteins encoded by any eukaryotic genome contain intrinsically disordered regions (IDRs) that do not adopt defined tertiary structures. Certain IDRs perform critical functions, but discovering them is non-trivial because the biological context determines their function. We present IDR-Screen, a framework to discover functional IDRs in a high-throughput manner by simultaneously assaying large numbers of DNA sequences that code for short disordered sequences. Functionality-conferring patterns in their protein sequence are inferred through statistical learning. Using a yeast HSF1 transcription factor-based assay, we discovered IDRs that function as transactivation domains (TADs) by screening a random sequence library and a designed library consisting of variants of 13 diverse TADs. Using machine learning, we find that segments devoid of positively charged residues but with redundant short sequence patterns of negatively charged and aromatic residues are a generic feature of TAD functionality. We anticipate that investigating defined sequence libraries using IDR-Screen for specific functions can facilitate the discovery of novel functional regions of the disordered proteome, as well as an understanding of the impact of natural and disease variants in disordered segments.
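The sequence feature reported in the abstract (no positively charged residues, with negatively charged and aromatic residues present) can be illustrated with a minimal composition check. This is only a sketch under stated assumptions, not the statistical learning model used in the study: the residue classes (K/R as positive, D/E as negative, F/W/Y as aromatic, histidine not counted as charged), the function names, the thresholds, and the example peptides are all hypothetical choices made for illustration.

```python
# Illustrative residue classes (an assumption; histidine is not counted as positive here).
POSITIVE = set("KR")
NEGATIVE = set("DE")
AROMATIC = set("FWY")


def composition_features(seq):
    """Count positively charged, negatively charged, and aromatic residues in a peptide."""
    seq = seq.upper()
    return {
        "length": len(seq),
        "positive": sum(aa in POSITIVE for aa in seq),
        "negative": sum(aa in NEGATIVE for aa in seq),
        "aromatic": sum(aa in AROMATIC for aa in seq),
    }


def matches_tad_like_profile(seq, min_negative=2, min_aromatic=2):
    """Return True if the peptide has no positive residues and enough acidic and aromatic ones.

    The thresholds are hypothetical defaults, not values taken from the paper.
    """
    f = composition_features(seq)
    return (
        f["positive"] == 0
        and f["negative"] >= min_negative
        and f["aromatic"] >= min_aromatic
    )


# Made-up peptides for demonstration only; they are not sequences from the screen.
print(composition_features("DDFWEELFDD"))
print(matches_tad_like_profile("DDFWEELFDD"))
print(matches_tad_like_profile("KDFWEE"))
```

A simple count like this ignores the "redundant short sequence patterns" part of the finding, that is, how acidic and aromatic residues are arranged relative to each other; capturing that would require position-aware features such as short motif counts.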

Keywords: high‐throughput screen; intrinsically disordered protein; machine learning; mutational scanning; transactivation domain.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cloning, Molecular
  • DNA-Binding Proteins / genetics*
  • Gene Library
  • Heat-Shock Proteins / genetics*
  • High-Throughput Nucleotide Sequencing
  • Machine Learning
  • Proteome / genetics
  • Saccharomyces cerevisiae / genetics*
  • Saccharomyces cerevisiae Proteins / genetics*
  • Sequence Analysis, DNA
  • Transcription Factors / genetics*
  • Transcriptional Activation*

Substances

  • DNA-Binding Proteins
  • HSF1 protein, S cerevisiae
  • Heat-Shock Proteins
  • Proteome
  • Saccharomyces cerevisiae Proteins
  • Transcription Factors