IDDomainSpotter: Compositional bias reveals domains in long disordered protein regions-Insights from transcription factors

Peter S Millard; Katrine Bugge; Riccardo Marabini; Wouter Boomsma; Meike Burow; Birthe B Kragelund

doi:10.1002/pro.3754

Protein Sci. 2020 Jan;29(1):169-183. doi: 10.1002/pro.3754. Epub 2019 Nov 11.

Authors

Peter S Millard¹ ², Katrine Bugge³, Riccardo Marabini³, Wouter Boomsma⁴, Meike Burow¹ ², Birthe B Kragelund³

Affiliations

¹ DynaMo Center, Department of Plant and Environmental Sciences, University of Copenhagen, Copenhagen, Denmark.
² Copenhagen Plant Science Centre, Department of Plant and Environmental Sciences, University of Copenhagen, Copenhagen, Denmark.
³ Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
⁴ Department of Computer Science, University of Copenhagen, Copenhagen, Denmark.

Abstract

Protein domains constitute regions of distinct structural properties and molecular functions that are retained when removed from the rest of the protein. However, due to the lack of tertiary structure, the identification of domains has been largely neglected for long (>50 residues) intrinsically disordered regions. Here we present a sequence-based approach to assess and visualize domain organization in long intrinsically disordered regions based on compositional sequence biases. An online tool to find putative intrinsically disordered domains (IDDomainSpotter) in any protein sequence or sequence alignment using any particular sequence trait is available at http://www.bio.ku.dk/sbinlab/IDDomainSpotter. Using this tool, we have identified a putative domain enriched in hydrophilic and disorder-promoting residues (Pro, Ser, and Thr) and depleted in positive charges (Arg and Lys) bordering the folded DNA-binding domains of several transcription factors (p53, GCR, NAC46, MYB28, and MYB29). This domain, from two different MYB transcription factors, was characterized biophysically to determine its properties. Our analyses show the domain to be extended, dynamic and highly disordered. It connects the DNA-binding domain to other disordered domains and is present and conserved in several transcription factors from different families and domains of life. This example illustrates the potential of IDDomainSpotter to predict, from sequence alone, putative domains of functional interest in otherwise uncharacterized disordered proteins.

Keywords: DNA-binding domain; IDDomainSpotter; IDPs; NMR; compositional bias; domain; low-complexity regions; p53; plant MYB protein; transactivation domain; transcription factor.
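The abstract describes scoring compositional sequence bias along a protein but does not give the algorithm behind IDDomainSpotter. As a rough illustration only, the sketch below assumes a simple sliding-window score: the fraction of residues in an "enriched" set (here Pro, Ser, Thr) minus the fraction in a "depleted" set (here Arg, Lys), followed by a threshold step that reports contiguous high-scoring stretches. The function names, window size, threshold, and minimum length are all hypothetical choices for this example and are not taken from the tool.

```python
from typing import List, Tuple


def window_score(sequence: str, enriched: str = "PST",
                 depleted: str = "RK", window: int = 15) -> List[float]:
    """Per-residue compositional score (illustrative, not the tool's method).

    For each position, take a window centred on it (truncated at the
    sequence ends) and return (enriched count - depleted count) divided
    by the window length.
    """
    seq = sequence.upper()
    half = window // 2
    scores = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half):min(len(seq), i + half + 1)]
        plus = sum(segment.count(aa) for aa in enriched)
        minus = sum(segment.count(aa) for aa in depleted)
        scores.append((plus - minus) / len(segment))
    return scores


def find_regions(scores: List[float], threshold: float = 0.2,
                 min_length: int = 10) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index pairs where the score stays at
    or above `threshold` for at least `min_length` consecutive residues.
    Both parameters are assumed values chosen for illustration.
    """
    regions = []
    start = None
    for i, s in enumerate(scores):
        if s >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    # Close a region that runs to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions
```

Plotting the output of `window_score` against residue number gives a profile in which stretches rich in Pro, Ser, and Thr and poor in Arg and Lys stand out as plateaus; `find_regions` turns those plateaus into candidate boundaries that would still need experimental validation.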

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Sequence
Arabidopsis / chemistry*
Arabidopsis / genetics*
Arabidopsis / metabolism
Arabidopsis Proteins / chemistry*
Arabidopsis Proteins / genetics*
Arabidopsis Proteins / metabolism
Bias
Binding Sites
Histone Acetyltransferases
Humans
Models, Molecular
Protein Binding
Protein Domains
Protein Unfolding
Scattering, Small Angle
Transcription Factors / chemistry*
Transcription Factors / genetics*
Transcription Factors / metabolism
X-Ray Diffraction

Substances

Arabidopsis Proteins
Myb29 protein, Arabidopsis
Transcription Factors
GCN5 protein, Arabidopsis
Histone Acetyltransferases