Molecular evolution of the GATA family of transcription factors: conservation within the DNA-binding domain

J Mol Evol. 2000 Feb;50(2):103-15. doi: 10.1007/s002399910012.


The GATA-binding transcription factors comprise a protein family whose members contain either one or two highly conserved zinc finger DNA-binding domains. Members of this group have been identified in organisms ranging from cellular slime mold to vertebrates, including plants, fungi, nematodes, insects, and echinoderms. While much work has been done describing the expression patterns, functional aspects, and target genes for many of these proteins, an evolutionary analysis of the entire family has been lacking. Herein we show that only the C-terminal zinc finger (Cf) and basic domain, which together constitute the GATA-binding domain, are conserved throughout this protein family. Phylogenetic analyses of amino acid sequences demonstrate distinct evolutionary pathways. Analysis of GATA factors isolated from vertebrates suggests that the six distinct vertebrate GATAs are descended from a common ancestral sequence, while those isolated from nonvertebrates (with the exception of the fungal AREA orthologues and Arabidopsis paralogues) appear to be related only within the DNA-binding domain and otherwise provide little insight into their evolutionary history. These results suggest multiple modes of evolution, including gene duplication and modular evolution of GATA factors based upon inclusion of a class IV zinc finger motif. As such, GATA transcription factors represent a group of proteins related solely by their homologous DNA-binding domains. Further analysis of this domain examines the degree of conservation at each amino acid site using the Boltzmann entropy measure, thereby identifying residues critical to preservation of structure and function. Finally, we construct a predictive motif that can accurately identify potential GATA proteins.
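
The per-site conservation analysis and motif search described above can be illustrated with a minimal sketch. It makes several assumptions that do not come from the abstract: the entropy is taken in the standard form H = -Σ p ln p over the residue frequencies in each alignment column, the toy alignment is invented and is not real GATA sequence data, and the regular expression encodes only the commonly cited Cys-X2-Cys-X17-Cys-X2-Cys spacing of the class IV zinc finger, not the predictive motif constructed in the paper. The names `column_entropies`, `find_fingers`, and `TOY_ALIGNMENT` are hypothetical.

```python
import math
import re
from collections import Counter

# Assumed spacing of the class IV zinc finger (C-X2-C-X17-C-X2-C).
# This is NOT the paper's predictive motif, only a widely quoted consensus.
GATA_FINGER = re.compile(r"C.{2}C.{17}C.{2}C")

# Invented toy alignment (illustration only, not real GATA sequences).
TOY_ALIGNMENT = ["CNACGL", "CSACGI", "CTNCGL", "CAACGV"]


def column_entropies(alignment):
    """Return the entropy (in nats) of each column of a gap-free alignment.

    A value of 0 means the site is fully conserved; larger values mean
    more residue variability at that site.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column)
        # H = sum p * ln(1/p), equivalent to -sum p * ln(p)
        h = sum((n / total) * math.log(total / n) for n in counts.values())
        entropies.append(h)
    return entropies


def find_fingers(sequence):
    """Return start positions of non-overlapping matches to the assumed finger pattern."""
    return [m.start() for m in GATA_FINGER.finditer(sequence)]


if __name__ == "__main__":
    for site, h in enumerate(column_entropies(TOY_ALIGNMENT), start=1):
        print(f"site {site}: H = {h:.3f}")
```

Sites with entropy near zero in such a calculation are the candidates for residues under structural or functional constraint. A real analysis would also need to decide how to treat alignment gaps and unequal sampling of taxa, which this sketch ignores.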

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Amino Acid Motifs
  • Amino Acid Sequence
  • Binding Sites
  • Conserved Sequence
  • DNA-Binding Proteins / physiology*
  • Erythroid-Specific DNA-Binding Factors
  • Evolution, Molecular*
  • Models, Molecular
  • Molecular Sequence Data
  • Phylogeny
  • Protein Structure, Tertiary
  • Transcription Factors / physiology*
  • Zinc Fingers


Substances

  • DNA-Binding Proteins
  • Erythroid-Specific DNA-Binding Factors
  • Transcription Factors