The Arabidopsis thaliana genome contains at least 29 active genes encoding SET domain proteins that can be assigned to four evolutionarily conserved classes

Nucleic Acids Res. 2001 Nov 1;29(21):4319-33. doi: 10.1093/nar/29.21.4319.

Abstract

SET domains are conserved amino acid motifs present in chromosomal proteins that function in epigenetic control of gene expression. These proteins can be divided into four classes as typified by their Drosophila members E(Z), TRX, ASH1 and SU(VAR)3-9. Homologs of all four classes have been identified in yeast and mammals, but not in plants. BLASTP screening of the Arabidopsis genome identified 37 genes: three E(z) homologs, five trx homologs, four ash1 homologs and 15 genes similar to Su(var)3-9. Seven further genes were assigned as trx-related and three as ash1-related. Only four of these genes have been described previously. Our classification is based on the characteristics of the SET domains, cysteine-rich regions and additional conserved domains, including a novel YGD domain. RT-PCR analysis, cDNA cloning and matching ESTs show that at least 29 of the genes are active in diverse tissues. The high number of SET domain genes, possibly involved in epigenetic control of gene activity during plant development, can partly be explained by extensive genome duplication in Arabidopsis. Additionally, the lack of introns in the coding region of eight SU(VAR)3-9 class genes indicates evolution of new genes by retrotransposition. The identification of putative nuclear localization signals and AT-hooks in many of the proteins supports an anticipated nuclear localization, which was demonstrated for selected proteins.
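
The class counts given in the abstract can be cross-checked with a quick tally. The sketch below is illustrative only: the dictionary labels and variable names are chosen here and do not come from the paper, and the numbers are taken directly from the abstract text.

```python
# Gene counts per class, as stated in the abstract.
# The labels are illustrative names, not identifiers from the paper.
class_counts = {
    "E(z) homologs": 3,
    "trx homologs": 5,
    "ash1 homologs": 4,
    "Su(var)3-9-like": 15,
    "trx-related": 7,
    "ash1-related": 3,
}

# Total number of SET domain genes found by the BLASTP screen.
total = sum(class_counts.values())

# The abstract reports at least 29 of these genes as active,
# so this fraction is a lower bound.
active_min = 29
active_fraction = active_min / total

print(f"total genes: {total}")
print(f"minimum active fraction: {active_fraction:.2f}")
```

The six class counts sum to the 37 genes reported, and the 29 genes with expression evidence are a lower bound of roughly three quarters of the family.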

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Active Transport, Cell Nucleus
  • Amino Acid Motifs
  • Amino Acid Sequence
  • Arabidopsis / chemistry
  • Arabidopsis / genetics*
  • Arabidopsis / growth & development
  • Arabidopsis Proteins / chemistry*
  • Arabidopsis Proteins / classification
  • Arabidopsis Proteins / genetics*
  • Conserved Sequence*
  • Cysteine / metabolism
  • Databases, Protein
  • Evolution, Molecular*
  • Gene Duplication
  • Gene Expression Profiling
  • Gene Expression Regulation, Plant*
  • Genes, Duplicate / genetics
  • Genes, Plant / genetics*
  • Genome, Plant*
  • Histone-Lysine N-Methyltransferase / chemistry*
  • Histone-Lysine N-Methyltransferase / classification
  • Histone-Lysine N-Methyltransferase / genetics*
  • Introns / genetics
  • Molecular Sequence Data
  • Nuclear Localization Signals
  • Open Reading Frames / genetics
  • Protein Binding
  • Protein Structure, Tertiary
  • RNA, Messenger / genetics
  • RNA, Messenger / metabolism
  • RNA, Plant / genetics
  • RNA, Plant / metabolism
  • Retroelements / genetics
  • Sequence Alignment

Substances

  • Arabidopsis Proteins
  • Nuclear Localization Signals
  • RNA, Messenger
  • RNA, Plant
  • Retroelements
  • SUVR1 protein, Arabidopsis
  • SUVR2 protein, Arabidopsis
  • Histone-Lysine N-Methyltransferase
  • SUVR4 protein, Arabidopsis
  • Cysteine

Associated data

  • GENBANK/AF344444
  • GENBANK/AF344445
  • GENBANK/AF344446
  • GENBANK/AF344447
  • GENBANK/AF344448
  • GENBANK/AF344449
  • GENBANK/AF344450
  • GENBANK/AF344451
  • GENBANK/AF344452
  • GENBANK/AF394239
  • GENBANK/AY045576