Sequence conservation in the protein coding and intron regions of the engrailed transcription unit

EMBO J. 1986 Dec 20;5(13):3583-9.

Abstract

Engrailed (en) is a gene involved in proper segmentation of the Drosophila embryo. The predicted en protein contains a homeodomain and regions rich in polyalanine, polyglutamine, polyglutamate/aspartate and serine. We have taken an evolutionary approach to define which regions may be of fundamental importance by examining the D. virilis genomic sequence homologous to the D. melanogaster en primary transcription unit. Sequence homology begins at the first ATG of a long open reading frame, yielding predicted proteins of 584 and 552 amino acids for D. virilis and D. melanogaster, respectively. The predicted amino acid sequence can be divided into conserved and non-conserved domains. The C-terminal 30% of the protein (which includes the homeodomain) is completely conserved. In the N-terminal 70% of the protein, the overall conservation is 71%, but non-conservative amino acid changes occur in clusters and there are short stretches of highly conserved sequence. A region rich in glutamate and aspartate is conserved and has homology to an 18-amino-acid sequence present in members of the myc family of proteins. Major differences in the size of the two proteins occur in regions of non-conserved repeated sequences. In the introns of the engrailed transcription units there are long stretches of conservation, suggesting that this DNA may be of functional importance.
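
The regional conservation figures quoted above are the kind of number that can be obtained by counting identical residues in an alignment of the two predicted proteins. The Python sketch below illustrates one such calculation. It is not the authors' procedure: it assumes the two sequences are already aligned to equal length with "-" marking gaps, it counts only exact identities (the abstract's "conservation" may also credit conservative substitutions), and the function names and the toy sequences are hypothetical, not engrailed data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        # Skip columns where either sequence has a gap.
        if res_a == "-" or res_b == "-":
            continue
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def regional_identity(aligned_a: str, aligned_b: str, n_terminal_percent: int = 70):
    """Return (N-terminal identity, C-terminal identity) for a split alignment.

    The split point is a percentage of the alignment columns; integer
    arithmetic avoids floating-point surprises at the boundary.
    """
    cut = len(aligned_a) * n_terminal_percent // 100
    n_term = percent_identity(aligned_a[:cut], aligned_b[:cut])
    c_term = percent_identity(aligned_a[cut:], aligned_b[cut:])
    return n_term, c_term


# Toy, made-up 10-residue alignment (NOT real engrailed sequence).
toy_a = "MALEDRCSPQ"
toy_b = "MALQDKCSPQ"
n_term, c_term = regional_identity(toy_a, toy_b)
print(f"N-terminal identity: {n_term:.1f}%")
print(f"C-terminal identity: {c_term:.1f}%")
```

In practice the alignment itself would first be produced by a dedicated alignment program, and the result depends on how gaps and repeated regions are treated, which matters here because the two proteins differ in length.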

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Base Sequence
  • Cloning, Molecular
  • Drosophila / genetics*
  • Drosophila melanogaster / genetics*
  • Genes*
  • Genes, Homeobox*
  • Introns*
  • Proteins / genetics
  • Sequence Homology, Nucleic Acid
  • Species Specificity
  • Transcription, Genetic*

Substances

  • Proteins

Associated data

  • GENBANK/X04727