Total sequence decomposition distinguishes functional modules, "molegos" in apurinic/apyrimidinic endonucleases

BMC Bioinformatics. 2002 Nov 25;3:37. doi: 10.1186/1471-2105-3-37. Epub 2002 Nov 25.


Background: Total sequence decomposition, using the web-based MASIA tool, identifies areas of conservation in aligned protein sequences. By structurally annotating these motifs, the sequence can be parsed into individual building blocks, molecular legos ("molegos"), that can eventually be related to function. Here, the approach is applied to the apurinic/apyrimidinic endonuclease (APE) DNA repair proteins, essential enzymes that have been highly conserved throughout evolution. The APEs, DNase-1 and inositol 5'-polyphosphate phosphatases (IPP) form a superfamily whose members catalyze metal ion-based phosphorolysis but recognize different substrates.
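
As a rough illustration of what "identifying areas of conservation in aligned sequences" can mean in practice, the sketch below scores each alignment column by the fraction of sequences sharing its most common residue and reports contiguous runs of well-conserved columns. This is not the MASIA algorithm, whose scoring is not described here. The function names, the threshold of 0.8, the minimum block length of 3 and the toy alignment are all assumptions made for illustration.

```python
from collections import Counter


def column_conservation(alignment):
    """Per-column fraction of sequences sharing the most common residue.

    Gaps ('-') are never counted as the shared residue, so a gapped
    column scores lower. Illustrative scoring only, not MASIA's.
    """
    n = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / n)
    return scores


def conserved_blocks(alignment, threshold=0.8, min_len=3):
    """Return half-open (start, end) column ranges of conserved runs.

    threshold and min_len are hypothetical defaults chosen for the toy data.
    """
    scores = column_conservation(alignment)
    blocks = []
    start = None
    # A trailing 0.0 sentinel closes any run that reaches the last column.
    for i, score in enumerate(scores + [0.0]):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            if i - start >= min_len:
                blocks.append((start, i))
            start = None
    return blocks


# Hypothetical toy alignment; these are not APE, DNase-1 or IPP sequences.
toy_alignment = [
    "MKLDEAGHWT",
    "MKLDQAGHFT",
    "MKLD-AGHYS",
]

print(conserved_blocks(toy_alignment))  # [(0, 4), (5, 8)]
```

Each reported block is only a candidate sequence motif. In the approach described here, a motif is designated a molego only after structural annotation shows that it is also structurally conserved, a step this sketch does not attempt.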

Results: MASIA decomposition of APE yielded 12 sequence motifs, 10 of which are also structurally conserved within the family and are designated as molegos. The 12 motifs include all the residues known to be essential for DNA cleavage by APE. Five of these molegos are sequentially and structurally conserved in DNase-1 and the IPP family. Correcting the sequence alignment to match the residues at the ends of two of the molegos that are absolutely conserved in each of the three families greatly improved the local structural alignment of APEs, DNase-1 and synaptojanin. Comparing substrate/product binding of molegos common to DNase-1 showed that those distinctive for APEs are not directly involved in cleavage, but establish protein-DNA interactions 3' to the abasic site. These additional bonds enhance both specific binding to damaged DNA and the processivity of APE1.

Conclusion: A modular approach can improve structurally predictive alignments of homologous proteins with low sequence identity and reveal residues peripheral to the traditional "active site" that control the specificity of enzymatic activity.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Amino Acid Motifs
  • Amino Acid Sequence
  • Binding Sites / physiology
  • Conserved Sequence
  • DNA-(Apurinic or Apyrimidinic Site) Lyase / chemistry*
  • DNA-(Apurinic or Apyrimidinic Site) Lyase / physiology*
  • Deoxyribonuclease I / chemistry
  • Deoxyribonuclease I / physiology
  • Humans
  • Models, Molecular
  • Molecular Sequence Data
  • Nerve Tissue Proteins / chemistry
  • Nerve Tissue Proteins / physiology
  • Phosphoric Monoester Hydrolases / chemistry
  • Phosphoric Monoester Hydrolases / physiology
  • Protein Structure, Tertiary / physiology
  • Sequence Alignment / methods
  • Software

Substances

  • Nerve Tissue Proteins
  • Deoxyribonuclease I
  • synaptojanin
  • Phosphoric Monoester Hydrolases
  • DNA-(Apurinic or Apyrimidinic Site) Lyase