Identification of insertion hot spots for non-LTR retrotransposons: computational and biochemical application to Entamoeba histolytica

Nucleic Acids Res. 2006;34(20):5752-63. doi: 10.1093/nar/gkl710. Epub 2006 Oct 13.

Abstract

The genome of the human pathogen Entamoeba histolytica contains non-long terminal repeat (LTR) retrotransposons, the EhLINEs and EhSINEs, which lack targeted insertion. We investigated the importance of local DNA structure, and sequence preference of the element-encoded endonuclease (EN) in selecting target sites for retrotransposon insertion. Pre-insertion loci were tested computationally to detect unique features based on DNA structure, thermodynamic considerations and protein interaction measures. Target sites could readily be distinguished from other genomic sites based on these criteria. The contribution of the EhLINE1-encoded EN in target site selection was investigated biochemically. The sequence-specificity of the EN was tested in vitro with a variety of mutated substrates. It was possible to assign a consensus sequence, 5'-GCATT-3', which was efficiently nicked between A-T and T-T. The upstream G residue enhanced EN activity, possibly serving to limit retrotransposition in the A+T-rich E. histolytica genome. Mutated substrates with poor EN activity showed structural differences compared with normal substrates. Analysis of retrotransposon insertion sites from a variety of organisms showed that, in general, regions of favorable DNA structure were recognized for retrotransposition. A combination of favorable DNA structure and preferred EN nicking sequence in the vicinity of this structure may determine the genomic hotspots for retrotransposition.
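As an illustration only, the consensus described above can be turned into a simple scan for candidate nick positions. The following is a minimal sketch, not the authors' method: the function name `find_candidate_nick_sites`, the 0-based coordinate convention, and the choice to scan only the given strand are all assumptions made for this example, and the sketch ignores the DNA structural criteria that the abstract says also contribute to target site selection.

```python
import re

# Consensus reported in the abstract: 5'-GCATT-3'.
CONSENSUS = "GCATT"

# Nicks are reported between A-T and between T-T of the consensus.
# Expressed as the number of motif bases lying 5' of each nick:
#   G C A | T | T
#         3   4
NICK_OFFSETS = (3, 4)  # assumption: both positions are listed as candidates


def find_candidate_nick_sites(seq):
    """Return (motif_start, [nick_positions]) for each consensus match.

    Coordinates are 0-based on the strand supplied; a nick position p
    means the break falls between seq[p - 1] and seq[p]. Only exact
    matches on the given strand are considered (hypothetical simplification).
    """
    seq = seq.upper()
    sites = []
    # A lookahead pattern is used so that overlapping matches would not be skipped.
    for match in re.finditer("(?=" + CONSENSUS + ")", seq):
        start = match.start()
        sites.append((start, [start + off for off in NICK_OFFSETS]))
    return sites


if __name__ == "__main__":
    example = "AAGCATTAA"  # made-up sequence for demonstration
    for start, nicks in find_candidate_nick_sites(example):
        print(start, nicks)
```

A fuller analysis would also scan the reverse complement and weigh each match by the local structural features of the surrounding DNA, which this sketch does not attempt.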

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Computational Biology
  • Consensus Sequence
  • DNA Mutational Analysis
  • DNA, Protozoan / chemistry
  • Endodeoxyribonucleases / metabolism
  • Entamoeba histolytica / genetics*
  • Long Interspersed Nucleotide Elements*
  • Molecular Sequence Data
  • Short Interspersed Nucleotide Elements*
  • Substrate Specificity

Substances

  • DNA, Protozoan
  • Endodeoxyribonucleases