Identification of a conserved sequence in the non-coding regions of many human genes

Nucleic Acids Res. 1989 Jan 25;17(2):699-710. doi: 10.1093/nar/17.2.699.


We have analyzed a sequence of approximately 70 base pairs (bp) that shows a high degree of similarity to sequences present in the non-coding regions of a number of human and other mammalian genes. The sequence was discovered in a fragment of human genomic DNA adjacent to an integrated hepatitis B virus genome in cells derived from human hepatocellular carcinoma tissue. When one of the viral flanking sequences was compared to nucleotide sequences in GenBank, more than thirty human genes were identified that contained a similar sequence in their non-coding regions. The sequence element was usually found once or twice in a gene, either in an intron or in the 5' or 3' flanking regions. It shared no similarity with known short interspersed nucleotide elements (SINEs) or with currently known gene regulatory elements. This element was highly conserved at the same position within the corresponding human and mouse genes for myoglobin and N-myc, indicating evolutionary conservation and possible functional importance. Preliminary DNase I footprinting data suggested that the element or its adjacent sequences may bind nuclear factors to generate specific DNase I hypersensitive sites. The size, structure, and evolutionary conservation of this sequence indicate that it is distinct from other types of short interspersed repetitive elements. The element may have a cis-acting functional role in the genome.

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • Base Sequence*
  • Carcinoma / genetics*
  • Carcinoma, Hepatocellular / genetics*
  • Cell Line
  • DNA, Neoplasm / isolation & purification
  • DNA, Viral / isolation & purification
  • DNA, Viral / metabolism
  • DNA-Binding Proteins / analysis
  • Genes, Viral
  • Hepatitis B virus / genetics
  • Humans
  • Liver Neoplasms / genetics*
  • Molecular Sequence Data
  • Sequence Homology, Nucleic Acid*

Substances

  • DNA, Neoplasm
  • DNA, Viral
  • DNA-Binding Proteins

Associated data

  • GENBANK/X13001