A statistical model for locating regulatory regions in genomic DNA

J Mol Biol. 1997 Apr 25;268(1):8-14. doi: 10.1006/jmbi.1997.0965.

Abstract

In addition to genes, chromosomal DNA contains sequences that serve as signals for turning on and off gene expression. These signals are thought to be distributed as clusters in the regulatory regions of genes. We develop a Bayesian model that views locating regulatory regions in genomic DNA as a change-point problem, with the beginning of regulatory and non-regulatory regions corresponding to the change points. The model is based on a hidden Markov chain. The data consist of nucleotide positions of protein-binding elements in a genomic DNA sequence. These positions are identified using a reference catalogue containing elements that interact with transcription factors implicated in controlling the expression of protein-encoding genes. Among the protein-binding elements in a genomic DNA sequence, the statistical model automatically selects those that tend to predict regulatory regions. We test the model using viral sequences that include known regulatory regions and provide the results obtained for human genomic DNA corresponding to the beta globin locus on chromosome 11.

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Adenoviridae / genetics
  • Algorithms
  • Chromosome Mapping / methods*
  • Chromosomes, Human, Pair 11
  • DNA, Viral
  • Genome*
  • Genome, Viral
  • Globins / genetics
  • HIV-1 / genetics
  • Humans
  • Markov Chains
  • Models, Genetic*
  • Models, Statistical
  • Molecular Sequence Data
  • Regulatory Sequences, Nucleic Acid*
  • Simian virus 40 / genetics

Substances

  • DNA, Viral
  • Globins

Associated data

  • GENBANK/U01317