A statistical model for investigating binding probabilities of DNA nucleotide sequences using microarrays

Biometrics. 2002 Dec;58(4):981-8. doi: 10.1111/j.0006-341x.2002.00981.x.


There is considerable scientific interest in knowing the probability that a site-specific transcription factor will bind to a given DNA sequence. Microarray methods provide an effective means for assessing the binding affinities of a large number of DNA sequences, as demonstrated by Bulyk et al. (2001, Proceedings of the National Academy of Sciences, USA 98, 7158-7163) in their study of the DNA-binding specificities of Zif268 zinc fingers using microarray technology. In a follow-up investigation, Bulyk, Johnson, and Church (2002, Nucleic Acids Research 30, 1255-1261) studied the interdependence of nucleotides on the binding affinities of transcription proteins. Our article is motivated by this pair of studies. We present a general statistical methodology for analyzing microarray intensity measurements reflecting DNA-protein interactions. The log probability of a protein binding to a DNA sequence on an array is modeled using a linear ANOVA model. This model is convenient because it employs familiar statistical concepts and procedures and also because it is effective for investigating the probability structure of the binding mechanism.
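
The idea of a linear ANOVA model on the log scale can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes three variable nucleotide positions, main effects only, treatment coding with base A as the reference level, and made-up effect sizes and noise. All names (`design_matrix`, `true_beta`, `log_y`) are hypothetical. Under such an additive model the log binding probability is a sum of per-position terms, so the probability itself factorizes across positions; interdependence between nucleotides would appear as interaction terms, which this sketch omits.

```python
import itertools
import numpy as np

BASES = "ACGT"
N_POS = 3  # assumption: three variable positions, i.e. 4**3 = 64 sequences


def design_matrix(seqs):
    """Main-effects design: intercept plus 3 indicator columns per position.

    Base 'A' is the reference level at every position (treatment coding).
    """
    X = np.zeros((len(seqs), 1 + N_POS * 3))
    X[:, 0] = 1.0  # intercept column
    for i, s in enumerate(seqs):
        for p, b in enumerate(s):
            k = BASES.index(b)
            if k > 0:  # 'A' contributes nothing beyond the intercept
                X[i, 1 + p * 3 + (k - 1)] = 1.0
    return X


# Enumerate every sequence over the variable positions.
seqs = ["".join(t) for t in itertools.product(BASES, repeat=N_POS)]
X = design_matrix(seqs)

# Simulated data: made-up effects and small Gaussian noise on the log scale.
rng = np.random.default_rng(0)
true_beta = rng.normal(size=X.shape[1])
log_y = X @ true_beta + rng.normal(scale=0.01, size=len(seqs))

# Ordinary least squares fit of the additive (main-effects) model.
beta_hat, *_ = np.linalg.lstsq(X, log_y, rcond=None)
fitted = X @ beta_hat
```

In a real analysis the response would be log-transformed array intensities rather than simulated values, and comparing this fit against one with interaction columns would be one way to probe whether positions act independently.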

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Analysis of Variance
  • Binding Sites / genetics
  • DNA-Binding Proteins / genetics
  • Models, Biological*
  • Models, Statistical*
  • Nucleic Acid Conformation
  • Oligonucleotide Array Sequence Analysis / methods*
  • Protein Binding / genetics
  • Regulatory Sequences, Nucleic Acid / genetics
  • Transcription Factors / genetics

Substances

  • DNA-Binding Proteins
  • Transcription Factors