AlignACE is a Gibbs sampling algorithm for identifying motifs that are over-represented in a set of DNA sequences. When used to search upstream of apparently coregulated genes, AlignACE finds motifs that often correspond to the DNA binding preferences of transcription factors. We previously used AlignACE to analyze whole-genome mRNA expression data. Here, we present a more detailed study of its effectiveness as applied to a variety of groups of genes in the Saccharomyces cerevisiae genome. Published functional catalogs of genes and sets of genes grouped by common name provided 248 groups, resulting in 3311 motifs. In conjunction with this analysis, we present measures for gauging the tendency of a motif to target a given set of genes relative to all other genes in the genome and for gauging the degree to which a motif is preferentially located in a certain distance range upstream of translational start sites. We demonstrate improved methods for comparing and clustering sequence motifs. Many previously identified cis-regulatory elements were found. We also describe previously unidentified motifs, one of which has been verified by experiments in our laboratory. An extensive set of AlignACE runs on randomly selected sets of genes and on sets of genes whose upstream regions contain known transcription factor binding sites serves as a control.
Copyright 2000 Academic Press.