Background: micF RNA, a small regulatory RNA found in bacteria, post-transcriptionally regulates expression of outer membrane protein F (OmpF) by interacting with the 5' UTR of ompF mRNA. Phylogenetic data can be useful for analyzing RNA/RNA duplex structures and can aid in elucidating the mechanism of regulation. However, micF and the associated genes ompF and ompC are difficult to annotate because of either similarity or divergence in nucleotide sequence. We report that by using sequences that represent "gene signatures", e.g., mRNA 5' UTR sequences, as probes, closely related genes can be accurately located in genomic sequences.
Results: Alignment and search methods using NCBI BLAST programs were used to identify micF, ompF and ompC in Yersinia pestis and Yersinia enterocolitica. By alignment with DNA sequences from other bacterial species, the 5' start sites of the genes and upstream transcriptional regulatory sites in their promoter regions were predicted. The annotated Yersinia genes provide phylogenetic information on the micF regulatory system. High sequence conservation is found in the binding sites of transcriptional regulatory factors in the promoter region upstream of micF, whereas the micF RNA gene shows conserved blocks of sequence in some segments and marked sequence variation in others. Unexpectedly large differences in rates of evolution were found between the interacting RNA transcripts, micF RNA and the ompF mRNA 5' UTR. micF RNA/ompF mRNA 5' UTR duplex structures were modeled with the mfold program. Functional domains, such as the RNA/RNA interaction sites, appear to show minimal evolutionary drift in sequence, with the exception of a significant change in Y. enterocolitica micF RNA.
Conclusions: The newly annotated Yersinia micF and ompF genes and the resulting RNA/RNA duplex structures provide strong phylogenetic support for a generalized duplex model. The alignment-and-search approach using 5' UTR signatures may serve as a model for defining other genes and their start sites when annotated genes are available in well-defined reference organisms.