Background: Upstream open reading frames (uORFs) can mediate translational control over the largest, or major, ORF (mORF) in response to starvation, polyamine concentrations, and sucrose concentrations. One plant uORF with conserved peptide sequences has been shown to exert this control in an amino acid sequence-dependent manner, but it is generally not clear what kinds of genes are regulated or how extensively this mechanism is invoked in a given genome.
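To make the uORF/mORF distinction concrete, the following is a minimal sketch of how candidate uORFs could be located in a single cDNA sequence. It is an illustration only, not the procedure used in this study: the function name `find_uorfs`, the `min_codons` threshold, and the rule for skipping in-frame N-terminal extensions of the mORF are all assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_uorfs(cdna, morf_start, min_codons=2):
    """Return (start, end) index pairs of candidate uORFs in a cDNA.

    cdna        -- cDNA sequence as a string of A/C/G/T
    morf_start  -- 0-based index of the A in the mORF start codon (assumed known)
    min_codons  -- hypothetical minimum length, counting ATG but not the stop

    start is the 0-based index of the uORF ATG; end is the index just
    past its stop codon.
    """
    seq = cdna.upper()
    uorfs = []
    for start in range(morf_start):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] not in STOP_CODONS:
                continue
            in_frame_with_morf = (morf_start - start) % 3 == 0
            # An in-frame ATG with no stop before the mORF start is an
            # N-terminal extension of the mORF, not a separate uORF.
            if in_frame_with_morf and pos >= morf_start:
                break
            if (pos - start) // 3 >= min_codons:
                uorfs.append((start, pos + 3))
            break
    return uorfs
```

For a toy sequence such as `"CCATGAAACCCTAAGGATGGCTGCTTAA"` with the mORF start at index 16, the sketch reports one upstream ORF spanning indices 2 to 14. Identifying conserved peptide uORFs would additionally require translating each candidate and comparing peptides across species, which is not shown here.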
Results: By comparing full-length cDNA sequences from Arabidopsis and rice, we identified 26 distinct homology groups of conserved peptide uORFs, only three of which have been reported previously. Pairwise Ka/Ks analysis showed that purifying selection had acted on nearly all conserved peptide uORFs and their associated mORFs. Functions of predicted mORF proteins could be inferred for 16 homology groups, and many of these proteins appear to have a regulatory function, including 6 transcription factors, 5 signal transduction factors, 3 developmental signal molecules, a homolog of translation initiation factor eIF5, and a RING finger protein. Transcription factors are clearly overrepresented in this data set when compared to the frequency calculated for the entire genome (p = 1.2 × 10⁻⁷). Duplicate gene pairs arising from a whole genome duplication (ohnologs) with a conserved uORF are much more likely to have been retained in Arabidopsis (Arabidopsis thaliana) than are ohnologs of other genes (39% vs 14% of ancestral genes, p = 5 × 10⁻³). Two uORF groups were found in animals, indicating an ancient origin of these putative regulatory elements.
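The abstract does not state which statistical test produced the overrepresentation p-value. As one plausible illustration, the sketch below computes a one-sided binomial tail probability: the chance of observing at least `k` members of a category among `n` genes if each gene belonged to that category with the genome-wide frequency `p`. The choice of a binomial test and the function name `binomial_upper_tail` are assumptions for this example, and no counts from the study are filled in.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p).

    k -- observed number of genes in the category (e.g. transcription factors)
    n -- total number of genes examined
    p -- background frequency of the category in the whole genome
    """
    # Sum the exact binomial probabilities from k up to n.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

A small returned value indicates that the observed count is unlikely under the genome-wide frequency. The appropriate values of `k`, `n`, and `p` depend on how genes and homology groups were counted in the full analysis, and a hypergeometric or Fisher's exact test would be a reasonable alternative when sampling without replacement from a finite gene set.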
Conclusion: Conservation of uORF amino acid sequence, association with homologous mORFs over long evolutionary time periods, preferential retention after whole genome duplications, and preferential association with mORFs coding for transcription factors suggest that the conserved peptide uORFs identified in this study are strong candidates for translational controllers of regulatory genes.