Motivation: Upstream open reading frames (uORFs) are often found in the 5'-untranslated regions of eukaryotic messenger RNAs. Some uORFs have been shown to encode functional peptides involved in the translational regulation of the downstream main ORFs. Comparative genomic approaches have been used in genome-wide searches for uORFs encoding bioactive peptides. By comparing uORF sequences between a few selected species or among a small group of species, these searches have identified uORFs with conserved amino acid sequences (UCASs) in plants, mammals and insects. Regulatory regions within uORF-encoded peptides that are involved in translational control are typically 10-20 amino acids long, and detection of homology between such short regions largely depends on the selection of species for comparison. To maximize the chances of identifying UCASs with short conserved regions, we devised a novel algorithm that performs homology searches among a large number of species and automatically selects uORFs conserved in a wide range of species.
Results: We developed the BAIUCAS (BLAST-based algorithm for identification of UCASs) method and used it to identify 18 novel Arabidopsis uORFs whose amino acid sequences are conserved across diverse eudicot species. These include uORFs that were not found in previous comparative genomic studies because of low sequence conservation among species. BAIUCAS is therefore a powerful method for the identification of UCASs, and it is particularly useful for detecting uORFs with a small number of conserved amino acid residues.