Genome Biol. 2011 Nov 25;12(11):R118.
doi: 10.1186/gb-2011-12-11-r118.

Hundreds of putatively functional small open reading frames in Drosophila

Emmanuel Ladoukakis et al. Genome Biol.

Abstract

Background: The relationship between DNA sequence and encoded information is still an unsolved puzzle. The number of protein-coding genes in higher eukaryotes identified by genome projects is lower than was expected, while a considerable amount of putatively non-coding transcription has been detected. Functional small open reading frames (smORFs) are known to exist in several organisms. However, coding sequence detection methods are biased against detecting such very short open reading frames. Thus, a substantial number of non-canonical coding regions encoding short peptides might await characterization.

Results: Using bio-informatics methods, we have searched for smORFs of less than 100 amino acids in the putatively non-coding euchromatic DNA of Drosophila melanogaster, and initially identified nearly 600,000 of them. We have studied the pattern of conservation of these smORFs as coding entities between D. melanogaster and Drosophila pseudoobscura, their presence in syntenic and in transcribed regions of the genome, and their ratio of conservative versus non-conservative nucleotide changes. For negative controls, we compared the results with those obtained using random short sequences, while a positive control was provided by smORFs validated by proteomics data.
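The first step described above, enumerating short open reading frames in a DNA sequence, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `find_smorfs`, the choice of ATG as the only start codon, and the forward-strand-only scan are assumptions made for illustration. Only the length limit of fewer than 100 amino acids comes from the abstract.

```python
# Illustrative sketch only; not the pipeline used in the study.
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_CODONS = 100  # abstract: smORFs of less than 100 amino acids


def find_smorfs(seq, max_codons=MAX_CODONS):
    """Return (start, end, n_codons) for each ATG-to-stop ORF on the
    forward strand with fewer than max_codons coding codons.

    start is the 0-based index of the ATG, end is the index just past
    the stop codon, and n_codons counts the ATG but not the stop.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None  # position of the first ATG since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons < max_codons:
                    hits.append((start, i + 3, n_codons))
                start = None
    return hits


# Toy sequence (made up): one ORF of three codons, ATG AAA TTT, then TAG.
print(find_smorfs("CCATGAAATTTTAGGG"))
```

A fuller scan would also cover the reverse complement strand and restrict the search to regions annotated as non-coding, as the abstract describes.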

Conclusions: The combination of these analyses led us to postulate the existence of at least 401 functional smORFs in Drosophila, with the possibility that as many as 4,561 such functional smORFs may exist.

Figures

Figure 1
Search pipeline for Drosophila smORFs. Diagram of the smORF search pipeline followed in this study. The percentages of smORFs passing each filter are indicated. For full details, see Results and Materials and methods. CDS, coding DNA sequence; Dm, Drosophila melanogaster; Dp, Drosophila pseudoobscura; Ka/Ks, ratio of non-synonymous (Ka) to synonymous (Ks) nucleotide substitution.
Figure 2
Size distributions of different pools of smORFs. The size distribution of different pools of smORFs is represented according to their length in codons. Medians are indicated. (a) 43,197 smORFs with tBLASTn hits with E-value < 1 × 10^-3 representing putative smORFs with some kind of sequence conservation in D. pseudoobscura. Mean size = 44 codons, standard deviation = 12. (b) 4,561 putative smORFs with conservation of sequence and start and stop codons in D. pseudoobscura, representing our upper estimate for the number of smORFs in Drosophila. Mean size = 25 codons, standard deviation = 12. (c) 1,075 smORFs with syntenic conservation, and start and stop codons in D. pseudoobscura, and with a Ka/Ks (ratio of non-synonymous (Ka) to synonymous (Ks) nucleotide substitution) score < 0.1. Mean size = 19 codons, standard deviation = 8. (d) 401 smORFs with conservation of sequence, and start and stop codons in D. pseudoobscura, with a Ka/Ks score < 0.1, and also present in transcribed regions, representing our conservative estimate. Mean size = 21 codons, standard deviation = 12. For a statistical analysis of the differences between these distributions, see Additional file 1.
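The nested pools in panels (b) and (d) can be read as successive filters over a candidate list. The sketch below shows that logic under stated assumptions: the `Candidate` record, its field names, and the toy values are hypothetical, and only the two thresholds (E-value < 1 × 10^-3 and Ka/Ks < 0.1) are taken from the caption.

```python
# Illustrative sketch only; record layout and data are hypothetical.
from dataclasses import dataclass

E_VALUE_MAX = 1e-3  # tBLASTn threshold quoted in the caption
KA_KS_MAX = 0.1     # Ka/Ks threshold quoted in the caption


@dataclass
class Candidate:
    name: str
    e_value: float              # tBLASTn E-value against D. pseudoobscura
    start_stop_conserved: bool  # start and stop codons conserved
    ka_ks: float                # non-synonymous / synonymous ratio
    transcribed: bool           # lies in a transcribed region


def upper_estimate(cands):
    """Pool (b): sequence conservation plus conserved start and stop."""
    return [c for c in cands
            if c.e_value < E_VALUE_MAX and c.start_stop_conserved]


def conservative_estimate(cands):
    """Pool (d): pool (b) plus low Ka/Ks and evidence of transcription."""
    return [c for c in upper_estimate(cands)
            if c.ka_ks < KA_KS_MAX and c.transcribed]


# Made-up candidates, for demonstration only.
toy = [
    Candidate("s1", 1e-5, True, 0.05, True),
    Candidate("s2", 1e-5, True, 0.40, True),
    Candidate("s3", 1e-5, False, 0.05, True),
    Candidate("s4", 0.02, True, 0.05, True),
]
print([c.name for c in upper_estimate(toy)])
print([c.name for c in conservative_estimate(toy)])
```

Pool (c) would follow the same pattern with a synteny flag in place of the transcription flag.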
Figure 3
Cumulative size distributions of smORFs with conserved start and stop codons. (a, b) Size distributions (represented as cumulative graphs) for the putative D. melanogaster smORFs with tBLASTn E-value < 1 × 10^-3 (a) or < 0.05 (b), and conserved, in-frame start and stop codons in D. pseudoobscura (SS), and their respective controls composed of reverse stop-to-start control 'smORFs' passing the same filters. The candidate 'real' smORF distributions are very different from the controls representing random short DNA sequences. For a statistical analysis of the differences between these distributions, see Additional file 1.
Figure 4
Distribution of the 4,561 smORFs conserved in D. pseudoobscura. Venn diagram representing the distribution of the smORFs with start and stop codons in D. pseudoobscura and passing each of the different validation filters, and their combinations. Each circle is proportional to the size of the population it represents. Dp, D. pseudoobscura; Ka/Ks, ratio of non-synonymous (Ka) to synonymous (Ks) nucleotide substitution.
Figure 5
Relaxed pipeline for smORF search. Pipeline for search of smORFs with the lowered E-value < 0.05 threshold for the tBLASTn filter. Despite an initial higher percentage of smORFs passing this filter, subsequent results are similar to those obtained by the initial stricter pipeline shown in Figure 1. For details see text and Materials and methods. Dp, D. pseudoobscura; Ka/Ks, ratio of non-synonymous (Ka) to synonymous (Ks) nucleotide substitution.
