PeerJ, 8, e8356

Conserved Novel ORFs in the Mitochondrial Genome of the Ctenophore Beroe forskalii


Darrin T Schultz et al. PeerJ.


To date, five ctenophore species' mitochondrial genomes have been sequenced, and each contains open reading frames (ORFs) that, if translated, have no identifiable orthologs. ORFs with no identifiable orthologs are called unidentified reading frames (URFs). If truly protein-coding, ctenophore mitochondrial URFs represent a little-understood path in early-diverging metazoan mitochondrial evolution and metabolism. We sequenced and annotated the mitochondrial genomes of three individuals of the beroid ctenophore Beroe forskalii and found that, in addition to sharing the same canonical mitochondrial genes as other ctenophores, the B. forskalii mitochondrial genome contains two URFs. These URFs are conserved among the three individuals but are not found in other sequenced species. We developed computational tools called pauvre and cuttlery to determine the likelihood that URFs are protein-coding. There is evidence that the two URFs are under negative selection, and a novel Bayesian hypothesis test of trinucleotide frequency shows that the URFs are more similar to known coding genes than to noncoding intergenic sequence. Protein structure and function prediction of all ctenophore URFs suggests that they all code for transmembrane transport proteins. These findings, along with the presence of URFs in other sequenced ctenophore mitochondrial genomes, suggest that ctenophores may have uncharacterized transmembrane proteins present in their mitochondria.
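The idea behind a trinucleotide-frequency test can be illustrated with a minimal sketch. This is not the cuttlery implementation and not the Bayesian test from the paper: it is a simplified log-likelihood ratio built on assumptions made here for illustration, namely in-frame non-overlapping trinucleotide counting, a uniform pseudocount, and the hypothetical function names `trinucleotide_counts`, `smoothed_frequencies`, and `log_likelihood_ratio`.

```python
import math
from collections import Counter
from itertools import product

# All 64 possible trinucleotides over the DNA alphabet.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]
VALID = set(TRINUCLEOTIDES)


def trinucleotide_counts(seq):
    """Count in-frame, non-overlapping trinucleotides (assumed framing).

    Trinucleotides containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(0, len(seq) - 2, 3):
        tri = seq[i:i + 3]
        if tri in VALID:
            counts[tri] += 1
    return counts


def smoothed_frequencies(seqs, pseudocount=1.0):
    """Estimate trinucleotide frequencies from a set of sequences.

    The pseudocount (an assumption of this sketch) keeps every frequency
    above zero so the log ratio below is always defined.
    """
    total = Counter()
    for s in seqs:
        total.update(trinucleotide_counts(s))
    denom = sum(total.values()) + pseudocount * len(TRINUCLEOTIDES)
    return {t: (total[t] + pseudocount) / denom for t in TRINUCLEOTIDES}


def log_likelihood_ratio(seq, coding_freqs, noncoding_freqs):
    """Return log P(seq | coding) - log P(seq | noncoding).

    Positive values mean the sequence's trinucleotide usage looks more
    like the coding set; negative values favor the noncoding set.
    """
    counts = trinucleotide_counts(seq)
    return sum(
        n * math.log(coding_freqs[t] / noncoding_freqs[t])
        for t, n in counts.items()
    )


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    coding = smoothed_frequencies(["ATGGCTGCTGCTGCT"])
    noncoding = smoothed_frequencies(["TTTTTTAAATTT"])
    print(log_likelihood_ratio("GCTGCTGCT", coding, noncoding))
    print(log_likelihood_ratio("TTTAAATTT", coding, noncoding))
```

In practice the coding and noncoding training sets would come from annotated genes and intergenic regions of the same genome, and a candidate URF would be scored against both.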

Keywords: Bayesian; Bioinformatics; Ctenophore; Evolution; Mitochondria; Mitogenome; ORF; Selection; Sequencing; URF.

Conflict of interest statement

The authors declare that they have no competing interests.


Figure 1. The ctenophore Beroe forskalii.
Figure 2. The B. forskalii mitogenome.
Each black concentric circle of the inner layer is one Oxford Nanopore read, organized from the longest reads on the outside of the track to the shortest on the inside. The annotation shows the direction and length of the predicted coding sequences (green) and the ribosomal RNAs (purple). Overlapping coding sequences are shown with an overlapping chevron on the 5′ end of the downstream gene. The outermost layer is a histogram of RNA-seq log-transformed read coverage at that position.
Figure 3. Ctenophore mitochondrial synteny map.
A synteny map of the ctenophore mitochondrial genomes. The opacity of the “brush stroke”-like bars connecting the same gene between two species increases with positional amino acid similarity using the BLOSUM62 matrix. Exact matches for ribosomal RNAs are opaque lines, while mismatches and gaps are not displayed. This plot was generated using the program pauvre synteny.
Figure 4. Ctenophore mitochondrial phylogeny.
This phylogeny contains the loci COX1, COX2, COX3, CYTB, ND1, ND2, ND3, ND4, ND4L, ND5 and ND6 independently aligned with MAFFT (Katoh et al., 2002) then concatenated together. No sites were removed from the amino acid matrix. Two phylogenies were created (1) using RAxML with rapid bootstrapping and (2) using Phylobayes with the CAT + GTR + Γ model, three chains, and convergence until the max difference between chains was less than 0.1. Both trees reconstructed the same topology. The branch lengths and scale shown are from the RAxML tree. The RAxML bootstrap values/Phylobayes posterior probabilities of each node are shown within the ctenophore clade.
Figure 5. Bayesian likelihood of ORFs being coding or noncoding.
These plots show the locus length versus the log-likelihood ratio distributions of the LOOCV trials for each noncoding, coding, or test sequence (B. forskalii URF1 and URF2). (A) Beroe forskalii, (B) Chlamydomonas, (C) Daphnia, (D) Drosophila, (E) Homo, (F) Strongylocentrotus. Dotted lines are linear fits to the log-likelihood values for each simulation. Log-likelihood ratios less than zero mean that the sequence’s trinucleotide frequency was more similar to that of noncoding sequence than to that of known protein-coding sequence. Similarly, values greater than zero indicate a better match to known protein-coding sequences. For all species we tested, with the exception of ND6 in human and RTL in Chlamydomonas, we found that the novel Bayesian test for protein-coding likelihood presented in this paper can unambiguously differentiate between coding and noncoding sequences for loci longer than 500 bp. The alignments used in these analyses are the same as those used in Table S5; Fig. S5.
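The leave-one-out cross-validation (LOOCV) behind such a plot can be sketched as follows. This is a simplified stand-in, not the paper's Bayesian procedure: each known sequence is held out in turn, trinucleotide frequencies are re-estimated from the remaining sequences, and the held-out sequence is scored with a point-estimate log-likelihood ratio. The function names, the pseudocount, and the in-frame counting are assumptions made here for illustration.

```python
import math
from collections import Counter


def tri_counts(seq):
    """Count in-frame, non-overlapping trinucleotides made only of A, C, G, T."""
    seq = seq.upper()
    tris = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return Counter(t for t in tris if set(t) <= set("ACGT"))


def log_freqs(seqs, pseudocount=1.0):
    """Return a function mapping a trinucleotide to its smoothed log frequency."""
    total = Counter()
    for s in seqs:
        total.update(tri_counts(s))
    denom = sum(total.values()) + 64 * pseudocount  # 64 possible trinucleotides
    return lambda tri: math.log((total[tri] + pseudocount) / denom)


def llr(seq, coding, noncoding):
    """Log-likelihood ratio of seq under coding vs. noncoding training sets."""
    c, n = log_freqs(coding), log_freqs(noncoding)
    return sum(k * (c(t) - n(t)) for t, k in tri_counts(seq).items())


def loocv_llrs(coding, noncoding):
    """Hold out each known sequence once and score it against the rest.

    Returns (label, locus length, log-likelihood ratio) tuples, i.e. the
    quantities plotted against each other in a length-vs-ratio figure.
    """
    results = []
    for i, seq in enumerate(coding):
        rest = coding[:i] + coding[i + 1:]
        results.append(("coding", len(seq), llr(seq, rest, noncoding)))
    for i, seq in enumerate(noncoding):
        rest = noncoding[:i] + noncoding[i + 1:]
        results.append(("noncoding", len(seq), llr(seq, coding, rest)))
    return results


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    coding = ["GCTGCTGCTGCA", "GCTGCAGCTGCT", "GCAGCTGCTGCT"]
    noncoding = ["TTTAAATTTTTT", "AAATTTTTTAAA", "TTTTTTAAATTT"]
    for label, length, ratio in loocv_llrs(coding, noncoding):
        print(label, length, round(ratio, 3))
```

Holding each sequence out of its own training set prevents it from inflating its own score, which is why the resulting ratios give a fair picture of how well coding and noncoding loci separate.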




Grant support

This work was supported by the David and Lucile Packard Foundation, the Monterey Bay Aquarium Research Institute, the University of California Biomolecular Engineering and Bioinformatics Department, NSF DEB-1542679; United States National Science Foundation GRFP DGE 1339067 to Darrin T. Schultz; and National Human Genome Research Institute/National Institutes of Health NRSA Training Grant 5T32HG008345 and National Institutes of Health NHLBI TOPMed U01 1U01HL137183 to Jordan M. Eizenga. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
