Bioinformatics. 2021 Sep 29;37(18):3019-3020.
doi: 10.1093/bioinformatics/btab090.

orfipy: a fast and flexible tool for extracting ORFs

Urminder Singh et al. Bioinformatics. 2021.

Abstract

Summary: Searching for open reading frames (ORFs) is a routine task and a critical step prior to annotating protein-coding regions in newly sequenced genomes or de novo transcriptome assemblies. With the tremendous increase in genomic and transcriptomic data, faster tools are needed to handle large input datasets. These tools should be versatile enough to fine-tune search criteria and allow efficient downstream analysis. Here we present a new Python-based tool, orfipy, which allows the user to flexibly search for open reading frames in genomic and transcriptomic sequences. The search is rapid and fully customizable, with a choice of FASTA and BED output formats.
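To make the task concrete, the following is a minimal sketch of an ORF search over all six reading frames with BED-style output. It is not orfipy's implementation or API: the names `find_orfs`, `to_bed` and `Orf`, the choice of ATG as the only start codon, the standard stop codons, and the `min_len` threshold are all assumptions made for illustration.

```python
from typing import Iterator, List, NamedTuple, Tuple

# Assumed search criteria for this sketch (not taken from orfipy).
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    start: int   # 0-based, inclusive, on the forward strand (BED convention)
    end: int     # 0-based, exclusive; the stop codon is included
    strand: str  # "+" or "-"
    frame: int   # 0, 1 or 2, relative to the scanned strand


def _scan_strand(seq: str, min_len: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (start, end, frame) for start-to-stop ORFs on one strand."""
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == START_CODON:
                orf_start = i
            elif orf_start is not None and codon in STOP_CODONS:
                if i + 3 - orf_start >= min_len:
                    yield orf_start, i + 3, frame
                orf_start = None


def find_orfs(seq: str, min_len: int = 30) -> List[Orf]:
    """Search both strands; coordinates are reported on the forward strand."""
    seq = seq.upper()
    n = len(seq)
    orfs = [Orf(s, e, "+", f) for s, e, f in _scan_strand(seq, min_len)]
    reverse_complement = seq.translate(_COMPLEMENT)[::-1]
    # Map reverse-strand coordinates back onto the forward strand.
    orfs += [Orf(n - e, n - s, "-", f)
             for s, e, f in _scan_strand(reverse_complement, min_len)]
    return orfs


def to_bed(seq_id: str, orfs: List[Orf]) -> List[str]:
    """Format ORFs as six-column BED lines (name and score are placeholders)."""
    return [f"{seq_id}\t{o.start}\t{o.end}\tORF_{i}\t0\t{o.strand}"
            for i, o in enumerate(orfs, 1)]


if __name__ == "__main__":
    for line in to_bed("seq1", find_orfs("CCATGAAATAGCC", min_len=9)):
        print(line)
```

A dedicated tool would typically also handle alternative start codons, ambiguous bases, partial ORFs at sequence ends, and streaming of large FASTA files, none of which this sketch attempts.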

Availability and implementation: orfipy is implemented in Python and is compatible with Python v3.6 and higher. Source code: https://github.com/urmi-21/orfipy. Installation: from the source, or via PyPI (https://pypi.org/project/orfipy) or bioconda (https://anaconda.org/bioconda/orfipy).

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Comparison of orfipy features and performance with getorf and OrfM, two commonly used tools for ORF identification. (A) Comparison of orfipy features with getorf and OrfM. orfipy provides a number of options to fine-tune the ORF search, including labeling the ORF type, reporting only the longest ORF, and reporting ORFs by translation frame. To allow reproducible analysis, orfipy logs the commands. (B) Example of FASTA headers written to output files by each tool. orfipy output provides information about each ORF that can be readily used in downstream analyses. (C and D) Runtimes, using plain FASTA input, in HPC (128 GB RAM; 28 cores) (C) and PC (16 GB RAM; 8 cores) (D) environments (Supplementary Data). Each analysis was run three times, via pyrpipe (Singh et al., 2020), and the mean runtime is reported. orfipy runtimes are comparable to OrfM for the large microbial and human transcriptome data. orfipy is fastest when ORFs are saved to a BED file; OrfM is fastest when ORFs are saved to peptide FASTA. Data sizes: A. thaliana genome 120 MB; microbial sequences 1.5 GB; human transcriptome 370 MB. fasta, output ORFs to nucleotide and peptide FASTA; bed, output ORFs to BED file; peptide, output ORFs to peptide-only FASTA.
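Two of the options named in panel (A), reporting ORFs by translation frame and reporting only the longest ORF, can be illustrated with a short sketch. The caption does not specify how orfipy defines or implements these options, so the tuple layout, the function names `group_by_frame` and `longest_orf`, and the example coordinates below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical ORF record: (start, end, strand, frame), end exclusive.
Orf = Tuple[int, int, str, int]


def group_by_frame(orfs: List[Orf]) -> Dict[Tuple[str, int], List[Orf]]:
    """Group ORFs by (strand, frame) so each frame can be reported separately."""
    groups: Dict[Tuple[str, int], List[Orf]] = defaultdict(list)
    for orf in orfs:
        _start, _end, strand, frame = orf
        groups[(strand, frame)].append(orf)
    return dict(groups)


def longest_orf(orfs: List[Orf]) -> Optional[Orf]:
    """Return the ORF spanning the most bases, or None for an empty list.

    Ties are resolved in favor of the ORF that appears first in the input.
    """
    return max(orfs, key=lambda o: o[1] - o[0], default=None)


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    example = [(2, 11, "+", 2), (0, 30, "+", 0), (5, 20, "-", 2)]
    print(group_by_frame(example))
    print(longest_orf(example))
```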
