PLoS One. 2016 Sep 29;11(9):e0163111.
doi: 10.1371/journal.pone.0163111. eCollection 2016.

MetaPhinder-Identifying Bacteriophage Sequences in Metagenomic Data Sets

Free PMC article


Vanessa Isabell Jurtz et al. PLoS One.


Bacteriophages are the most abundant biological entities on the planet, yet because of their small genome sizes they account for little of the genetic material isolated from most environments. They also show great genetic diversity and mosaic genomes, which makes them challenging to analyze and understand. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e., contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole genome bacteriophage sequences, integrating hits to multiple genomes to accommodate the mosaic genome structure of many bacteriophages. The method is demonstrated to outperform both BLAST methods based on single hits and methods based on k-mer comparisons. MetaPhinder is available as a web service at the Center for Genomic Epidemiology, and the source code is available for download.
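The idea of integrating hits to multiple genomes can be illustrated with a small sketch. The abstract does not give the exact formula, so the code below is an assumption rather than the authors' implementation: it merges the query intervals of all hits, regardless of which database genome they come from, and multiplies the merged coverage by a length-weighted mean identity to get a %ANI-like score. The names `Hit`, `merged_coverage`, and `ani_percent` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical hit record: (query_start, query_end, percent_identity),
# with 1-based inclusive coordinates on the query contig.
Hit = Tuple[int, int, float]


def merged_coverage(hits: List[Hit], query_length: int) -> float:
    """Fraction of the query covered by at least one hit (overlaps counted once)."""
    intervals = sorted((min(s, e), max(s, e)) for s, e, _ in hits)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping interval: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / query_length


def ani_percent(hits: List[Hit], query_length: int) -> float:
    """Assumed score: length-weighted mean identity times merged coverage."""
    if not hits or query_length <= 0:
        return 0.0
    lengths = [abs(e - s) + 1 for s, e, _ in hits]
    mean_identity = sum(i * n for (_, _, i), n in zip(hits, lengths)) / sum(lengths)
    return mean_identity * merged_coverage(hits, query_length)
```

Because hits to different genomes are pooled before coverage is computed, a mosaic contig whose halves match two different phages can still reach a high score, which a single top hit would miss.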

Conflict of interest statement

The authors have declared that no competing interests exist.


Fig 1. Size distribution of the phage and bacterial genomes and the cuts made to imitate metagenomic contigs.
Fig 2. Performance according to sequence length.
Performance, measured as AUC, is compared for %ANI classification and top-hit e-value classification. Data are binned according to sequence length and performance is shown separately for each bin. Note that the number of sequences in each bin differs, but the numbers of positive and negative examples are always comparable.
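A per-bin AUC evaluation of this kind could be sketched as follows. This is an illustration under assumptions, not the authors' evaluation code: the record layout `(length, score, is_phage)`, the bin edges, and the function names are hypothetical, and AUC is computed directly from its rank interpretation (the probability that a random positive scores higher than a random negative).

```python
from bisect import bisect_left
from collections import defaultdict


def auc(scores, labels):
    """AUC via pairwise comparison; ties count as half a win."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_length_bin(records, bin_edges):
    """Group (length, score, is_phage) records by length bin and score each bin.

    bin_edges are ascending upper bounds; bin i holds lengths <= bin_edges[i],
    and anything longer than the last edge falls into one extra final bin.
    """
    bins = defaultdict(lambda: ([], []))
    for length, score, is_phage in records:
        scores, labels = bins[bisect_left(bin_edges, length)]
        scores.append(score)
        labels.append(is_phage)
    return {i: auc(s, l) for i, (s, l) in sorted(bins.items())}
```

The pairwise AUC is quadratic in the number of sequences, which is fine for a sketch; a rank-based implementation would be preferable for large test sets.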
Fig 3. Comparison of different methods used to search the whole genome phage sequences database.
For BLAST and tBLASTx, %ANI is calculated to determine similarity to the database, whereas for KmerFinder the query coverage (qcov) is used. Performance is evaluated based on AUC. Data are binned according to sequence length and performance is shown separately for each bin.
Fig 4. MetaPhinder performance curves.
(A) ROC curve intersected by the dashed line used to select the classification threshold. (B) True positive rate and false positive rate compared for different classification thresholds (in %ANI to the whole phage database). The vertical dashed line indicates the selected classification threshold.
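The relationship between a %ANI threshold and the two rates in panel (B) can be sketched as below. This is a minimal illustration with assumptions: a contig is called phage when its %ANI is at or above the threshold, the function name `rates_at_threshold` is hypothetical, and no rule for picking the threshold is implied, since the caption does not state one.

```python
def rates_at_threshold(ani_values, labels, threshold):
    """Return (true positive rate, false positive rate) for one %ANI threshold.

    labels are True for phage contigs and False for non-phage contigs;
    a contig is predicted phage when ani >= threshold (an assumption).
    """
    tp = sum(1 for a, is_phage in zip(ani_values, labels) if is_phage and a >= threshold)
    fp = sum(1 for a, is_phage in zip(ani_values, labels) if not is_phage and a >= threshold)
    n_pos = sum(1 for is_phage in labels if is_phage)
    n_neg = len(labels) - n_pos
    tpr = tp / n_pos if n_pos else 0.0
    fpr = fp / n_neg if n_neg else 0.0
    return tpr, fpr


def rate_curve(ani_values, labels, thresholds):
    """Evaluate both rates over a list of candidate thresholds."""
    return [(t, *rates_at_threshold(ani_values, labels, t)) for t in thresholds]
```

Plotting the second and third columns of `rate_curve` against the first gives curves of the kind described for panel (B); plotting them against each other gives the ROC curve of panel (A).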




Grant support

This work was supported by the Center for Genomic Epidemiology at the Technical University of Denmark and funded by grant 09-067103/DSF from the Danish Council for Strategic Research. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.