MetaPhage: an Automated Pipeline for Analyzing, Annotating, and Classifying Bacteriophages in Metagenomics Sequencing Data

mSystems. 2022 Oct 26;7(5):e0074122. doi: 10.1128/msystems.00741-22. Epub 2022 Sep 7.

Abstract

Phages are the most abundant biological entities on the planet, and they play an important role in controlling density, diversity, and network interactions among bacterial communities through predation and gene transfer. To date, a variety of bacteriophage identification tools have been developed that differ in the phage mining strategies used, input files requested, and results produced. However, new users attempting bacteriophage analysis can struggle to select the best methods and interpret the variety of results produced. Here, we present MetaPhage, a comprehensive reads-to-report pipeline that streamlines the use of multiple phage miners and generates an exhaustive report. The report both summarizes and visualizes the key findings and enables further exploration of key results via interactive filterable tables. The pipeline is implemented in Nextflow, a widely adopted workflow manager that enables an optimized parallelization of tasks in different locations, from local server to the cloud; this ensures reproducible results from containerized packages. MetaPhage is designed to enable scalability and reproducibility; also, it can be easily expanded to include new miners and methods as they are developed in this continuously growing field. MetaPhage is freely available under a GPL-3.0 license at https://github.com/MattiaPandolfoVR/MetaPhage.

Importance

Bacteriophages (viruses that infect bacteria) are the most abundant biological entities on earth and are increasingly studied as members of the resident microbiota community in many environments, from oceans to soils and the human gut. Their identification is of great importance to better understand complex bacterial dynamics and microbial ecosystem function. A variety of metagenome bacteriophage identification tools have been developed that differ in the phage mining strategies used, input files requested, and results produced. To facilitate the management and the execution of such a complex workflow, we developed MetaPhage (MP), a comprehensive reads-to-report pipeline that streamlines the use of multiple phage miners and generates an exhaustive report. The pipeline is implemented in Nextflow, a widely adopted workflow manager that enables an optimized parallelization of tasks. MetaPhage is designed to enable scalability and reproducibility and offers an installation-free, dependency-free, and conflict-free workflow execution.

Keywords: NGS; bacteriophages; bioinformatics; metagenomics; phage mining.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacteriophages* / genetics
  • Humans
  • Metagenomics / methods
  • Microbiota* / genetics
  • Reproducibility of Results
  • Viruses*