Metagenomic sequencing provides a culture-independent avenue for investigating complex microbial communities by constructing metagenome-assembled genomes (MAGs). A MAG represents a microbial genome as a group of assembled sequences with similar characteristics. MAGs enable us to identify novel species and understand their potential functions in a dynamic ecosystem. Many computational tools have been developed to construct and annotate MAGs from metagenomic sequencing; however, a comprehensive introduction to their background and practical performance is still lacking. In this paper, we have thoroughly investigated the computational tools designed for both upstream and downstream analyses, including metagenome assembly, metagenome binning, gene prediction, functional annotation, taxonomic classification, and profiling. We have categorized the commonly used tools into distinct groups based on their functional background and introduced their underlying core algorithms and associated information to provide a comparative outlook. Furthermore, we have highlighted their computational requirements and offered guidance to help users select the most efficient tools. Finally, we have discussed current limitations, potential solutions, and future perspectives for further improving the tools for MAG construction and annotation. We believe that our work provides a consolidated resource for the current stage of MAG studies and sheds light on the future development of more effective MAG analysis tools for metagenomic sequencing.
Keywords: CNN, convolutional neural network; DBG, De Bruijn graph; GTDB, Genome Taxonomy Database; Gene functional annotation; Gene prediction; Genome assembly; HMM, Hidden Markov Model; KEGG, Kyoto Encyclopedia of Genes and Genomes; LCA, lowest common ancestor; LPA, label propagation algorithm; MAGs, metagenome-assembled genomes; Metagenome binning; Metagenome-assembled genomes; Metagenomic sequencing; Microbial abundance profiling; OLC, overlap-layout consensus; ONT, Oxford Nanopore Technologies; ORFs, open reading frames; PacBio, Pacific Biosciences; QC, quality control; SLR, synthetic long reads; TNFs, tetranucleotide frequencies; Taxonomic classification.
© 2021 The Author(s).