, 6, 24373

BPGA- An Ultra-Fast Pan-Genome Analysis Pipeline



Narendrakumar M Chaudhari et al. Sci Rep.


Recent advances in ultra-high-throughput sequencing technology and metagenomics have led to a paradigm shift in microbial genomics, from comparisons of a few genomes to large-scale pan-genome studies at different scales of phylogenetic resolution. Pan-genome studies provide a framework for estimating the genomic diversity of a dataset, determining the core (conserved), accessory (dispensable) and unique (strain-specific) gene pools of a species, tracing horizontal gene flux across strains and providing insight into species evolution. Existing pan-genome software tools suffer from various limitations, such as support for only limited datasets, difficult installation and requirements, and inadequate functional features. Here we present BPGA (Bacterial Pan Genome Analysis tool), an ultra-fast computational pipeline with seven functional modules. In addition to routine pan-genome analyses, BPGA introduces a number of novel features for downstream analyses, such as core/pan/MLST (Multi Locus Sequence Typing) phylogeny, exclusive presence/absence of genes in specific strains, subset analysis, atypical G + C content analysis and KEGG and COG mapping of core, accessory and unique genes. Other notable features include minimal running prerequisites, freedom to select the gene clustering method, ultra-fast execution, a user-friendly command-line interface and high-quality graphics outputs. The performance of BPGA has been evaluated using a dataset of complete genome sequences of 28 Streptococcus pyogenes strains.
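To make the core/accessory/unique distinction concrete, the following is a minimal sketch of how gene families might be labelled once clustering has assigned each family to the set of strains that carry it. It is not BPGA's code: the function name `classify_gene_families`, the dictionary layout and the toy strain and family names are all assumptions made for illustration.

```python
from typing import Dict, Set


def classify_gene_families(presence: Dict[str, Set[str]], n_genomes: int) -> Dict[str, str]:
    """Label each gene family as core, accessory or unique.

    presence maps a gene family ID to the set of strains that carry it
    (hypothetical layout, not BPGA's internal format).
    """
    labels: Dict[str, str] = {}
    for family, strains in presence.items():
        count = len(strains)
        if count == n_genomes:
            labels[family] = "core"        # present in every genome
        elif count == 1:
            labels[family] = "unique"      # present in exactly one genome
        else:
            labels[family] = "accessory"   # present in some, but not all
    return labels


# Hypothetical toy data: three strains, three gene families.
presence = {
    "famA": {"s1", "s2", "s3"},
    "famB": {"s1", "s2"},
    "famC": {"s3"},
}
labels = classify_gene_families(presence, n_genomes=3)
print(labels)
```

Here the strict definition of core (present in all genomes) is assumed; analyses that tolerate missing annotations sometimes relax this threshold.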


Figure 1. BPGA workflow.
Figure 2. Overview of the results generated by BPGA using 28 strains of S. pyogenes. (a) The gene family frequency spectrum. (b) New gene family distribution after sequential addition of each genome to the analysis. (c) The pan-genome profile trends obtained using the clustering tools USEARCH, CD-HIT and OrthoMCL. (d) COG distribution of core, accessory and unique genes. (e) KEGG distribution of core, accessory and unique genes.
Figure 3. Phylogenetic analysis by BPGA using 28 strains of S. pyogenes based on (a) concatenated core genes, (b) concatenated housekeeping genes (MLST) and (c) the binary pan-matrix. (Blue: Group M1 strains; red: Group M12 strains.)
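The quantities plotted in Figure 2(b) and 2(c) can be illustrated with a small sketch: as genomes are added one at a time, the pan-genome is the running union of gene families, the core genome is the running intersection, and the new-family count is what each genome contributes beyond the current pan-genome. The function `pan_core_profile` and the toy data are hypothetical, and the sketch uses a single fixed addition order; pan-genome tools commonly average over many random orders, which is omitted here.

```python
from typing import Dict, List, Set, Tuple


def pan_core_profile(genomes: Dict[str, Set[str]], order: List[str]) -> List[Tuple[int, int, int]]:
    """Return (pan size, core size, new families) after each genome is added.

    genomes maps a strain name to its set of gene family IDs
    (hypothetical layout chosen for illustration).
    """
    pan: Set[str] = set()
    core: Set[str] = set()
    profile: List[Tuple[int, int, int]] = []
    for i, name in enumerate(order):
        families = genomes[name]
        new = len(families - pan)                            # families not seen so far
        pan |= families                                      # running union
        core = set(families) if i == 0 else core & families  # running intersection
        profile.append((len(pan), len(core), new))
    return profile


# Hypothetical toy data: three strains sharing one family.
genomes = {
    "s1": {"famA", "famB"},
    "s2": {"famA", "famB", "famD"},
    "s3": {"famA", "famC"},
}
profile = pan_core_profile(genomes, ["s1", "s2", "s3"])
print(profile)
```

In this toy case the pan-genome grows with every added strain while the core shrinks, which is the general shape of the two profile curves.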



