Next generation sequencing technologies have engendered a genome sequence data deluge in public databases. Genome analyses have transitioned from single or few genomes to hundreds to thousands of genomes. Pan-genome analyses provide a framework for estimating the genomic diversity of the dataset at hand and predicting the number of additional whole genomes sequences that would be necessary to fully characterize that diversity. We review recent implementations of the pan-genome approach, its impact and limits, and we propose possible extensions, including analyses at the whole genome multiple sequence alignment level.
Copyright © 2014 Elsevier Ltd. All rights reserved.