Getting insight into the pan-genome structure with PangTree

Paulina Dziadkiewicz; Norbert Dojer

doi:10.1186/s12864-020-6610-4

BMC Genomics. 2020 Apr 16;21(Suppl 2):274. doi: 10.1186/s12864-020-6610-4.

Authors

Paulina Dziadkiewicz¹,², Norbert Dojer³

Affiliations

¹ Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, Warsaw, 02-097, Poland.
² Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, Warsaw, 02-097, Poland.
³ Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, Warsaw, 02-097, Poland. dojer@mimuw.edu.pl.

Abstract

Background: The term pan-genome was proposed to denominate collections of genomic sequences jointly analyzed or used as a reference. The constant growth of genomic data intensifies development of data structures and algorithms to investigate pan-genomes efficiently.

Results: This work focuses on providing a tool for discovering and visualizing the relationships between the sequences constituting a pan-genome. A new structure to represent such relationships, called an affinity tree, is proposed. Each node of this tree is assigned a subset of genomes, together with their homogeneity level and an averaged consensus sequence. Moreover, the subsets assigned to sibling nodes form a partition of the genomes assigned to their parent.
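The node structure and the sibling-partition property described in the Results can be illustrated with a minimal Python sketch. This is not PangTreeBuild's implementation or API: the names `AffinityNode`, `siblings_form_partition`, and `is_valid_affinity_tree` are hypothetical, and the genome labels, homogeneity values, and consensus strings are placeholders chosen only to show the shape of the data.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class AffinityNode:
    """Hypothetical node: a genome subset plus its summary attributes."""
    genomes: FrozenSet[str]   # identifiers of the genomes assigned to this node
    homogeneity: float        # placeholder value; how it is computed is not modeled here
    consensus: str            # placeholder for the averaged consensus sequence
    children: List["AffinityNode"] = field(default_factory=list)


def siblings_form_partition(node: AffinityNode) -> bool:
    """Check that the children's subsets are non-empty, disjoint, and cover the parent."""
    if not node.children:
        return True  # a leaf has nothing to partition
    seen = set()
    for child in node.children:
        if not child.genomes or seen & child.genomes:
            return False  # empty subset, or a genome assigned to two siblings
        seen |= child.genomes
    return seen == set(node.genomes)


def is_valid_affinity_tree(node: AffinityNode) -> bool:
    """Apply the partition check at every node of the tree."""
    return siblings_form_partition(node) and all(
        is_valid_affinity_tree(child) for child in node.children
    )


# Illustrative tree over three made-up genomes A, B, C.
root = AffinityNode(
    genomes=frozenset({"A", "B", "C"}),
    homogeneity=0.7,
    consensus="ACGT",
    children=[
        AffinityNode(frozenset({"A", "B"}), 0.9, "ACGT"),
        AffinityNode(frozenset({"C"}), 1.0, "ACGA"),
    ],
)
```

Calling `is_valid_affinity_tree(root)` checks the partition property at every level; a tree in which a child omits one of the parent's genomes, or in which two siblings share a genome, fails the check.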

Conclusions: The functionality of the affinity tree is demonstrated on simulated data and on the Ebola virus pan-genome. Furthermore, two software packages are provided: PangTreeBuild constructs the affinity tree, while PangTreeVis presents the result.

Keywords: Affinity tree; Multiple genome alignment; Pan-genome.

MeSH terms

Algorithms
Computational Biology
Computer Simulation
Databases, Genetic
Ebolavirus / genetics*
Genomics / methods*
Models, Genetic
Phylogeny
Sequence Alignment
Software