A new graph-theoretic approach to determine the similarity of genome sequences based on nucleotide triplets

Genomics. 2020 Nov;112(6):4701-4714. doi: 10.1016/j.ygeno.2020.08.023. Epub 2020 Aug 19.

Abstract

Methods for measuring sequence similarity play a significant role in computational biology. Owing to the rapid growth in the number of genome sequences in public databases, determining the evolutionary relationships among species has become more challenging. Traditional alignment-based methods are poorly suited to this task because they are time-consuming. A faster method that is applicable to species phylogeny is therefore needed. In this paper, a new graph-theory-based, alignment-free sequence comparison method is proposed. Each genome sequence is represented by a complete bipartite graph built from its nucleotide triplets. A vector descriptor is then formed from the edge weights of the graph. Finally, a phylogenetic tree is constructed using the UPGMA algorithm. The method is evaluated on datasets of mammals, viruses, and bacteria. In most cases, the resulting phylogeny is more satisfactory than that obtained with earlier methods.
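The general shape of such an alignment-free pipeline (triplets, then a vector descriptor, then UPGMA) can be sketched in Python. This is only an illustration under stated assumptions: the abstract does not specify how the complete bipartite graph is built or how its edge weights are defined, so the sketch substitutes a plain 64-dimensional triplet-frequency vector for the paper's descriptor and assumes Euclidean distance between descriptors. The names `triplet_vector`, `euclidean`, and `upgma` are hypothetical and do not come from the paper.

```python
from itertools import product
from math import sqrt

# All 64 possible nucleotide triplets, in a fixed order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_vector(seq):
    """Return normalized counts of overlapping triplets in seq.

    ASSUMPTION: a stand-in descriptor, not the paper's
    bipartite-graph edge-weight vector.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TRIPLETS, 0)
    for i in range(len(seq) - 2):
        t = seq[i:i + 3]
        if t in counts:  # skip triplets containing N or other symbols
            counts[t] += 1
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in TRIPLETS]


def euclidean(u, v):
    """Euclidean distance between two descriptors (assumed metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def upgma(names, vectors):
    """Build a UPGMA tree as nested tuples of names (topology only)."""
    # cluster id -> (subtree, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # pairwise distances, keyed by (smaller id, larger id)
    dist = {
        (i, j): euclidean(vectors[i], vectors[j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(dist, key=dist.get)  # closest pair of clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del dist[(a, b)]
        # Size-weighted average distance from the merged cluster to the rest.
        for c in clusters:
            d_ac = dist.pop((min(a, c), max(a, c)))
            d_bc = dist.pop((min(b, c), max(b, c)))
            dist[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from the paper.
    seqs = {
        "a": "ACGTACGTACGT",
        "b": "ACGTACGTACGA",
        "c": "GGGGGGGGGGGG",
    }
    labels = list(seqs)
    print(upgma(labels, [triplet_vector(seqs[k]) for k in labels]))
```

The sketch returns only the tree topology as nested tuples; branch lengths, which UPGMA also yields, are omitted for brevity. Replacing `triplet_vector` with a descriptor derived from graph edge weights would leave the distance and clustering steps unchanged.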

Keywords: Alignment-based method; Alignment-free method; Bipartite graph; Evolutionary relationship; Phylogenetic tree; Sequence comparison.

MeSH terms

  • Algorithms
  • Animals
  • Bacteria / genetics
  • Computational Biology*
  • Mammals / genetics
  • Nucleotides / genetics
  • Phylogeny
  • Sequence Analysis, DNA / methods*
  • Viruses / genetics

Substances

  • Nucleotides