Finding approximate gene clusters with Gecko 3

Nucleic Acids Res. 2016 Nov 16;44(20):9600-9610. doi: 10.1093/nar/gkw843. Epub 2016 Sep 26.

Abstract

Gene-order-based comparison of multiple genomes provides signals for functional analysis of genes and the evolutionary process of genome organization. Gene clusters are regions of co-localized genes on genomes of different species. The rapid increase in sequenced genomes necessitates bioinformatics tools for finding gene clusters in hundreds of genomes. Existing tools are often restricted to a few (in many cases, only two) genomes, and often make restrictive assumptions such as short perfect conservation, conserved gene order or monophyletic gene clusters. We present Gecko 3, an open-source software tool with an easy-to-use graphical user interface for finding gene clusters in hundreds of bacterial genomes. The underlying gene cluster model is intuitive, can cope with low degrees of conservation as well as misannotations, and is complemented by a sound statistical evaluation. To evaluate the biological benefit of Gecko 3 and to exemplify our method, we search for gene clusters in a dataset of 678 bacterial genomes using Synechocystis sp. PCC 6803 as a reference. We confirm detected gene clusters by reviewing the literature and comparing them to a database of operons; we also detect two novel clusters, which were confirmed by publicly available experimental RNA-Seq data. The computational analysis is carried out on a laptop computer in <40 min.
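
The abstract does not spell out the cluster model, so the following is only a minimal sketch of one common way to formalize an "approximate" gene cluster, not Gecko 3's actual algorithm or its statistical evaluation. It assumes that each genome is a list of gene-family labels, that a reference cluster is a set of such labels, and that an interval in another genome counts as an occurrence when the number of missing plus extra gene families stays within a threshold. The function name `best_occurrence` and the parameter `max_distance` are hypothetical and chosen for illustration.

```python
from typing import List, Optional, Tuple


def best_occurrence(
    reference_cluster: List[str],
    genome: List[str],
    max_distance: int,
) -> Optional[Tuple[int, int, int]]:
    """Return (start, end, distance) of the best approximate occurrence.

    Distance is assumed here to be the number of reference gene families
    missing from the interval plus the number of gene families in the
    interval that are not in the reference. Gene order inside the
    interval is ignored. Returns None if no interval is within
    max_distance. Ties are broken by the shorter interval.
    """
    ref = set(reference_cluster)
    best: Optional[Tuple[int, int, int]] = None

    for start in range(len(genome)):
        seen = set()
        for end in range(start, len(genome)):
            seen.add(genome[end])
            extra = len(seen - ref)
            if extra > max_distance:
                # Extending the interval can only add more extra genes.
                break
            missing = len(ref - seen)
            distance = missing + extra
            if distance > max_distance:
                continue
            candidate = (start, end, distance)
            if best is None:
                best = candidate
                continue
            # Prefer smaller distance, then shorter interval.
            best_key = (best[2], best[1] - best[0])
            cand_key = (distance, end - start)
            if cand_key < best_key:
                best = candidate
    return best


# Toy data: the reference cluster {a, b, c, d} occurs in the second
# genome in a different order and with one inserted gene "y".
reference = ["a", "b", "c", "d"]
other_genome = ["x", "a", "c", "y", "b", "d", "z"]
hit = best_occurrence(reference, other_genome, max_distance=1)
```

In this toy example, `hit` is `(1, 5, 1)`: the interval from position 1 to 5 contains all four reference gene families plus one extra gene. With `max_distance=0` no interval qualifies, which illustrates why a model that requires perfect conservation would miss such a cluster.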

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Datasets as Topic
  • Genes, Bacterial
  • Genome, Bacterial
  • Genomics / methods*
  • Models, Statistical
  • Multigene Family*
  • Software*
  • Web Browser
  • Workflow