A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm

PLoS One. 2016 Jan 5;11(1):e0146352. doi: 10.1371/journal.pone.0146352. eCollection 2016.

Abstract

Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms through horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. These adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism, so the identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic island prediction have been proposed. However, none of them is capable of predicting precisely the complete repertoire of GIs in a genome. The difficulties arise because the performance of each algorithm changes with the variety of nucleotide distributions found in different species. In this paper, we present a novel method to predict GIs that is built upon the mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP (Mean Shift Genomic Island Predictor). Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods as well as novel, previously unpredicted islands. A detailed investigation of the features related to typical GI elements inserted in these new regions confirmed the effectiveness of the method. Stand-alone and user-friendly versions of this new methodology are available at http://msgip.integrativebioinformatics.me.
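
To illustrate the general idea of clustering genome windows with mean shift without fixing the number of clusters in advance, here is a minimal sketch. It is not the MSGIP implementation. Several things in it are assumptions made only for illustration: the per-window feature (GC content), the synthetic 20 kb genome, the flat kernel, the use of Silverman's rule of thumb as a stand-in for the bandwidth heuristic (the abstract does not say which heuristic MSGIP uses), and all function names.

```python
import random
import statistics
from collections import Counter

WINDOW = 500  # window size in bp (illustrative assumption)

def make_window(rng, size, gc_count):
    """Random sequence of `size` bases containing exactly `gc_count` G/C bases."""
    bases = [rng.choice("GC") for _ in range(gc_count)]
    bases += [rng.choice("AT") for _ in range(size - gc_count)]
    rng.shuffle(bases)
    return "".join(bases)

def window_gc(genome, size):
    """GC fraction of consecutive non-overlapping windows."""
    return [
        sum(genome[i:i + size].count(b) for b in "GC") / size
        for i in range(0, len(genome) - size + 1, size)
    ]

def rule_of_thumb_bandwidth(values):
    """Silverman's rule of thumb; an assumed stand-in, not MSGIP's heuristic."""
    return 1.06 * statistics.stdev(values) * len(values) ** (-1 / 5)

def mean_shift_1d(values, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift in one dimension; returns (labels, centers)."""
    modes = []
    for v in values:
        x = v
        for _ in range(max_iter):
            neighbors = [u for u in values if abs(u - x) <= bandwidth]
            new_x = sum(neighbors) / len(neighbors)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)
    # Merge converged modes that lie within one bandwidth of each other.
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if abs(m - c) <= bandwidth:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers

def island_intervals(labels, size):
    """Runs of windows outside the largest (host) cluster, as (start, end) in bp."""
    host = Counter(labels).most_common(1)[0][0]
    intervals, start = [], None
    for i, lab in enumerate(labels + [host]):  # sentinel closes a trailing run
        if lab != host and start is None:
            start = i
        elif lab == host and start is not None:
            intervals.append((start * size, i * size))
            start = None
    return intervals

# Synthetic genome: 40 windows at about 50% GC, except windows 20-24 at about 30% GC.
rng = random.Random(0)
genome = "".join(
    make_window(rng, WINDOW, (150 if 20 <= i < 25 else 250) + rng.randint(-5, 5))
    for i in range(40)
)

gc_values = window_gc(genome, WINDOW)
bandwidth = rule_of_thumb_bandwidth(gc_values)
labels, centers = mean_shift_1d(gc_values, bandwidth)
islands = island_intervals(labels, WINDOW)
print(islands)  # [(10000, 12500)]
```

The number of clusters is never passed in: it emerges from how many distinct modes the windows converge to under the chosen bandwidth. In this toy case the low-GC block forms its own cluster and is reported as a single candidate region. Real genomes would need richer features and a more careful bandwidth choice than this sketch uses.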

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Genome, Bacterial*
  • Genomic Islands*
  • Genomics / methods*

Grants and funding

VMC developed part of this research as a visiting researcher at Departamento de Biologia Molecular, Centro de Ciencias Exatas e da Natureza, Universidade Federal da Paraiba, through the program “Becas Iberoamerica - Jovenes Profesores e Investigadores”, funded by Santander Universidades. The author VMC receives partial salaries from Beagle Bioinformatics. This funder provided support in the form of salaries for author VMC, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of this author are articulated in the “author contributions” section.