Interpolated variable order motifs for identification of horizontally acquired DNA: revisiting the Salmonella pathogenicity islands
- PMID: 16837528
- DOI: 10.1093/bioinformatics/btl369
Abstract
Motivation: There is a growing literature on the detection of Horizontal Gene Transfer (HGT) events by means of parametric, non-comparative methods. Such approaches rely only on sequence information and use various low and high order indices to capture compositional deviation from the genome backbone; the superiority of high order over low order indices has been shown elsewhere. However, even high order k-mers may be poor estimators of HGT when insufficient information is available, e.g. in short sliding windows. Most current HGT prediction methods also require pre-existing annotation, which may restrict their application to newly sequenced genomes.
Results: We introduce a novel computational method, Interpolated Variable Order Motifs (IVOMs), which exploits compositional biases using variable order motif distributions and captures the local composition of a sequence more reliably than fixed-order methods. For optimal localization of the boundaries of each predicted region, a second order, two-state hidden Markov model (HMM) is implemented in a change-point detection framework. We applied the IVOM approach to the genome of Salmonella enterica serovar Typhi CT18, a well-studied prokaryote in terms of HGT events, and we show that the IVOMs outperform state-of-the-art low and high order motif methods, predicting not only the already characterized Salmonella Pathogenicity Islands (SPI-1 to SPI-10) but also three novel SPIs (SPI-15, SPI-16, SPI-17) and other HGT events.
Availability: The software is available under a GPL license as a standalone application at http://www.sanger.ac.uk/Software/analysis/alien_hunter
Contact: gsv@sanger.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
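The abstract does not give the IVOM formulas, so the following is only a minimal sketch of the general idea it describes: combine motif frequencies of several orders so that well-sampled high order motifs dominate and sparsely sampled ones fall back on lower orders, then score each sliding window by its compositional deviation from the whole genome. The weighting rule `c / (c + 4**k)`, the use of Kullback-Leibler divergence as the score, and all function names are assumptions made for illustration, not the published method or the alien_hunter implementation. The HMM boundary refinement is not covered.

```python
import math
import random
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def kmer_counts(seq, k):
    """Count overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def interpolated_motif_dist(seq, max_k):
    """Distribution over all max_k-mers, interpolated across orders 1..max_k.

    Assumed weighting (illustrative only): a k-mer seen c times gets weight
    c / (c + 4**k), so rare high order motifs defer to lower orders.
    """
    counts = {k: kmer_counts(seq, k) for k in range(1, max_k + 1)}
    totals = {k: max(len(seq) - k + 1, 1) for k in counts}
    dist = {}
    for word in map("".join, product(ALPHABET, repeat=max_k)):
        est = 0.0
        for k in range(1, max_k + 1):
            c = counts[k][word[-k:]]
            # Spread the order-k suffix frequency uniformly over the
            # remaining positions so every order lives on the same scale.
            p_k = (c / totals[k]) * 4.0 ** -(max_k - k)
            lam = 1.0 if k == 1 else c / (c + 4.0 ** k)
            est = lam * p_k + (1.0 - lam) * est
        dist[word] = est
    norm = sum(dist.values())
    return {w: p / norm for w, p in dist.items()}

def kl_divergence(p, q):
    """Relative entropy D(p || q); terms with p == 0 contribute nothing."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)

def window_scores(genome, window, step, max_k):
    """Score each sliding window against the whole-genome background."""
    background = interpolated_motif_dist(genome, max_k)
    scores = []
    for start in range(0, len(genome) - window + 1, step):
        local = interpolated_motif_dist(genome[start:start + window], max_k)
        scores.append((start, kl_divergence(local, background)))
    return scores

# Synthetic example: an AT-rich backbone with a GC-rich insert at 10000-12000.
random.seed(0)
backbone = random.choices(ALPHABET, weights=[35, 15, 15, 35], k=20000)
island = random.choices(ALPHABET, weights=[15, 35, 35, 15], k=2000)
genome = "".join(backbone[:10000] + island + backbone[10000:])

scores = window_scores(genome, window=1000, step=500, max_k=3)
best_start, best_score = max(scores, key=lambda item: item[1])
```

In this toy setting the highest scoring window falls inside the inserted region. The sketch assumes an uppercase sequence in which all four bases occur; real genomes would additionally need handling of ambiguous bases and a threshold for calling a window atypical.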
Similar articles
- Genome-based identification and molecular analyses of pathogenicity islands and genomic islands in Salmonella enterica. Methods Mol Biol. 2007;394:77-88. doi: 10.1007/978-1-59745-512-1_5. PMID: 18363232
- Identification of compositionally distinct regions in genomes using the centroid method. Bioinformatics. 2007 Oct 15;23(20):2672-7. doi: 10.1093/bioinformatics/btm405. Epub 2007 Aug 27. PMID: 17724060
- A database of phylogenetically atypical genes in archaeal and bacterial genomes, identified using the DarkHorse algorithm. BMC Bioinformatics. 2008 Oct 7;9:419. doi: 10.1186/1471-2105-9-419. PMID: 18840280 Free PMC article.
- Pathogenicity islands: a molecular toolbox for bacterial virulence. Cell Microbiol. 2006 Nov;8(11):1707-19. doi: 10.1111/j.1462-5822.2006.00794.x. Epub 2006 Aug 24. PMID: 16939533 Review.
- Evolution of Salmonella-Host Cell Interactions through a Dynamic Bacterial Genome. Front Cell Infect Microbiol. 2017 Sep 29;7:428. doi: 10.3389/fcimb.2017.00428. eCollection 2017. PMID: 29034217 Free PMC article. Review.
Cited by
- The Nocardia cyriacigeorgica GUH-2 genome shows ongoing adaptation of an environmental Actinobacteria to a pathogen's lifestyle. BMC Genomics. 2013 Apr 27;14:286. doi: 10.1186/1471-2164-14-286. PMID: 23622346 Free PMC article.
- Isolation, molecular identification, and genomic analysis of Mangrovibacter phragmitis strain ASIOC01 from activated sludge harboring the bioremediation prowess of glycerol and organic pollutants in high-salinity. Front Microbiol. 2024 Jun 25;15:1415723. doi: 10.3389/fmicb.2024.1415723. eCollection 2024. PMID: 38983623 Free PMC article.
- Lateral Gene Transfer in a Heavy Metal-Contaminated-Groundwater Microbial Community. mBio. 2016 Apr 5;7(2):e02234-15. doi: 10.1128/mBio.02234-15. PMID: 27048805 Free PMC article.
- Characterization of Toxin Complex Gene Clusters and Insect Toxicity of Bacteria Representing Four Subgroups of Pseudomonas fluorescens. PLoS One. 2016 Aug 31;11(8):e0161120. doi: 10.1371/journal.pone.0161120. eCollection 2016. PMID: 27580176 Free PMC article.
- Genomes of three tomato pathogens within the Ralstonia solanacearum species complex reveal significant evolutionary divergence. BMC Genomics. 2010 Jun 15;11:379. doi: 10.1186/1471-2164-11-379. PMID: 20550686 Free PMC article.
