PanOCT: Automated Clustering of Orthologs Using Conserved Gene Neighborhood for Pan-Genomic Analysis of Bacterial Strains and Closely Related Species

Nucleic Acids Res. 2012 Dec;40(22):e172. doi: 10.1093/nar/gks757. Epub 2012 Aug 16.

Abstract

The pan-genome ortholog clustering tool (PanOCT) performs pan-genomic analysis of closely related prokaryotic species or strains. PanOCT uses conserved gene neighborhood information to separate recently diverged paralogs into orthologous clusters in cases where homology-only clustering methods cannot. The results from PanOCT and three commonly used graph-based ortholog-finding programs were compared using a set of four publicly available strains of the same bacterial species. All four methods agreed on ∼70% of the clusters and ∼86% of the proteins. The clusters that did not agree were inspected for evidence of correctness, resulting in 85 high-confidence, manually curated clusters that were used to compare all four methods.
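
To illustrate why gene neighborhood can separate paralogs that sequence homology alone cannot, the following is a minimal toy sketch in Python. It is not PanOCT's actual scoring scheme or code. It assumes each genome is an ordered list of homology-family labels, and the function names (neighborhood, neighborhood_score, best_match) and the window parameter are hypothetical, introduced only for this example. When two genes in a genome belong to the same family, the sketch picks the counterpart in another genome whose flanking genes overlap most.

```python
def neighborhood(genome, index, window):
    """Return the set of family labels flanking position `index` (hypothetical helper)."""
    left = genome[max(0, index - window):index]
    right = genome[index + 1:index + 1 + window]
    return set(left) | set(right)


def neighborhood_score(genome_a, i, genome_b, j, window=2):
    """Count family labels shared by the flanking genes of two candidate genes."""
    shared = neighborhood(genome_a, i, window) & neighborhood(genome_b, j, window)
    return len(shared)


def best_match(genome_a, i, genome_b, window=2):
    """Among same-family genes in genome_b, pick the one with the most similar flanks.

    Returns the position in genome_b, or None if the family is absent there.
    """
    family = genome_a[i]
    candidates = [j for j, fam in enumerate(genome_b) if fam == family]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda j: neighborhood_score(genome_a, i, genome_b, j, window),
    )


# Toy data (assumed, not from the paper): each gene is reduced to its family label.
# Family "P" occurs twice per strain, so homology alone cannot pair the copies.
strain_1 = ["A", "B", "P", "C", "D", "X", "P", "Y", "Z"]
strain_2 = ["X", "P", "Y", "Z", "A", "B", "P", "C", "D"]

first = best_match(strain_1, 2, strain_2)   # the "P" flanked by A, B, C, D
second = best_match(strain_1, 6, strain_2)  # the "P" flanked by D, X, Y, Z
print(first, second)
```

In this toy data the two copies of family "P" are indistinguishable by label, yet each pairs with a different position in the second strain because its flanking genes differ. A real tool would also have to handle sequence similarity scores, contig boundaries, gene orientation and ties, none of which this sketch attempts.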

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bacteria / classification
  • Bacterial Proteins / classification
  • Bacterial Proteins / genetics*
  • Cluster Analysis
  • Genes, Bacterial*
  • Genome, Bacterial*
  • Genomics / methods
  • Software*

Substances

  • Bacterial Proteins