PGP: parallel prokaryotic proteogenomics pipeline for MPI clusters, high-throughput batch clusters and multicore workstations

Bioinformatics. 2014 May 15;30(10):1469-70. doi: 10.1093/bioinformatics/btu051. Epub 2014 Jan 27.


Summary: We present the first public release of our proteogenomic annotation pipeline. We previously used the original, unreleased implementation to improve the annotation of 46 diverse prokaryotic genomes by analyzing proteomic mass-spectrometry data to discover novel genes and post-translational modifications and to correct erroneous annotations. This public version has been redesigned to run in a wide range of parallel Linux computing environments and comes with automated configuration, build and testing facilities for easy deployment and portability.

Availability and implementation: Source code is freely available under the GPL license. It is implemented in Python and C++ and bundles the Makeflow engine to execute the workflows.


Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Cluster Analysis*
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing / methods*
  • Mass Spectrometry
  • Prokaryotic Cells / chemistry*
  • Proteomics / methods*
  • Software