Automatic prediction of polysaccharide utilization loci in Bacteroidetes species

Bioinformatics. 2015 Mar 1;31(5):647-55. doi: 10.1093/bioinformatics/btu716. Epub 2014 Oct 28.


Motivation: A bacterial polysaccharide utilization locus (PUL) is a set of physically linked genes that orchestrate the breakdown of a specific glycan. PULs are prevalent in the Bacteroidetes phylum and are key to the digestion of complex carbohydrates, notably by the human gut microbiota. A given Bacteroidetes genome can encode dozens of different PULs whose boundaries and precise gene content are difficult to predict.

Results: Here, we present a fully automated approach for PUL prediction using genomic context and domain annotation alone. By combining the detection of a pair of marker genes with operon prediction using intergenic distances, and queries to the carbohydrate-active enzymes database, our predictor achieved above 86% accuracy in two Bacteroides species with extensive experimental PUL characterization.
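
As an illustration of the genomic-context idea, the following minimal Python sketch groups neighbouring same-strand genes into putative operons by intergenic distance and keeps only the groups that contain an adjacent pair of marker genes. It is not the published predictor: the `Gene` record, the `marker` flag, the `predict_loci` function and the 100 bp `max_gap` threshold are all hypothetical choices made for this example, and the annotation step based on the carbohydrate-active enzymes database is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int            # first base of the gene on the contig
    end: int              # last base of the gene (inclusive)
    strand: str           # "+" or "-"
    marker: bool = False  # hypothetical flag: gene is one of the two marker genes


def predict_loci(genes, max_gap=100):
    """Return gene clusters that look like candidate loci.

    Assumption for this sketch: consecutive genes belong to the same
    cluster when they lie on the same strand and are separated by at
    most `max_gap` base pairs. A cluster is reported only if it contains
    two marker genes that are direct neighbours.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    clusters, current = [], []
    for gene in ordered:
        if current:
            prev = current[-1]
            gap = gene.start - prev.end - 1  # bases between the two genes
            if gene.strand != prev.strand or gap > max_gap:
                clusters.append(current)
                current = []
        current.append(gene)
    if current:
        clusters.append(current)

    # Keep clusters holding an adjacent marker pair.
    return [
        cluster
        for cluster in clusters
        if any(a.marker and b.marker for a, b in zip(cluster, cluster[1:]))
    ]
```

In a sketch like this, the gap threshold controls how far a predicted locus extends on either side of the marker pair: a smaller value gives tighter boundaries, while a larger one pulls in more flanking genes.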

Availability and implementation: PUL predictions in 67 Bacteroidetes genomes from the human gut microbiota and two additional species, from the canine oral sphere and from the environment, are presented in our database.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Automation / methods*
  • Bacteroidetes / genetics*
  • Bacteroidetes / growth & development
  • Bacteroidetes / metabolism*
  • Dogs
  • Gastrointestinal Tract / microbiology*
  • Genetic Loci*
  • Genome, Bacterial / genetics*
  • Humans
  • Microbiota / physiology*
  • Polysaccharides / metabolism*
  • Symbiosis

