The pneumococcus produces a polysaccharide capsule, encoded by the cps locus, that provides protection against phagocytosis and determines serotype. Nearly 100 serotypes have been identified, and new serotypes are still being discovered, especially in previously understudied regions. Here we present an analysis of the cps loci of more than 18 000 genomes from the Global Pneumococcal Sequencing (GPS) project, with the aim of identifying novel cps loci with the potential to produce previously unrecognized capsule structures. Serotypes were assigned using whole-genome sequence data, and 66 of the approximately 100 known serotypes were included in the final dataset. Closer examination of each serotype's sequences identified nine putative novel cps loci (9X, 11X, 16X, 18X1, 18X2, 18X3, 29X, 33X and 36X), found in ~2.6 % of the genomes. The large number and global distribution of GPS genomes provided an unprecedented opportunity to identify novel cps loci and consider their phylogenetic and geographical distribution. Examples of each of the nine putative novel cps loci will undergo subsequent structural and immunological analysis.
Keywords: Streptococcus pneumoniae; cps locus; pneumococcus; polysaccharide capsule; serotype.