BMC Genomics, 12, 433

Experimental Annotation of Post-Translational Features and Translated Coding Regions in the Pathogen Salmonella Typhimurium

Charles Ansong et al. BMC Genomics.

Abstract

Background: Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. However, determining protein-coding genes for most new genomes is almost completely performed by inference using computational predictions with significant documented error rates (> 15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function.

Results: We experimentally annotated the bacterial pathogen Salmonella Typhimurium 14028, using "shotgun" proteomics to accurately uncover the translational landscape and post-translational features. The data provide protein-level experimental validation for approximately half of the predicted protein-coding genes in Salmonella and suggest revisions to several genes that appear to have incorrectly assigned translational start sites, including a potential novel alternate start codon. Additionally, we uncovered 12 non-annotated genes missed by gene prediction programs, as well as evidence suggesting a role for one of these novel ORFs in Salmonella pathogenesis. We also characterized post-translational features in the Salmonella genome, including chemical modifications and proteolytic cleavages. We find that bacteria have a much larger and more complex repertoire of chemical modifications than previously thought, including several novel modifications. Our in vivo proteolysis data identified more than 130 signal peptide and N-terminal methionine cleavage events critical for protein function.

Conclusion: This work highlights several ways in which application of proteomics data can improve the quality of genome annotations to facilitate novel biological insights and provides a comprehensive proteome map of Salmonella as a resource for systems analysis.

Figures

Figure 1
Validation of computationally predicted genes. Multiple peptides mapping to predicted gene ORF04105 provide evidence for the expression of the product(s) encoded by ORF04105. The protein sequence of ORF04105 is shown in black, and the identified peptides are shown in red.
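
The peptide-to-ORF mapping described in this caption can be illustrated with a minimal Python sketch. It assumes exact string matching of peptides against a protein sequence; the sequence and peptides below are hypothetical toy data, not ORF04105, and the function names `map_peptides` and `coverage` are illustrative only, not taken from the study.

```python
def map_peptides(protein, peptides):
    """Return the 1-based start positions of every exact match of each peptide."""
    hits = {}
    for pep in peptides:
        starts = []
        idx = protein.find(pep)
        while idx != -1:
            starts.append(idx + 1)  # convert 0-based index to 1-based residue number
            idx = protein.find(pep, idx + 1)
        hits[pep] = starts
    return hits


def coverage(protein, hits):
    """Fraction of residues in the protein covered by at least one mapped peptide."""
    covered = [False] * len(protein)
    for pep, starts in hits.items():
        for start in starts:
            for i in range(start - 1, start - 1 + len(pep)):
                covered[i] = True
    return sum(covered) / len(protein)


# Hypothetical toy data (assumption): a 33-residue sequence and four candidate peptides.
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
peptides = ["TAYIAK", "QISFVK", "LGLIEVQ", "AAAAAA"]

hits = map_peptides(protein, peptides)
cov = coverage(protein, hits)
```

A predicted gene with several distinct peptides mapped along its length, as in the figure, would show non-empty hit lists and a non-trivial coverage fraction; a peptide that maps nowhere returns an empty list.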
Figure 2
Correcting start site assignment for ORF00641. Multiple peptides (green bars) map upstream of the predicted ORF00641 (yellow bar), including several peptides that span the currently predicted gene start, providing evidence in support of extending the start site. The in-frame * symbol represents the stop codon TAA.
Figure 3
Correcting start site assignment for ORF01800. Multiple peptides (green bars) map to predicted ORF01800 (yellow bar) including one peptide that spans the currently predicted gene start, providing evidence in support of extending the start site. The in-frame * symbol represents the stop codon TAA.
Figure 4
Evidence for potential novel alternative start codon. Multiple peptides (green bars) map to predicted ORF01417 (yellow bar) including one peptide that spans the currently predicted gene start, providing evidence in support of extending the start site. The in-frame * symbol represents the stop codon TGA.
Figure 5
Validation of mass spectral evidence. Annotated experimental MS/MS spectra of the peptide "GIEPMALTKAEMSEYLFDKLGLSKR" with the fragmentation ladder below.
Figure 6
Identification of novel genes. A) Five proteomics-identified peptides map to the genomic region 795142 to 795556 on the forward strand, where no gene had been previously predicted by computational approaches. B) Sequence alignment shows 100% homology to a phage protein in Salmonella Typhimurium strain D23580. C) Examination of protein expression shows that the novel ORF is highly expressed under infection-mimicking conditions.
Figure 7
Post-translational chemical modification (PTCM) analysis using the de novo-UStag approach. An example illustrating the application of the de novo-UStag approach to the detection of multiple modifications, and of high resolution MS/MS to distinguish between multiple possible combinations. C-terminus amidation and tryptophan oxidation to oxolactone were determined for the sequence SLKELVESDQKWR of the Salmonella thiamin/thiamin pyrophosphate ABC transporter protein. Explanation for the determination of modifications is detailed in text.
Figure 8
Identification of a novel PTCM in Salmonella. Cyano modification on Cys was determined for the sequence CKIEQAPGQHGAR of the Salmonella ribosomal protein S4. Explanation for the determination of modifications is detailed in text. The inset demonstrates how sequencing precision in high resolution MS/MS spectra easily resolves residues with close masses, such as Gln (Q) and Lys (K), which cannot be resolved in low resolution spectra. The map in the lower right corner contains all predicted fragments observed from the resolved isotopic clusters, with the shade of each spot corresponding to its parts per million (ppm) distance from the expected value. This map is used to determine the correct identification among all plausible sequences and PTCMs matching the observed mass shifts.
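
The ppm distance mentioned in this caption, and the reason Gln and Lys need high resolution to be told apart, can be shown with a short Python sketch. The residue masses are the standard monoisotopic values; the fragment m/z of 1000 and the function name `ppm_error` are assumptions chosen for illustration and do not come from the study.

```python
def ppm_error(observed_mz, expected_mz):
    """Signed mass error of an observed value relative to the expected value, in ppm."""
    return (observed_mz - expected_mz) / expected_mz * 1e6


# Standard monoisotopic residue masses in daltons.
GLN_RESIDUE = 128.058578  # Gln (Q), C5H8N2O2
LYS_RESIDUE = 128.094963  # Lys (K), C6H12N2O

# Mass difference between a fragment containing K and the same fragment containing Q.
delta = LYS_RESIDUE - GLN_RESIDUE

# Hypothetical singly charged fragment at m/z 1000 (assumption for illustration):
# the apparent error if a K-containing fragment is scored against the Q-containing one.
separation_ppm = ppm_error(1000.0 + delta, 1000.0)
```

Under these assumptions the two candidates differ by roughly 36 ppm at m/z 1000, so a fragment mass tolerance well below that value separates them, while a tolerance of several hundred ppm does not.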
Figure 9
In vivo proteolytic cleavage analysis. The distribution of the curated non-tryptic peptides with residue start positions between 2 and 60 reveals two peaks, at residue start positions 1-5 and at residue start positions 21-25, indicative of N-terminal methionine cleavage and cleavage of signal peptides, respectively.
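
The binning behind such a distribution can be sketched in Python as follows. This is a minimal illustration that assumes bins of five residues, matching the 1-5 and 21-25 ranges named in the caption; the start positions are made-up toy values and `bin_start_positions` is a hypothetical helper name.

```python
from collections import Counter


def bin_start_positions(starts, width=5):
    """Count peptide start positions in consecutive bins: 1-5, 6-10, 11-15, ..."""
    counts = Counter()
    for s in starts:
        low = ((s - 1) // width) * width + 1  # first residue number of the bin
        counts[(low, low + width - 1)] += 1
    return dict(sorted(counts.items()))


# Hypothetical start positions of non-tryptic peptides (assumption, not study data).
starts = [2, 2, 2, 3, 21, 22, 23, 25, 40]
histogram = bin_start_positions(starts)
```

With real data, an excess of counts in the first bin would correspond to N-terminal methionine removal and an excess around positions 21-25 to signal peptide cleavage, as the caption describes.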
Figure 10
Experimental annotation of N-terminal methionine cleavage. The frequency of amino acid occurrence at position P1' reveals a clear preference for small amino acids at P1' in agreement with current N-terminal methionine cleavage rules.
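
A tally of the P1' residue can be sketched in Python as below. The sketch assumes that P1' is the residue immediately after the initiator methionine and that cleavage is supported when an observed peptide begins at residue 2 of the protein; the protein and peptide pairs are hypothetical toy data, and `p1_prime_counts` is an illustrative name.

```python
from collections import Counter


def p1_prime_counts(pairs):
    """Count the residue at P1' (position 2) for proteins whose observed
    N-terminal peptide starts right after the initiator methionine."""
    counts = Counter()
    for protein, peptide in pairs:
        # str.startswith(prefix, 1) checks for the peptide beginning at residue 2.
        if peptide and protein.startswith("M") and protein.startswith(peptide, 1):
            counts[protein[1]] += 1
    return counts


# Hypothetical (protein, observed peptide) pairs; assumption for illustration only.
pairs = [
    ("MSKLTRAV", "SKLTR"),  # peptide starts at residue 2, P1' = S
    ("MAEQKLLG", "AEQK"),   # P1' = A
    ("MSTNPKPQ", "STNPK"),  # P1' = S
    ("MGHIKEEW", "GHIK"),   # P1' = G
    ("MKLLVRDD", "LLVR"),   # peptide starts at residue 3, so not counted
]
counts = p1_prime_counts(pairs)
```

In this toy input the counted residues are all small (Ser, Ala, Gly), which mirrors the preference the caption reports but is built into the example rather than derived from data.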
Figure 11
Experimental annotation of signal peptide cleavage. The upper panel shows the sequence logo for the amino acid sequence motif of all signal peptides identified by high resolution MS/MS (STable 8). The lower panel shows the sequence logo for Gram-negative bacteria employed by SignalP (image reproduced from Nielsen et al. 1997).
