Large-scale proteomics has made it possible to broadly screen samples for the presence of many types of post-translational modifications, such as phosphorylation, acetylation, and ubiquitination. Such data allow these modifications to be localized either to a specific site on a proteolytically generated peptide or to a small region within that peptide. The resulting modification acceptor sites can then be mapped onto the appropriate protein sequences and the information archived. This paper describes the use of a very large archive of experimental observations of human post-translational modifications to create a map of the most reproducible modification observations onto the complete set of human protein sequences. This set of modification acceptor sites was then translated directly into the genomic coordinates of the codons that encode the residues at those sites. We constructed the database g2pDB using this protein-to-codon site mapping information. The information in g2pDB has been made available through a REST-style API, allowing researchers to determine which specific protein modifications would be perturbed by a set of observed nucleotide variants determined by high-throughput DNA or RNA sequencing.
Keywords: REST API; acetylation; genome coordinate; phosphorylation; post-translational modification; protein coordinate; single nucleotide variant; ubiquitination.
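
The protein-to-codon translation summarized above can be illustrated with a minimal sketch. It assumes that the coding segments of a transcript are known as 1-based inclusive genomic intervals listed in transcript (5' to 3') order. The function names `codon_genomic_positions` and `variant_hits_site` and the example coordinates are hypothetical and chosen for illustration only; this is not the g2pDB implementation or its API.

```python
from typing import List, Tuple


def codon_genomic_positions(
    cds_exons: List[Tuple[int, int]], strand: str, residue: int
) -> List[int]:
    """Return the three genomic positions of the codon for a 1-based residue.

    cds_exons: coding segments as (start, end), 1-based inclusive with
               start <= end, listed in transcript (5'->3') order.
    strand:    "+" or "-".
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if residue < 1:
        raise ValueError("residue numbers are 1-based")

    # 0-based offsets of the three codon bases within the coding sequence.
    wanted = [3 * (residue - 1) + i for i in range(3)]

    positions: List[int] = []
    consumed = 0  # coding bases covered by the segments already walked
    for start, end in cds_exons:
        length = end - start + 1
        for offset in wanted:
            if consumed <= offset < consumed + length:
                within = offset - consumed
                # On the minus strand, transcript order runs from end to start.
                positions.append(start + within if strand == "+" else end - within)
        consumed += length

    if len(positions) != 3:
        raise ValueError("residue lies outside the coding sequence")
    return positions


def variant_hits_site(
    variant_pos: int, cds_exons: List[Tuple[int, int]], strand: str, residue: int
) -> bool:
    """True if a single nucleotide variant falls in the codon of the residue."""
    return variant_pos in codon_genomic_positions(cds_exons, strand, residue)
```

A codon may span an exon boundary, which is why the sketch collects the three positions individually rather than returning a single interval. A check such as `variant_hits_site(200, [(100, 104), (200, 210)], "+", 2)` then asks whether a variant at position 200 touches the codon of residue 2 in this made-up transcript.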