Background: Protein post-translational modifications (PTMs) are an important aspect of protein regulation. The number of PTMs discovered within the human proteome, and other proteomes, has been rapidly expanding in recent years. As a consequence of the rate in which new PTMs are identified, analysis done in one year may result in different conclusions when repeated in subsequent years. Among the various functional questions pertaining to PTMs, one important relationship to address is the interplay between modifications and mutations. Specifically, because the linear sequence surrounding a modification site often determines molecular recognition, it is hypothesized that mutations near sites of PTMs may be more likely to result in a detrimental effect on protein function, resulting in the development of disease.
Methods and results: We wrote an application programming interface (API) to make analysis of ProteomeScout, a comprehensive database of PTMs and protein information, easy and reproducible. We used this API to analyze the relationship between PTMs and human mutations associated with disease (based on the 'Clinical Significance' annotation from dbSNP). Proteins containing pathogenic mutations demonstrated a significant study bias which was controlled for by analyzing only well-studied proteins, based on their having at least one pathogenic mutation. We found that pathogenic mutations are significantly more likely to lie within eight amino acids of a phosphoserine, phosphotyrosine or ubiquitination site when compared to mutations in general, based on a Fisher's Exact test. Despite the skew of pathogenic mutations occurring on positively charged arginines, we could not account for this relationship based only on residue type. Finally, we hypothesize a potential mechanism for a pathogenic mutation on RAF1, based on its proximity to a phosphorylation site, which represents a subtle regulation difference that may explain why its biochemical effect has failed to be uncovered previously. The combination of the API and a dynamically expanding PTM database will make the reanalysis of this question and other systems-level questions easier in the future.