Front Plant Sci. 2012 Sep 5;3:207.
doi: 10.3389/fpls.2012.00207. eCollection 2012.

Characterization and Prediction of Protein Phosphorylation Hotspots in Arabidopsis Thaliana

Free PMC article

Jan-Ole Christian et al. Front Plant Sci. 2012.


The regulation of protein function by modulating the surface charge status via sequence-locally enriched phosphorylation sites (P-sites) in so-called phosphorylation "hotspots" has gained increased attention in recent years. We set out to identify P-hotspots in the model plant Arabidopsis thaliana. We analyzed the spacing of experimentally detected P-sites within peptide-covered regions along Arabidopsis protein sequences as available from the PhosPhAt database. Confirming earlier reports (Schweiger and Linial, 2010), we found that P-sites tend to cluster and that the distributions of distances from serine and threonine P-sites to their respective closest neighboring P-site differ significantly from those for tyrosine P-sites. The ability to predict P-hotspots with available computational P-site prediction programs, which focus on identifying single P-sites, was observed to be severely compromised by the inevitable interference of nearby P-sites. We devised a new approach, named HotSPotter, for the prediction of phosphorylation hotspots. HotSPotter is based primarily on local amino acid compositional preferences rather than sequence position-specific motifs and uses support vector machines as the underlying classification engine. HotSPotter correctly identified experimentally determined phosphorylation hotspots in A. thaliana with high accuracy. Applied to the Arabidopsis proteome, HotSPotter predicted 13,677 candidate P-hotspots in 9,599 proteins corresponding to 7,847 unique genes. Hotspot-containing proteins are involved predominantly in signaling processes, confirming the surmised modulating role of hotspots in signaling and interaction events. Our study provides new bioinformatics means to identify phosphorylation hotspots and lays the basis for further investigating novel candidate P-hotspots. All phosphorylation hotspot annotations and predictions have been made available as part of the PhosPhAt database.
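
To illustrate what "local amino acid compositional preferences" could mean in practice, the following is a minimal sketch that turns a sequence window into a 20-dimensional composition vector, which might then serve as input to a support vector machine. The function name `composition_features`, the window half-width of 7, and the alphabet ordering are assumptions made for illustration; the abstract does not specify HotSPotter's actual feature definition or window size.

```python
from collections import Counter

# Standard 20 amino acids in a fixed order (ordering is an arbitrary choice).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence, center, half_width=7):
    """Return relative amino acid frequencies in a window around `center`.

    `center` is a 0-based index into `sequence`. The window is clipped at
    the sequence ends, so windows near the termini are shorter.
    `half_width=7` is a hypothetical default, not a value from the paper.
    """
    start = max(0, center - half_width)
    end = min(len(sequence), center + half_width + 1)
    window = sequence[start:end]
    if not window:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(window)
    # Position within the window is deliberately ignored: only the
    # composition matters, in contrast to position-specific motifs.
    return [counts.get(aa, 0) / len(window) for aa in AMINO_ACIDS]
```

Because the vector ignores residue order inside the window, two windows with the same residues in different positions yield identical features, which is the distinction from motif-based single-site predictors that the abstract draws.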

Keywords: Arabidopsis thaliana; hotspots; protein phosphorylation; regulation; support vector machines.


Figure 1
Frequency distribution of sequence distances between neighboring P-sites: (A) between any pST and any other P-site (pSTY), (B) between any pY and any other P-site. For the respective neighboring site, no distinction was made as to which amino acid residue type (S, T, or Y) was found phosphorylated. (C) Equivalent distributions for P-flag-randomized and sequence-randomized protein sequences, averaged over 100 repeat runs, for nearest pST and pSTY distances. (D) Similarly for pY, pSTY distances. In P-flag randomization, phosphorylation signals were randomly redistributed among the existing serines and tyrosines, whereas in sequence-randomized runs, the entire protein sequence was randomized.
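
The nearest-neighbor distances and the P-flag randomization described in this caption can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the function names, the 0-based indexing, and the default candidate residue set `"STY"` are assumptions (the caption itself names serines and tyrosines, so the set is left as a parameter).

```python
import random
from collections import Counter


def nearest_neighbor_distances(p_sites):
    """Map each P-site position to the distance to its closest other P-site.

    Proteins with a single P-site contribute no distances.
    """
    sites = sorted(p_sites)
    distances = {}
    for i, pos in enumerate(sites):
        candidates = []
        if i > 0:
            candidates.append(pos - sites[i - 1])
        if i < len(sites) - 1:
            candidates.append(sites[i + 1] - pos)
        if candidates:
            distances[pos] = min(candidates)
    return distances


def p_flag_randomize(sequence, p_sites, residues="STY", rng=None):
    """Redistribute the same number of phospho-flags among eligible residues.

    The protein sequence is left unchanged; only which eligible residues
    carry a flag is randomized. Assumes every original P-site sits on a
    residue in `residues`, so there are enough candidates to sample from.
    """
    rng = rng or random.Random()
    candidates = [i for i, aa in enumerate(sequence) if aa in residues]
    return sorted(rng.sample(candidates, len(p_sites)))
```

A frequency distribution like the one plotted would then be `Counter(nearest_neighbor_distances(sites).values())`, computed once for the observed sites and again for each randomized repeat before averaging.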
Figure 2
Frequency distribution of distances between closest neighboring predicted P-sites in the Arabidopsis genome. (A) Between any pST and the nearest pSTY, and (B) between any pY and the nearest pSTY. For comparison, results for 100 P-flag randomizations are given by red filled circles. Evidently, the nearest neighbor distance distribution differs from the distribution between experimentally identified sites (Figure 1) with a secondary peak at dN(pST, pSTY) = 4 and a more even distribution of dN(pY, pSTY) for predicted sites.
Figure 3
Screenshot of the PhosPhAt database with P-hotspot annotation information.


    1. Barford D., Hu S. H., Johnson L. N. (1991). Structural mechanism for glycogen phosphorylase control by phosphorylation and AMP. J. Mol. Biol. 218, 233–260. doi: 10.1016/0022-2836(91)90887-C
    2. Benjamini Y., Hochberg Y. (1995). Controlling the false discovery rate – a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B Stat. Methodol. 57, 289–300.
    3. Boekhorst J., Van Breukelen B., Heck A., Jr., Snel B. (2008). Comparative phosphoproteomics reveals evolutionary and functional conservation of phosphorylation across eukaryotes. Genome Biol. 9, R144. doi: 10.1186/gb-2008-9-10-r144
    4. Dunker A. K., Brown C. J., Lawson J. D., Iakoucheva L. M., Obradovic Z. (2002). Intrinsic disorder and protein function. Biochemistry 41, 6573–6582. doi: 10.1021/bi012159+
    5. Durek P., Schmidt R., Heazlewood J. L., Jones A., Maclean D., Nagel A., Kersten B., Schulze W. X. (2010). PhosPhAt: the Arabidopsis thaliana phosphorylation site database. An update. Nucleic Acids Res. 38, D828–D834. doi: 10.1093/nar/gkp810