As the most studied post-translational modification, protein phosphorylation is analyzed in a growing number of proteomic experiments. These high-throughput approaches generate large datasets in which specific spectrum-based information can be hard to find. In 2007, the PhosPhAt database was launched to collect and present Arabidopsis phosphorylation sites identified by mass spectrometry, gathering data from the scientific community and making them available to it. At present, PhosPhAt 3.0 consolidates phosphoproteomics data from 19 published proteomic studies. Of the 5460 unique phosphoproteins listed, about 25% have been identified in at least two independent experimental setups. This is especially important when considering false positive and false negative identification rates and data quality (Durek et al., 2010). The data set encompasses more than 13205 unique phosphopeptides, with phosphorylation sites unambiguously mapped to serine (77%), threonine (17%), and tyrosine (6%). Sorting the functional annotations of experimentally found phosphorylated proteins in PhosPhAt by Gene Ontology terms shows an over-representation of proteins in regulatory pathways and signaling processes. A similar distribution is found when the PhosPhAt predictor, trained on experimentally obtained plant phosphorylation sites, is used to predict phosphorylation sites for the Arabidopsis genome. Finally, because any protein sequence can be submitted to the PhosPhAt predictor, the prediction resource can be used independently of species. In practice, PhosPhAt also allows easy exploitation of proteomic data for the design of further targeted experiments.
Keywords: Arabidopsis; PhosPhAt; database; protein phosphorylation; proteomics.