Comparison of phosphorylation patterns across eukaryotes by discriminative N-gram analysis

BMC Bioinformatics. 2015 Jul 30;16(1):239. doi: 10.1186/s12859-015-0657-2.


Background: How protein phosphorylation relates to kingdom/phylum divergence is largely unknown, and the amino acid residues surrounding the phosphorylation site are of profound importance for protein kinase-substrate interactions. Standard motif analysis is not adequate for large-scale comparative analysis because it assigns each phosphopeptide to a unique motif and performs poorly with the unbalanced nature of the input datasets.

Results: First, the discriminative n-grams of five species from five different kingdoms/phyla were identified. A signature of 5540 discriminative n-grams that could also be found in other species from the same kingdoms/phyla was created. Using a test data set, the ability of the signature to classify species into their corresponding kingdom/phylum was confirmed with classification methods. Lastly, orthologous proteins among the proteins containing n-grams were identified in order to determine to what degree the identity of the detected n-grams was a property of the phosphosites rather than a consequence of the species-specific or kingdom/phylum-specific protein inventory. The motifs were grouped into clusters of equal physico-chemical nature; their distribution was similar between species of the same kingdom/phylum, while clear differences were found among species of different kingdoms/phyla. For example, the animal-specific top discriminative n-grams contained many basic amino acids, whereas the plant-specific motifs were mainly acidic. Secondary structure prediction methods showed that in the majority of cases the discriminative n-grams lack a regular secondary structure: on average they had 88% random coil, compared with 66% in the phosphoproteins they were derived from.
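The idea of a discriminative n-gram can be illustrated with a minimal sketch. The abstract does not state how n-grams were scored, so the smoothed log-odds ratio used here is an assumption chosen for simplicity, not the authors' method. The function names, the default n-gram length, the threshold, and the pseudocount are likewise hypothetical, and counting histidine as basic is a simplification. Unlike a one-motif-per-peptide assignment, each phosphopeptide window can contribute several n-grams.

```python
from collections import Counter
from math import log2

# Simplified residue classes (assumption: histidine is counted as basic).
BASIC = set("KRH")
ACIDIC = set("DE")


def ngrams(peptide, n):
    """Return the set of contiguous n-grams found in one peptide window."""
    return {peptide[i:i + n] for i in range(len(peptide) - n + 1)}


def ngram_counts(peptides, n):
    """For each n-gram, count how many peptides contain it at least once."""
    counts = Counter()
    for peptide in peptides:
        counts.update(ngrams(peptide, n))
    return counts


def discriminative_ngrams(target, background, n=3, min_score=1.0, pseudo=1.0):
    """Score n-grams seen in the target group by a smoothed log2 odds ratio
    of their peptide frequency in target versus background.

    Illustrative scoring only. The pseudocount keeps the ratio finite for
    n-grams absent from the background and dampens the effect of groups
    with very different numbers of peptides.
    """
    t_counts = ngram_counts(target, n)
    b_counts = ngram_counts(background, n)
    scores = {}
    for gram, t in t_counts.items():
        f_target = (t + pseudo) / (len(target) + 2 * pseudo)
        f_background = (b_counts[gram] + pseudo) / (len(background) + 2 * pseudo)
        score = log2(f_target / f_background)
        if score >= min_score:
            scores[gram] = score
    return scores


def charge_class(gram):
    """Label an n-gram as basic, acidic, or neutral by residue majority."""
    basic = sum(residue in BASIC for residue in gram)
    acidic = sum(residue in ACIDIC for residue in gram)
    if basic > acidic:
        return "basic"
    if acidic > basic:
        return "acidic"
    return "neutral"


if __name__ == "__main__":
    # Toy windows invented for illustration; they are not data from the study.
    group_a = ["RRKSLPA", "KRRSVGG", "ARRSSLK"]
    group_b = ["DEESDDE", "EDDSEEA", "GDESDEE"]
    for gram, score in sorted(discriminative_ngrams(group_a, group_b).items()):
        print(gram, round(score, 2), charge_class(gram))
```

A signature in this sense would be the union of the n-grams retained for each group, and grouping them with something like `charge_class` gives a crude version of the physico-chemical clustering described above.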

Conclusions: The discriminative n-grams were able to classify organisms into their corresponding kingdom/phylum, and they show different patterns among species of different kingdoms/phyla. These regions can contribute to evolutionary divergence because they lie in disordered regions that can evolve rapidly. The differences found possibly reflect group-specific differences in the kinomes of the different groups of species.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Amino Acid Motifs
  • Animals
  • Arabidopsis / metabolism
  • Cluster Analysis
  • Discriminant Analysis
  • Eukaryota / metabolism*
  • Evolution, Molecular
  • Humans
  • Phosphopeptides / analysis
  • Phosphopeptides / chemistry
  • Phosphorylation
  • Phytophthora / metabolism
  • Protein Structure, Secondary
  • Proteins / metabolism*
  • Proteomics / methods*
  • Saccharomyces cerevisiae / metabolism


Substances

  • Phosphopeptides
  • Proteins