Bioinformatics. 2009 Jan 1;25(1):14-21.
doi: 10.1093/bioinformatics/btn569. Epub 2008 Nov 7.

Discovery of Phosphorylation Motif Mixtures in Phosphoproteomics Data

Free PMC article
Anna Ritz et al. Bioinformatics. 2009.

Abstract

Motivation: Modification of proteins via phosphorylation is a primary mechanism for signal transduction in cells. Phosphorylation sites on proteins are determined in part by particular patterns, or motifs, present in the amino acid sequence.

Results: We describe an algorithm that simultaneously discovers multiple motifs in a set of peptides that were phosphorylated by several different kinases. Such sets of peptides are routinely produced in proteomics experiments. Our motif-finding algorithm uses the principle of minimum description length to determine a mixture of sequence motifs that distinguish a foreground set of phosphopeptides from a background set of unphosphorylated peptides. We show that our algorithm outperforms existing motif-finding algorithms on synthetic datasets consisting of mixtures of known phosphorylation sites. We also derive a motif specificity score that quantifies whether the phosphoproteins containing an instance of a motif have a significant number of known interactions with a given kinase. Application of our motif-finding algorithm to recently published human and mouse proteomic studies recovers several known phosphorylation motifs and reveals a number of novel motifs that are enriched for interactions with a particular kinase or phosphatase. Our tools provide a new approach for uncovering the sequence specificities of uncharacterized kinases and phosphatases.
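
To make the minimum description length principle concrete, here is a minimal Python sketch of a two-part code. It is not MoDL's actual encoding: the cost model (log2 21 bits per motif position, log2(|ℳ|+1) bits per peptide to name its motif or "none", and background bits for every residue the motif does not fix), the fixed-length motif syntax with "." wildcards (no bracketed classes such as [SD]), and all function names are illustrative assumptions. It also ranks motifs by how much the description length grows when each one is removed, mirroring the ranking described for Fig. 1.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def background_probs(background_peptides):
    """Estimate residue frequencies from unphosphorylated peptides (add-one smoothing)."""
    counts = Counter(a for a in "".join(background_peptides) if a in AMINO_ACIDS)
    total = sum(counts.values()) + len(AMINO_ACIDS)
    return {aa: (counts[aa] + 1) / total for aa in AMINO_ACIDS}


def matches(motif, peptide):
    """Hypothetical motif syntax: fixed residues and '.' wildcards, same length as the peptide."""
    return len(motif) == len(peptide) and all(
        m == "." or m == p for m, p in zip(motif, peptide)
    )


def bits(residues, bg):
    """Bits needed to code residues under the background distribution."""
    return sum(-math.log2(bg[a]) for a in residues)


def description_length(motifs, peptides, bg):
    """Assumed two-part code: bits for the motif set plus bits for the peptides given it."""
    # Model cost: every motif position is one of 21 symbols (20 residues or '.').
    model_bits = sum(len(m) * math.log2(21) for m in motifs)
    # Each peptide names the motif that encodes it, or "none" (|M| + 1 choices).
    index_bits = math.log2(len(motifs) + 1)
    data_bits = 0.0
    for pep in peptides:
        best = bits(pep, bg)  # no motif: code every residue from the background
        for m in motifs:
            if matches(m, pep):
                # A matching motif fixes some residues; only wildcard positions are coded.
                wild = [a for a, s in zip(pep, m) if s == "."]
                best = min(best, bits(wild, bg))
        data_bits += index_bits + best
    return model_bits + data_bits


def rank_by_removal(motifs, peptides, bg):
    """Order motifs by how much the description length grows when each is dropped."""
    full = description_length(motifs, peptides, bg)
    gain = {
        m: description_length([x for x in motifs if x != m], peptides, bg) - full
        for m in motifs
    }
    return sorted(motifs, key=gain.get, reverse=True)


if __name__ == "__main__":
    bg = {aa: 1 / 20 for aa in AMINO_ACIDS}  # uniform background for the demo
    peps = ["DAAYE", "DKLYE", "DGSYE", "DPRYE", "KSTPL", "RAGSV"]
    for name, ms in [("empty", []), ("shared", ["D..YE"]), ("one per peptide", peps)]:
        print(f"{name:>16}: {description_length(ms, peps, bg):.1f} bits")
```

On this toy input the single shared motif D..YE gives a shorter description than both extremes (no motifs, or one motif per peptide), which is the trade-off the MDL criterion is meant to capture.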

Figures

Fig. 1.
Overview of the MoDL algorithm and MSS calculation. MoDL uses the description length, a measure of the amount of information (bits) required to describe the input phosphopeptides using a motif set ℳ and the background distribution, and attempts to find the motif set with minimum description length. (A) With an empty motif set (i.e. no motifs), each phosphopeptide must be described explicitly from the background distribution, yielding a high description length (left column). At the opposite extreme, each phosphopeptide can be described as its own unique motif, but the resulting large motif set also yields a high description length (right column). The optimal motif set includes only motifs that match several phosphopeptides and minimizes the total description length required to represent both the motifs and the phosphopeptides (center column). After the optimal motif set is determined, the individual motifs are ranked according to the increase in description length when each motif is removed from the set. (B) Computing the MSS between a kinase and a motif group, i.e. the set of proteins containing an instance of the motif. Proteins are colored according to the motif instances they contain at one or more phosphorylation sites; gray proteins contain no motif instances. To find the MSS for the blue motif D..Y.[SD]P, we consider all proteins in the motif group (blue). A kinase will have a high MSS if the number of interactions between the kinase and the motif group (solid lines) is significantly greater than the number of interactions between the kinase and proteins not in the motif group (dotted lines).
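
Panel B describes the MSS as a test of whether a kinase interacts with the motif group more often than expected by chance. The exact statistic is not spelled out here, so the Python sketch below assumes one plausible choice: the score is -log10 of a one-sided hypergeometric p-value for the overlap between the kinase's known interaction partners and the motif group. The function names and the protein identifiers in the demo are hypothetical.

```python
import math


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K in the group, n draws)."""
    total = math.comb(N, n)
    hits = sum(
        math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    )
    return hits / total


def motif_specificity(kinase_partners, motif_group, all_proteins):
    """Hypothetical MSS: -log10 of a one-sided enrichment p-value.

    kinase_partners: proteins with a known interaction with the kinase.
    motif_group: proteins containing at least one instance of the motif.
    all_proteins: every protein considered (the phosphoproteins in the study).
    """
    proteins = set(all_proteins)
    group = set(motif_group) & proteins
    partners = set(kinase_partners) & proteins
    # Interactions inside the motif group (solid lines in Fig. 1B).
    k = len(partners & group)
    p = hypergeom_sf(k, len(proteins), len(group), len(partners))
    return -math.log10(p)


if __name__ == "__main__":
    proteins = [f"P{i}" for i in range(100)]
    group = proteins[:10]  # 10 proteins carry the motif
    strong = proteins[:6] + proteins[50:52]  # 6 of 8 partners in the group
    weak = proteins[:1] + proteins[50:57]  # 1 of 8 partners in the group
    print(motif_specificity(strong, group, proteins))
    print(motif_specificity(weak, group, proteins))
```

A one-sided tail is used because only enrichment (more interactions with the motif group than expected) should raise the score; a kinase with no partners in the group scores 0.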
Fig. 2.
Performance of MoDL compared with the ROC curves of MEME. Filled points on each curve represent the values returned with MEME's default parameter settings (see Supplementary Material). (A) On a synthetic dataset with 10 instances each of D..YE and [IL]Y….PP, MoDL maintains a constant true positive rate, with only a slight decrease in false positive rate, as the number of background (non-motif) sequences increases. In contrast, the performance of MEME varies drastically. (B) Representative examples comparing the performance of MoDL with MEME and Motif-X on Scansite motifs. Left: MoDL clearly outperforms both Motif-X and MEME. Center: MoDL outperforms Motif-X and achieves a higher true positive rate on the MEME ROC curve than MEME's default settings. Right: MEME outperforms both Motif-X and MoDL.
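
The ROC comparisons rest on counting how many planted motif instances and how many background sequences a predicted motif set matches. A minimal Python sketch, assuming motifs are written as regular expressions (the notation D..YE and D..Y.[SD]P is already valid regex syntax) and that a peptide is predicted positive if any motif matches anywhere in it; the benchmark's actual matching rules may differ, and the function names and peptides are illustrative.

```python
import re


def tpr_fpr(motifs, positives, negatives):
    """True and false positive rates for a set of regex motifs.

    positives: peptides that truly contain a planted motif instance.
    negatives: background (non-motif) peptides.
    """
    patterns = [re.compile(m) for m in motifs]

    def predicted(pep):
        # A peptide is called positive if any motif occurs anywhere in it.
        return any(p.search(pep) for p in patterns)

    tp = sum(predicted(p) for p in positives)
    fp = sum(predicted(n) for n in negatives)
    return tp / len(positives), fp / len(negatives)


def roc_points(ranked_motifs, positives, negatives):
    """(FPR, TPR) after accepting the top-k motifs, for k = 0 .. len(ranked_motifs)."""
    points = []
    for k in range(len(ranked_motifs) + 1):
        tpr, fpr = tpr_fpr(ranked_motifs[:k], positives, negatives)
        points.append((fpr, tpr))
    return points


if __name__ == "__main__":
    pos = ["AADKLYEAA", "GGDPSYEGG"]
    neg = ["AAAAAAAAA", "KKDAAYEKK", "SSSSSSSSS", "TTTTTTTTT"]
    print(roc_points(["D..YE"], pos, neg))
```

With the demo data this prints [(0.0, 0.0), (0.25, 1.0)]: D..YE finds both planted instances but also matches one of the four background peptides.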