Background: Minimotifs are short contiguous peptide sequences in proteins that are known to have a function in at least one other protein. One of the principal limitations in minimotif prediction is that false positives limit the usefulness of this approach. As a step toward resolving this problem we have built, implemented, and tested a new data-driven algorithm that reduces false-positive predictions.
Methodology/principal findings: Certain domains and minimotifs are known to be strongly associated with a known cellular process or molecular function. Therefore, we hypothesized that by restricting minimotif predictions to those where the minimotif containing protein and target protein have a related cellular or molecular function, the prediction is more likely to be accurate. This filter was implemented in Minimotif Miner using function annotations from the Gene Ontology. We have also combined two filters that are based on entirely different principles and this combined filter has a better predictability than the individual components.
Conclusions/significance: Testing these functional filters on known and random minimotifs has revealed that they are capable of separating true motifs from false positives. In particular, for the cellular function filter, the percentage of known minimotifs that are not removed by the filter is approximately 4.6 times that of random minimotifs. For the molecular function filter this ratio is approximately 2.9. These results, together with the comparison with the published frequency score filter, strongly suggest that the new filters differentiate true motifs from random background with good confidence. A combination of the function filters and the frequency score filter performs better than these two individual filters.