Mol Plant Pathol. 2018 Sep;19(9):2094-2110.
doi: 10.1111/mpp.12682. Epub 2018 May 11.

Improved Prediction of Fungal Effector Proteins From Secretomes With EffectorP 2.0

Free PMC article


Jana Sperschneider et al. Mol Plant Pathol.

Abstract

Plant-pathogenic fungi secrete effector proteins to facilitate infection. We describe extensive improvements to EffectorP, the first machine learning classifier for fungal effector prediction. EffectorP 2.0 is now trained on a larger set of effectors and uses a different approach based on an ensemble of classifiers trained on different subsets of negative data, offering different views on classification. EffectorP 2.0 achieves an accuracy of 89%, compared with 82% for EffectorP 1.0 and 59.8% for a small-size classifier. Important features for effector prediction appear to be protein size, protein net charge, and the content of the amino acids serine and cysteine. Compared with EffectorP 1.0, EffectorP 2.0 decreases the number of predicted effectors in the secretomes of fungal plant symbionts and saprophytes by 40%. However, EffectorP 1.0 remains useful, and combining EffectorP 1.0 and 2.0 results in a stringent classifier with a low false positive rate of 9%. EffectorP 2.0 predicts significant enrichments of effectors in 12 of 13 sets of infection-induced proteins from diverse fungal pathogens, whereas a small, cysteine-rich classifier detects enrichment in only seven of 13. EffectorP 2.0 will fast-track the prioritization of high-confidence effector candidates for functional validation and aid in improving our understanding of effector biology. EffectorP 2.0 is available at http://effectorp.csiro.au.
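The stringent combination of the two versions can be sketched as follows. This is a minimal sketch that assumes the combination is a logical AND (a protein counts as an effector only if both EffectorP 1.0 and EffectorP 2.0 call it one); the abstract does not spell out the rule, and the function names and protein IDs are hypothetical. It also shows how a false positive rate is computed against a set of known non-effectors.

```python
def stringent_prediction(pred_v1: bool, pred_v2: bool) -> bool:
    """Assumed AND rule: effector only if both classifier versions agree."""
    return pred_v1 and pred_v2


def combine_predictions(v1: dict[str, bool], v2: dict[str, bool]) -> set[str]:
    """Return IDs of proteins called effectors by both versions (hypothetical helper)."""
    # Only proteins scored by both versions can pass the stringent filter.
    shared = v1.keys() & v2.keys()
    return {pid for pid in shared if stringent_prediction(v1[pid], v2[pid])}


def false_positive_rate(predicted: set[str], negatives: set[str]) -> float:
    """Fraction of known non-effectors that are (wrongly) predicted as effectors."""
    if not negatives:
        raise ValueError("need at least one negative example")
    return len(predicted & negatives) / len(negatives)
```

Requiring agreement can only shrink the predicted set relative to either version alone, which is why such a combination tends to lower the false positive rate at the cost of some sensitivity.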

Keywords: EffectorP; effector; effector prediction; fungal pathogens; machine learning; secretomes.

Figures

Figure 1
Workflow for the EffectorP 2.0 classifier, which combines an ensemble of machine learning classifiers. Each classifier $C_i$ was trained on a different subset of the negative training data and predicts effectors in unseen data with probability $P_i$. The probabilities are combined into an overall vote on whether an unseen protein is an effector or non‐effector.
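The ensemble workflow in Figure 1 can be illustrated with a soft-voting sketch. It assumes that the overall vote is the mean of the probabilities $P_i$ compared with a 0.5 threshold, and that the negative set is shuffled and dealt into disjoint subsets, one per classifier; neither detail is stated in the caption. `split_negatives`, `ensemble_vote` and the toy classifiers are hypothetical stand-ins, not EffectorP's code.

```python
import random
from statistics import mean
from typing import Callable, Sequence

# A trained classifier maps a protein sequence to P(effector) in [0, 1].
Classifier = Callable[[str], float]


def split_negatives(negatives: Sequence[str], k: int, seed: int = 0) -> list[list[str]]:
    """Shuffle the negatives and deal them into k disjoint subsets (assumed scheme)."""
    items = list(negatives)
    random.Random(seed).shuffle(items)
    return [items[i::k] for i in range(k)]


def ensemble_vote(classifiers: Sequence[Classifier], sequence: str,
                  threshold: float = 0.5) -> tuple[bool, float]:
    """Soft vote: average the per-classifier probabilities and threshold the mean."""
    if not classifiers:
        raise ValueError("ensemble needs at least one classifier")
    probabilities = [clf(sequence) for clf in classifiers]
    p = mean(probabilities)
    return p >= threshold, p


if __name__ == "__main__":
    # Toy classifiers that ignore the sequence and return fixed probabilities.
    toy = [lambda s: 0.75, lambda s: 0.5, lambda s: 0.25]
    print(ensemble_vote(toy, "MKCCSDE"))
```

Averaging probabilities (soft voting) lets a confident classifier outweigh several uncertain ones; a hard majority vote over thresholded calls is the common alternative.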
Figure 2
The most influential features in effector prediction appear to be small protein size, low serine content, a protein net charge in the neutral range and high cysteine content. Significant differences (P < 0.05) between the distributions for effectors and the negative sequence set were also observed for additional features: effectors were depleted in aliphatic amino acids, leucine (L), proline (P), threonine (T) and tryptophan (W) and had lower disorder propensity and bulkiness, whereas they were enriched in basic amino acids, glycine (G), lysine (K) and asparagine (N) and had higher interface propensity. Extreme outliers in the protein net charge plot were removed for clarity (the full figure is given in Fig. S3; see Supporting Information). All data points are drawn on top of the box plots as black dots. Significance between groups is shown by horizontal brackets and was assessed using t‐tests. The lower and upper hinges correspond to the first and third quartiles, and the upper (lower) whisker extends from the hinge to the largest (smallest) value no further than 1.5 times the interquartile range from the hinge. Data points beyond the ends of the whiskers are outliers.
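The four headline features can be computed from a sequence in a few lines. This is a simplified sketch: net charge is approximated as the count of lysine (K) and arginine (R) minus the count of aspartate (D) and glutamate (E), ignoring histidine, the termini and pH, and the feature names are hypothetical. EffectorP's own feature definitions may differ.

```python
from collections import Counter


def protein_features(seq: str) -> dict[str, float]:
    """Length, approximate net charge, and serine/cysteine fractions of a protein."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    counts = Counter(seq)  # missing residues count as 0
    # Crude net charge: basic K/R minus acidic D/E (no His, termini or pH model).
    net_charge = (counts["K"] + counts["R"]) - (counts["D"] + counts["E"])
    return {
        "length": n,
        "net_charge": net_charge,
        "serine_fraction": counts["S"] / n,
        "cysteine_fraction": counts["C"] / n,
    }
```

For example, `protein_features("MKCCSDE")` gives a length of 7, a net charge of -1, a serine fraction of 1/7 and a cysteine fraction of 2/7.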
Figure 3
Proportions of effectors predicted in fungal secretomes by EffectorP 1.0, EffectorP 2.0, a small-size classifier and a small, cysteine‐rich classifier. All data points are drawn on top of the box plots as black dots. Significance between groups is shown by horizontal brackets and was assessed using t‐tests (NS, not significant; *P < 0.05, **P < 0.01 and ***P < 0.001). The lower and upper hinges correspond to the first and third quartiles, and the upper (lower) whisker extends from the hinge to the largest (smallest) value no further than 1.5 times the interquartile range from the hinge. Data points beyond the ends of the whiskers are outliers.
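The box-plot convention described in these captions (hinges at the first and third quartiles, whiskers reaching the most extreme values within 1.5 × IQR of the hinges) can be sketched as below. The quartiles use linear interpolation via `statistics.quantiles(..., method="inclusive")`, which matches R's default quantile type 7; whether the original plots used exactly this quantile definition is an assumption, and `box_plot_stats` is a hypothetical helper.

```python
from statistics import quantiles
from typing import Sequence


def box_plot_stats(values: Sequence[float]) -> dict:
    """Tukey-style box-plot summary: quartile hinges, 1.5 x IQR whiskers, outliers."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    # Linear-interpolation quartiles (same rule as R's default quantile type 7).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observed values still inside the fences.
    lower_whisker = min(v for v in data if v >= lower_fence)
    upper_whisker = max(v for v in data if v <= upper_fence)
    # Anything beyond the whisker ends is drawn as an individual outlier.
    outliers = [v for v in data if v < lower_whisker or v > upper_whisker]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
        "outliers": outliers,
    }
```

For `[1, 2, 3, 4, 100]` the hinges are 2 and 4, the whiskers end at 1 and 4, and 100 is reported as the only outlier.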
Figure 4
Differences in sequence length (aas, amino acids) and cysteine content for effectors predicted by different versions of EffectorP. All data points are drawn on top of the box plots as black dots. Significance between groups is shown by horizontal brackets and was assessed using t‐tests. The lower and upper hinges correspond to the first and third quartiles, and the upper (lower) whisker extends from the hinge to the largest (smallest) value no further than 1.5 times the interquartile range from the hinge. Data points beyond the ends of the whiskers are outliers.
