Background: Type III secretion system (T3SS) is a specialized protein delivery system in gram-negative bacteria that injects proteins (called effectors) directly into the eukaryotic host cytosol and facilitates bacterial infection. For many plant and animal pathogens, T3SS is indispensable for disease development. Recently, T3SS has also been found in rhizobia and plays a crucial role in the nodulation process. Although a great deal of efforts have been done to understand type III secretion, the precise mechanism underlying the secretion and translocation process has not been fully understood. In particular, defined secretion and translocation signals enabling the secretion have not been identified from the type III secreted effectors (T3SEs), which makes the identification of these important virulence factors notoriously challenging. The availability of a large number of sequenced genomes for plant and animal-associated bacteria demands the development of efficient and effective prediction methods for the identification of T3SEs using bioinformatics approaches.
Results: We have developed a machine learning method based on the N-terminal amino acid sequences to predict novel type III effectors in the plant pathogen Pseudomonas syringae and the microsymbiont rhizobia. The extracted features used in the learning model (or classifier) include amino acid composition, secondary structure and solvent accessibility information. The method achieved a precision of over 90% on P. syringae in a cross validation study. In combination with a promoter screen for the type III specific promoters, this classifier trained on the P. syringae data was applied to predict novel T3SEs from the genomic sequences of four rhizobial strains. This application resulted in 57 candidate type III secreted proteins, 17 of which are confirmed effectors.
Conclusion: Our experimental results demonstrate that the machine learning method based on N-terminal amino acid sequences combined with a promoter screen could prove to be a very effective computational approach for predicting novel type III effectors in gram-negative bacteria. Our method and data are available to the public upon request.