Background: Many Gram-negative bacteria use type III secretion systems (T3SSs) to translocate effector proteins into host cells. T3SS effectors can give some bacteria a competitive edge over others in the same environment, and can help bacteria invade host cells and multiply rapidly within the host. Developing efficient methods to identify effectors scattered across bacterial genomes can therefore lead to a better understanding of host-pathogen interactions and, ultimately, to important medical and biotechnological applications.
Results: We used 21 genomic and proteomic attributes to create a precise and reliable T3SS effector prediction method called Genome Search for Effectors Tool (GenSET). Five machine learning algorithms were trained on effectors selected from different organisms, and a trained (voting) algorithm was then applied to identify other effectors in genome testing sets from the same organism (GenSET Phase 1) or a different organism (GenSET Phase 2). Although a select (filtered) group of attributes, which included the codon adaptation index, probability of expression in inclusion bodies, N-terminal disorder, and G + C content, was better at discriminating between positive and negative sets, algorithm performance was better when all 21 (unfiltered) attributes were used. Performance scores (sensitivity, specificity, and area under the curve) from GenSET Phase 1 were better than those reported for six published methods. More importantly, GenSET Phase 1 ranked more known effectors (70.3%) in the top 40 ranked proteins and predicted 10-80% more effectors than three available programs in three of the four organisms tested. GenSET Phase 2 predicted 43.8% of effectors in the top 40 ranked proteins when tested on four related or unrelated organisms. The lower prediction rates from GenSET Phase 2 may be due to the presence of different translocation signals in effectors from different T3SS families.
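As a rough illustration of the workflow summarized in the Results, the sketch below computes one of the named attributes (G + C content), combines per-classifier scores by simple averaging, and measures how many known effectors land in the top k ranked proteins. It is a minimal sketch and not the GenSET implementation: the averaging rule, the function names (`gc_content`, `rank_by_vote`, `top_k_recall`), and the toy scores are all assumptions made for illustration, and reading the top-40 figure as a fraction of known effectors is one interpretation of the reported percentage.

```python
from statistics import mean


def gc_content(dna: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (0.0 if empty)."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


def rank_by_vote(scores: dict[str, list[float]]) -> list[str]:
    """Rank proteins by the mean of their per-classifier effector scores.

    Assumption: votes are combined by simple averaging; the source only
    states that a trained (voting) algorithm was used.
    """
    return sorted(scores, key=lambda pid: mean(scores[pid]), reverse=True)


def top_k_recall(ranked: list[str], known: set[str], k: int) -> float:
    """Fraction of known effectors that appear among the top k ranked proteins."""
    if not known:
        return 0.0
    return len(known & set(ranked[:k])) / len(known)


# Hypothetical scores from five classifiers for four proteins.
toy_scores = {
    "p1": [0.9, 0.8, 0.7, 0.9, 0.6],
    "p2": [0.1, 0.2, 0.3, 0.2, 0.1],
    "p3": [0.6, 0.7, 0.5, 0.8, 0.9],
    "p4": [0.4, 0.3, 0.5, 0.2, 0.4],
}
ranking = rank_by_vote(toy_scores)
recall_at_2 = top_k_recall(ranking, known={"p1", "p2"}, k=2)
```

With real data, `k` would be 40 and the scores would come from classifiers trained on all 21 attributes rather than from hand-written values.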
Conclusions: The species-specific GenSET Phase 1 method offers an alternative approach to T3SS effector prediction that can be used with other published programs to improve effector predictions. Additionally, our approach can be applied to predict effectors of other secretion systems as long as these effectors have translocation signals embedded in their sequences.
Keywords: Effector prediction; Gram-negative bacteria; Machine learning; Type III secretion system.