Combining machine learning and homology-based approaches to accurately predict subcellular localization in Arabidopsis
- PMID: 20647376
- PMCID: PMC2938157
- DOI: 10.1104/pp.110.156851
Abstract
A complete map of the Arabidopsis (Arabidopsis thaliana) proteome, one that determines the function and regulation of each encoded protein, is a major goal for the plant research community. Genome-wide prediction tools, such as those that localize gene products at the subcellular level, will substantially advance Arabidopsis gene annotation. To this end, we performed a comprehensive study in Arabidopsis and created an integrative support vector machine-based localization predictor called AtSubP (for Arabidopsis subcellular localization predictor). AtSubP is based on the combinatorial presence of diverse protein features: amino acid composition, sequence-order effects, terminal information, position-specific scoring matrix (PSSM) profiles, and similarity search-based Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST) information. When used to predict seven subcellular compartments in a 5-fold cross-validation test, our best hybrid classifier achieved an overall sensitivity of 91%, with high-confidence precision and Matthews correlation coefficient values of 90.9% and 0.89, respectively. Benchmarking AtSubP on two independent data sets, one from Swiss-Prot and another containing green fluorescent protein- and mass spectrometry-determined proteins, showed a significant improvement in the prediction accuracy of species-specific AtSubP over some widely used "general" tools such as TargetP, LOCtree, PA-SUB, MultiLoc, WoLF PSORT, Plant-PLoc, and our newly created All-Plant method. Cross-comparison of AtSubP on six nontrained eukaryotic organisms (rice [Oryza sativa], soybean [Glycine max], human [Homo sapiens], yeast [Saccharomyces cerevisiae], fruit fly [Drosophila melanogaster], and worm [Caenorhabditis elegans]) revealed inferior predictions for these organisms. AtSubP significantly outperformed all the prediction tools currently used for Arabidopsis proteome annotation and, therefore, may serve as a better complement for the plant research community.
A supplemental Web site that hosts all the training/testing data sets and whole proteome predictions is available at http://bioinfo3.noble.org/AtSubP/.
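Two of the quantities named in the abstract, the amino acid composition feature and the sensitivity, precision, and Matthews correlation coefficient (MCC) metrics, can be illustrated with a minimal Python sketch. This is not the AtSubP implementation: the function names, the one-vs-rest treatment of a single compartment, and the example labels are all assumptions made for illustration, and the abstract does not state how its overall figures were aggregated across the seven compartments.

```python
import math
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector is comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard residue.

    Non-standard characters (e.g. X, *) are ignored. Hypothetical helper,
    not taken from AtSubP.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def one_vs_rest_metrics(true_labels, predicted_labels, positive):
    """Sensitivity, precision, and MCC for one class against all others.

    Assumption: each compartment is scored one-vs-rest; the abstract does
    not specify the aggregation behind its reported overall values.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, precision, mcc


if __name__ == "__main__":
    # Toy sequence and labels, invented purely for demonstration.
    features = amino_acid_composition("MKTAYIAKQR")
    print(len(features), round(sum(features), 6))

    truth = ["chloroplast", "chloroplast", "nucleus", "nucleus"]
    preds = ["chloroplast", "nucleus", "nucleus", "nucleus"]
    print(one_vs_rest_metrics(truth, preds, positive="chloroplast"))
```

A composition vector like this would be only one of several feature groups; the sequence-order, terminal, PSSM, and PSI-BLAST components described in the abstract are not sketched here.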
