Exploring sequence-based features for the improved prediction of DNA N4-methylcytosine sites in multiple species
- PMID: 30239627
- DOI: 10.1093/bioinformatics/bty824
Abstract
Motivation: As one of the important epigenetic modifications, DNA N4-methylcytosine (4mC) has recently been shown to play crucial roles in restriction-modification systems. To better understand its functional mechanisms, it is fundamentally important to identify 4mC modification sites. Machine learning methods have recently emerged as an effective and efficient approach for the high-throughput identification of 4mC sites, although high predictive error rates remain a challenge for existing methods. Therefore, it is highly desirable to develop a computational method that identifies 4mC sites more accurately.
Results: In this study, we propose a machine-learning-based predictor, namely 4mcPred-SVM, for the genome-wide detection of DNA 4mC sites. In this predictor, we present a new feature representation algorithm that sufficiently exploits sequence-based information. To improve the feature representation ability, we use a two-step feature optimization strategy, thereby obtaining the most representative features. Using the resulting features and a Support Vector Machine (SVM), we adaptively train the optimal models for different species. Comparative results on benchmark datasets from six species indicate that our predictor achieves generally better performance in predicting 4mC sites than the state-of-the-art predictors. Importantly, the sequence-based features can reliably and robustly predict 4mC sites, facilitating the discovery of potentially important sequence characteristics for the prediction of 4mC sites.
Availability and implementation: The user-friendly webserver that implements the proposed 4mcPred-SVM is well established, and is freely accessible at http://server.malab.cn/4mcPred-SVM.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
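As a rough illustration of what a "sequence-based feature" can look like, the sketch below encodes a DNA sequence as normalized k-mer frequencies. This is an assumption made for illustration only: the abstract does not specify which features 4mcPred-SVM uses, and the function name `kmer_frequencies`, the choice of k = 2, and the example sequence are all hypothetical.

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies for a DNA sequence.

    Illustrative encoding only (not taken from the paper). The output
    vector follows the lexicographic order of all 4**k k-mers over ACGT.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k across the sequence and count each k-mer.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Hypothetical example sequence; real inputs would be windows centered on a cytosine.
features = kmer_frequencies("ACGTACGTAC", k=2)
```

A fixed-length vector of this kind could then be passed to an SVM implementation (for example, scikit-learn's `SVC`) after a feature selection step; the paper's actual features and two-step optimization strategy are described in its full text, not in this abstract.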
