OCm7G: An interpretable one-class predictor for m7G methylation sites trained with limited negative samples

Genomics. 2025 Dec 11;118(1):111173. doi: 10.1016/j.ygeno.2025.111173. Online ahead of print.

Abstract

N7-methylguanosine (m7G) is a common RNA modification linked to multiple diseases. Accurate detection of m7G sites is vital for elucidating its biological roles, but conventional methods are often laborious and costly. AI-based approaches offer alternatives, yet most rely on balanced datasets, ignoring real-world imbalances where negative samples vastly outnumber positives. This discrepancy may lead to overestimated model performance. To address this, we reconstructed independent test sets with low positive-to-negative ratios and benchmarked various models. We propose OCm7G, an ensemble of one-class classifiers with hierarchical thresholding. OCm7G achieves performance comparable to state-of-the-art methods on balanced sets and surpasses them in highly imbalanced scenarios, despite using only 52.5% of the training data. Moreover, OCm7G offers interpretable predictions, aiding researchers in understanding model decisions. The source code and datasets are publicly available at: https://github.com/lidaosheng/OCm7G.
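
The claim that balanced test sets can overestimate performance can be illustrated with a short calculation. The sketch below is not taken from the OCm7G code base: the function name `precision_at_ratio` and the 90% sensitivity and specificity figures are hypothetical values chosen for illustration. It shows how the expected precision of a fixed classifier falls as the number of negatives per positive grows.

```python
def precision_at_ratio(sensitivity, specificity, neg_per_pos):
    """Expected precision when each positive comes with `neg_per_pos` negatives.

    Hypothetical helper for illustration; not part of OCm7G.
    """
    true_pos = sensitivity                          # expected true positives per positive
    false_pos = (1.0 - specificity) * neg_per_pos   # expected false positives per positive
    total_predicted_pos = true_pos + false_pos
    if total_predicted_pos == 0:
        return 0.0  # nothing is predicted positive, so precision is reported as 0
    return true_pos / total_predicted_pos


# Illustrative classifier with 90% sensitivity and 90% specificity (assumed values).
for ratio in (1, 10, 100):
    print(f"1:{ratio} positives to negatives -> precision "
          f"{precision_at_ratio(0.9, 0.9, ratio):.3f}")
```

Under these assumed values, precision is 0.900 on a balanced 1:1 set but drops to roughly 0.474 at 1:10 and 0.083 at 1:100, even though sensitivity and specificity are unchanged.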

Keywords: Artificial intelligence (AI); Imbalanced data; N7-methylguanosine (m7G); One-class classifiers; RNA epigenetic modification.