Class imbalance remains a critical challenge in machine learning: algorithms tend to favor the majority class, misclassifying minority-class instances and degrading overall model performance. This study explores an approach to addressing class imbalance in Random Forests that combines pruning with resampling. Although pruning typically improves performance and reduces computational cost, its effectiveness can be limited in complex ensembles trained on imbalanced data. The proposed method therefore incorporates three resampling strategies: under-sampling the majority class, over-sampling the minority class, and a hybrid of the two. After the training data are balanced, multiple trees are grown from bootstrap samples, and only those with low out-of-bag error rates are retained in the final ensemble. The classification performance of the proposed method is evaluated against standard algorithms, including k-Nearest Neighbors (k-NN), decision trees (Tree), Random Forest (RF), Balanced Random Forest (BRF), and Support Vector Machine (SVM). The results show that the proposed method outperforms its competitors in most cases.
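The pipeline described above (balance the data, grow trees on bootstrap samples, keep only trees with low out-of-bag error) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses only the under-sampling strategy, synthetic toy data, and an assumed median-error threshold for the pruning step.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Imbalanced toy data (assumed for illustration): ~90% / 10% class split.
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)

def undersample(X, y, rng):
    """Under-sample every class down to the minority-class size."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[idx], y[idx]

Xb, yb = undersample(X, y, rng)

# Grow trees on bootstrap samples of the balanced data and record
# each tree's out-of-bag (OOB) error.
trees, oob_errors = [], []
n = len(yb)
for _ in range(100):
    boot = rng.integers(0, n, size=n)           # bootstrap indices
    oob = np.setdiff1d(np.arange(n), boot)       # samples not drawn
    tree = DecisionTreeClassifier(random_state=0).fit(Xb[boot], yb[boot])
    oob_errors.append(1.0 - tree.score(Xb[oob], yb[oob]))
    trees.append(tree)

# Prune: keep trees with below-median OOB error (threshold is an assumption).
threshold = np.median(oob_errors)
ensemble = [t for t, e in zip(trees, oob_errors) if e <= threshold]

def predict(X):
    """Majority vote over the pruned ensemble (binary labels 0/1)."""
    votes = np.stack([t.predict(X) for t in ensemble])
    return (votes.mean(axis=0) >= 0.5).astype(int)

print(len(ensemble), "of", len(trees), "trees retained")
```

The over-sampling and hybrid variants would replace `undersample` with a routine that duplicates (or synthesizes) minority-class samples; the bootstrap-and-prune loop is unchanged.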
Supplementary Information: The online version contains supplementary material available at 10.1038/s41598-026-38320-1.
Keywords: Bootstrap; Class imbalance; Classification; Over-sampling; Pruning; Random forest; Trees; Under-sampling.