Facing Imbalanced Data Recommendations for the Use of Performance Metrics
- PMID: 25574450
- PMCID: PMC4285355
- DOI: 10.1109/ACII.2013.47
Abstract
Recognizing facial action units (AUs) is important for situation analysis and automated video annotation. Previous work has emphasized face tracking and registration and the choice of features and classifiers. Relatively neglected is the effect of imbalanced data on action unit detection. While the machine learning community has become aware of the problem of skewed data for training classifiers, little attention has been paid to how skew may bias performance metrics. To address this question, we conducted experiments using both simulated classifiers and three major databases that differ in size, type of FACS coding, and degree of skew. We evaluated the influence of skew on both threshold metrics (Accuracy, F-score, Cohen's kappa, and Krippendorff's alpha) and rank metrics (area under the receiver operating characteristic (ROC) curve and precision-recall curve). With the exception of area under the ROC curve, all were attenuated by skewed distributions, in many cases dramatically so. While ROC was unaffected by skew, precision-recall curves suggest that ROC may mask poor performance. Our findings suggest that skew is a critical factor in evaluating performance metrics. To avoid or minimize skew-biased estimates of performance, we recommend reporting skew-normalized scores along with the obtained ones.
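The effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes a hypothetical classifier with a fixed true positive rate and false positive rate, builds the expected confusion matrix at different class ratios, and computes precision, F-score, and Cohen's kappa from it. The `skew_normalize` helper is one plausible reading of "skew-normalized scores" (rescaling the negative-class counts so both classes have equal weight); the abstract does not spell out the exact procedure, so treat it as an assumption. All function names are illustrative.

```python
def confusion(tpr, fpr, n_pos, n_neg):
    """Expected counts (tp, fp, fn, tn) for a classifier with fixed TPR and FPR."""
    tp = tpr * n_pos
    fp = fpr * n_neg
    return tp, fp, n_pos - tp, n_neg - fp


def precision(tp, fp, fn, tn):
    return tp / (tp + fp)


def f1(tp, fp, fn, tn):
    return 2 * tp / (2 * tp + fp + fn)


def cohen_kappa(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginals of the 2x2 table.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected)


def skew_normalize(tp, fp, fn, tn):
    """Assumed normalization: divide negative-class counts by skew = negatives / positives."""
    skew = (fp + tn) / (tp + fn)
    return tp, fp / skew, fn, tn / skew


if __name__ == "__main__":
    # Same classifier (TPR = 0.8, FPR = 0.1), increasingly skewed test sets.
    for n_neg in (1_000, 10_000, 50_000):
        cm = confusion(0.8, 0.1, n_pos=1_000, n_neg=n_neg)
        norm = skew_normalize(*cm)
        print(
            f"skew={n_neg / 1_000:>4.0f}  "
            f"precision={precision(*cm):.3f}  F1={f1(*cm):.3f}  "
            f"kappa={cohen_kappa(*cm):.3f}  F1(normalized)={f1(*norm):.3f}"
        )
```

Because the true positive and false positive rates are held constant, the ROC operating point of this simulated classifier does not move as the class ratio changes, while precision, F-score, and kappa fall as negatives come to dominate. Under the assumed normalization, the normalized F-score stays at its balanced-data value, which is the kind of comparison that reporting both scores makes possible.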