Skip to main page content
Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
, 21 (1), 6

The Advantages of the Matthews Correlation Coefficient (MCC) Over F1 Score and Accuracy in Binary Classification Evaluation


The Advantages of the Matthews Correlation Coefficient (MCC) Over F1 Score and Accuracy in Binary Classification Evaluation

Davide Chicco et al. BMC Genomics.


Background: To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, accordingly to the goal of the experiment they are investigating. Despite being a crucial issue in machine learning, no widespread consensus has been reached on a unified elective chosen measure yet. Accuracy and F1 score computed on confusion matrices have been (and still are) among the most popular adopted metrics in binary classification tasks. However, these statistical measures can dangerously show overoptimistic inflated results, especially on imbalanced datasets.

Results: The Matthews correlation coefficient (MCC), instead, is a more reliable statistical rate which produces a high score only if the prediction obtained good results in all of the four confusion matrix categories (true positives, false negatives, true negatives, and false positives), proportionally both to the size of positive elements and the size of negative elements in the dataset.

Conclusions: In this article, we show how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, by first explaining the mathematical properties, and then the asset of MCC in six synthetic use cases and in a real genomics scenario. We believe that the Matthews correlation coefficient should be preferred to accuracy and F1 score in evaluating binary classification tasks by all scientific communities.

Keywords: Accuracy; Binary classification; Biostatistics; Confusion matrices; Dataset imbalance; F1 score; Genomics; Machine learning; Matthews correlation coefficient.

Conflict of interest statement

The authors declare they have no competing interests.


Fig. 1
Fig. 1
Relationship between MCC and F1 score. Scatterplot of all the 21 084 251 possible confusion matrices for a dataset with 500 samples on the MCC/ F1 plane. In red, the (−0.04, 0.95) point corresponding to use case A1
Fig. 2
Fig. 2
Use case A1 — Positively imbalanced dataset. a Barplot representing accuracy, F1, and normalized Matthews correlation coefficient (normMCC = (MCC + 1) / 2), all in the [0, 1] interval, where 0 is the worst possible score and 1 is the best possible score, applied to the Use case A1 positively imbalanced dataset. b Pie chart representing the amounts of true positives (TP), false negatives (FN), true negatives (TN), and false positives (FP). c Pie chart representing the dataset balance, as the amounts of positive data instances and negative data instances

Similar articles

See all similar articles

Cited by 1 PubMed Central articles


    1. Chicco D, Rovelli C. Computational prediction of diagnosis and feature selection on mesothelioma patient health records. PLoS ONE. 2019;14(1):0208737. doi: 10.1371/journal.pone.0208737. - DOI - PMC - PubMed
    1. Fernandes K, Chicco D, Cardoso JS, Fernandes J. Supervised deep learning embeddings for the prediction of cervical cancer diagnosis. PeerJ Comput Sci. 2018;4:154. doi: 10.7717/peerj-cs.154. - DOI
    1. Maggio V, Chierici M, Jurman G, Furlanello C. Distillation of the clinical algorithm improves prognosis by multi-task deep learning in high-risk neuroblastoma. PLoS ONE. 2018;13(12):0208924. doi: 10.1371/journal.pone.0208924. - DOI - PMC - PubMed
    1. Fioravanti D, Giarratano Y, Maggio V, Agostinelli C, Chierici M, Jurman G, Furlanello C. Phylogenetic convolutional neural networks in metagenomics. BMC Bioinformatics. 2018;19(2):49. doi: 10.1186/s12859-018-2033-5. - DOI - PMC - PubMed
    1. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436. doi: 10.1038/nature14539. - DOI - PubMed

LinkOut - more resources