Hate speech detection and racial bias mitigation in social media based on BERT model
- PMID: 32853205
- PMCID: PMC7451563
- DOI: 10.1371/journal.pone.0237861
Abstract
Disparate biases associated with datasets and trained classifiers in hateful and abusive content identification tasks have raised many concerns recently. Although the problem of biased datasets in abusive language detection has been addressed frequently, biases introduced by trained classifiers have received far less attention. In this paper, we first introduce a transfer learning approach for hate speech detection based on an existing pre-trained language model, BERT (Bidirectional Encoder Representations from Transformers), and evaluate the proposed model on two publicly available Twitter datasets annotated for racism, sexism, hate, or offensive content. Next, we introduce a bias alleviation mechanism to mitigate the effect of bias in the training set during the fine-tuning of our pre-trained BERT-based model for hate speech detection. Toward that end, we use an existing regularization method to reweight input samples, thereby decreasing the effect of training-set n-grams that are highly correlated with class labels, and then fine-tune our pre-trained BERT-based model on the re-weighted samples. To evaluate our bias alleviation mechanism, we employ a cross-domain approach in which the classifiers trained on the aforementioned datasets predict the labels of two new Twitter datasets, the AAE-aligned and White-aligned groups, which contain tweets written in African-American English (AAE) and Standard American English (SAE), respectively. The results show the existence of systematic racial bias in the trained classifiers, as they assign tweets written in AAE from the AAE-aligned group to negative classes such as racism, sexism, hate, and offensive more often than tweets written in SAE from the White-aligned group. However, the racial bias in our classifiers is reduced significantly once our bias alleviation mechanism is incorporated.
This work could institute the first step towards debiasing hate speech and abusive language detection systems.
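The reweighting idea described in the abstract (down-weighting training samples dominated by n-grams that are highly correlated with class labels) can be sketched as follows. This is a simplified stand-in, not the paper's exact regularization method: the function name, the unigram-level log-odds statistic, and the `threshold` parameter are illustrative assumptions.

```python
import math
from collections import Counter

def correlation_weights(texts, labels, alpha=1.0, threshold=2.0):
    """Assign each sample a weight in (0, 1]; samples containing tokens
    strongly correlated with one class are down-weighted.
    (Hypothetical sketch of the reweighting idea, not the paper's method.)
    """
    pos, neg = Counter(), Counter()
    for text, y in zip(texts, labels):
        counts = pos if y == 1 else neg
        for tok in set(text.lower().split()):
            counts[tok] += 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos

    def log_odds(tok):
        # Smoothed log-odds of the token appearing in positive vs. negative samples.
        p = (pos[tok] + alpha) / (n_pos + 2 * alpha)
        q = (neg[tok] + alpha) / (n_neg + 2 * alpha)
        return math.log(p / q)

    weights = []
    for text in texts:
        toks = set(text.lower().split())
        peak = max((abs(log_odds(t)) for t in toks), default=0.0)
        # Samples whose most class-correlated token exceeds the threshold
        # get a weight below 1, shrinking their influence during fine-tuning.
        weights.append(1.0 if peak < threshold else threshold / peak)
    return weights
```

In the paper's pipeline, such per-sample weights would then scale each example's loss during fine-tuning of the BERT-based classifier, so that surface cues (e.g. dialect-specific n-grams) contribute less to the learned decision boundary.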
Conflict of interest statement
The authors have declared that no competing interests exist.
Similar articles
- Roman Urdu Hate Speech Detection Using Transformer-Based Model for Cyber Security Applications. Sensors (Basel). 2023 Apr 12;23(8):3909. doi: 10.3390/s23083909. PMID: 37112249. Free PMC article.
- Detection of Hate Speech in COVID-19-Related Tweets in the Arab Region: Deep Learning and Topic Modeling Approach. J Med Internet Res. 2020 Dec 8;22(12):e22609. doi: 10.2196/22609. PMID: 33207310. Free PMC article.
- Emotionally Informed Hate Speech Detection: A Multi-target Perspective. Cognit Comput. 2022;14(1):322-352. doi: 10.1007/s12559-021-09862-5. Epub 2021 Jun 28. PMID: 34221180. Free PMC article.
- Online interventions for reducing hate speech and cyberhate: A systematic review. Campbell Syst Rev. 2022 May 25;18(2):e1243. doi: 10.1002/cl2.1243. eCollection 2022 Jun. PMID: 36913206. Free PMC article. Review.
- Hate speech and abusive language detection in Indonesian social media: Progress and challenges. Heliyon. 2023 Jul 28;9(8):e18647. doi: 10.1016/j.heliyon.2023.e18647. eCollection 2023 Aug. PMID: 37636475. Free PMC article. Review.
Cited by
- Roman Urdu Hate Speech Detection Using Transformer-Based Model for Cyber Security Applications. Sensors (Basel). 2023 Apr 12;23(8):3909. doi: 10.3390/s23083909. PMID: 37112249. Free PMC article.
- Brain Structure and Function gets serious about ethical science writing. Brain Struct Funct. 2023 May;228(3-4):699-701. doi: 10.1007/s00429-023-02645-8. PMID: 37093303. No abstract available.
- Mining of Movie Box Office and Movie Review Topics Using Social Network Big Data. Front Psychol. 2022 May 26;13:903380. doi: 10.3389/fpsyg.2022.903380. eCollection 2022. PMID: 35693503. Free PMC article.
- Weight Stigma and Social Media: Evidence and Public Health Solutions. Front Nutr. 2021 Nov 12;8:739056. doi: 10.3389/fnut.2021.739056. eCollection 2021. PMID: 34869519. Free PMC article. Review.
- Deep Learning for Identification of Alcohol-Related Content on Social Media (Reddit and Twitter): Exploratory Analysis of Alcohol-Related Outcomes. J Med Internet Res. 2021 Sep 15;23(9):e27314. doi: 10.2196/27314. PMID: 34524095. Free PMC article.
