Scalable Learning Framework for Detecting New Types of Twitter Spam with Misuse and Anomaly Detection

Jaeun Choi; Byunghwan Jeon; Chunmi Jeon

doi:10.3390/s24072263


Sensors (Basel). 2024 Apr 2;24(7):2263. doi: 10.3390/s24072263.

Authors

Jaeun Choi¹, Byunghwan Jeon², Chunmi Jeon³

Affiliations

¹ College of Business, Kwangwoon University, Seoul 01897, Republic of Korea.
² Division of Computer Engineering, Hankuk University of Foreign Studies, Yongin 17035, Republic of Korea.
³ Corporate Relations Office, Korea Telecom, Seoul 03155, Republic of Korea.

Abstract

The growing popularity of social media has made the proliferation of spam through this medium a social problem. New types of spam that evade existing spam detection systems are continually being developed, which calls for corresponding countermeasures. This study proposes an anomaly detection-based framework for detecting new Twitter spam. The framework models the characteristics of non-spam tweets and uses anomaly detection to classify tweets that deviate from this model as anomalies. However, because varied non-spam tweets are difficult to model, anomaly detection alone yields a low spam detection rate and a high false positive (FP) rate. To overcome this shortcoming, the framework pre-detects known spam tweets using a trained decision tree (misuse detection) and performs anomaly detection while modeling normal tweets. A one-class support vector machine and an autoencoder, both of which have high detection rates, are used for anomaly detection. The proposed framework achieves higher detection rates for unknown spam than conventional techniques, while maintaining equivalent or improved detection and FP rates for known spam. Furthermore, the framework can be adapted to changes in spam conditions by adjusting the costs of detection errors.

Keywords: Twitter spam; anomaly detection; autoencoder; decision tree.
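The combination of misuse detection and anomaly detection described in the abstract can be illustrated with a minimal, dependency-free sketch. Several assumptions apply, and none of this is the authors' implementation. The decision tree is reduced to a depth-1 stump. The one-class SVM and autoencoder are replaced by a simpler stand-in that scores a tweet by its squared per-feature z-score distance from the non-spam training tweets. The "costs of detection errors" are modeled as weights on false negatives and false positives when choosing the anomaly-score cutoff. The ordering (decision tree first, anomaly detection for everything else) is one plausible reading of the abstract. All function names, the two features, and the data are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical feature vectors per tweet: [url_count, follower_ratio].


def train_stump(X, y):
    """Misuse detector stand-in: a depth-1 decision tree (stump) fit by
    exhaustive search over (feature, threshold, direction), minimizing
    the number of training errors. y holds 1 for known spam, 0 otherwise."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for above_is_spam in (True, False):
                errors = sum(
                    ((row[f] > t) == above_is_spam) != bool(label)
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, above_is_spam)
    _, feat, thr, above = best
    return lambda row: (row[feat] > thr) == above


def fit_normal_model(X_normal):
    """Anomaly detector stand-in (NOT an OCSVM or autoencoder): models
    non-spam tweets only and returns a scorer giving the sum of squared
    per-feature z-scores relative to those tweets."""
    cols = list(zip(*X_normal))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero variance
    return lambda row: sum(((x - m) / s) ** 2 for x, m, s in zip(row, mus, sds))


def pick_threshold(scores, labels, cost_fn, cost_fp):
    """Choose the anomaly-score cutoff (flag if score >= cutoff) that
    minimizes cost_fn * FN + cost_fp * FP on labeled validation scores,
    where a label of True means spam."""
    best_t, best_cost = None, float("inf")
    for t in sorted(set(scores)) + [float("inf")]:
        fn = sum(1 for s, lab in zip(scores, labels) if lab and s < t)
        fp = sum(1 for s, lab in zip(scores, labels) if not lab and s >= t)
        cost = cost_fn * fn + cost_fp * fp
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t


def classify(row, is_known_spam, score, threshold):
    """Hybrid rule: misuse detection first, anomaly detection for the rest."""
    if is_known_spam(row):
        return "known spam"
    if score(row) >= threshold:
        return "unknown spam"
    return "non-spam"


# Toy data, invented purely for illustration.
normal = [[0, 1.0], [1, 1.2], [0, 0.9], [1, 1.1]]
known_spam = [[3, 1.0], [4, 1.1]]
is_known_spam = train_stump(normal + known_spam, [0] * 4 + [1] * 2)
score = fit_normal_model(normal)

val = [[0, 1.05], [1, 1.0], [0, 2.5]]
val_labels = [False, False, True]
threshold = pick_threshold([score(r) for r in val], val_labels, cost_fn=1, cost_fp=1)

for tweet in ([4, 1.0], [0, 3.0], [1, 1.1]):
    print(tweet, classify(tweet, is_known_spam, score, threshold))
```

In this toy run, `[4, 1.0]` is caught by the stump as known spam. `[0, 3.0]` does not match the known-spam rule, but it lies far from the non-spam model and is therefore flagged as unknown spam. `[1, 1.1]` is classified as non-spam. Raising `cost_fp` relative to `cost_fn` pushes the chosen cutoff upward, which yields fewer false positives but risks missing more spam. Raising `cost_fn` pulls the cutoff down. This is one simple way to make the trade-off tunable as spam conditions change.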

