An efficient approach for textual data classification using deep learning

Abdullah Alqahtani; Habib Ullah Khan; Shtwai Alsubai; Mohemmed Sha; Ahmad Almadhor; Tayyab Iqbal; Sidra Abbas

doi:10.3389/fncom.2022.992296

Front Comput Neurosci. 2022 Sep 15:16:992296. doi: 10.3389/fncom.2022.992296. eCollection 2022.

Authors

Abdullah Alqahtani¹, Habib Ullah Khan², Shtwai Alsubai¹, Mohemmed Sha¹, Ahmad Almadhor³, Tayyab Iqbal⁴, Sidra Abbas⁵

Affiliations

¹ College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia.
² Department of Accounting and Information Systems, College of Business and Economics, Qatar University, Doha, Qatar.
³ College of Computer and Information Sciences, Jouf University, Al-Kharj, Saudi Arabia.
⁴ Department of Computer Science, FAST-NUCES, Islamabad, Pakistan.
⁵ Department of Computer Science, COMSATS University, Islamabad, Pakistan.

Abstract

Text categorization can be accomplished using a variety of classification algorithms. In machine learning, a classifier is built by learning the features of each category from a predefined set of training data. Deep learning likewise offers substantial benefits for text classification, since it achieves high accuracy with less manual feature engineering and processing. This paper employs machine learning and deep learning techniques to classify textual data. Textual data contains much irrelevant information and must be pre-processed: we clean the data, impute missing values, and eliminate repeated columns. Next, we apply three machine learning algorithms, logistic regression, random forest, and K-nearest neighbors (KNN), and three deep learning algorithms, long short-term memory (LSTM), artificial neural network (ANN), and gated recurrent unit (GRU), for classification. Results reveal that LSTM achieves 92% accuracy, outperforming all other models and the baseline studies.
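The pre-processing steps named in the abstract (cleaning, imputing missing values, eliminating repeated columns) followed by a classifier can be sketched as below. This is a minimal illustration, not the authors' implementation: the toy rows, the column names (`text`, `text_copy`, `label`), the `"unknown"` placeholder, and the helper names are all hypothetical, and a bag-of-words cosine-similarity KNN stands in for the KNN model, whose actual features and settings the abstract does not state.

```python
import math
import re
from collections import Counter


def clean_text(text):
    """Lowercase, replace non-alphanumeric characters, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def preprocess(rows, text_col, fill="unknown"):
    """Impute missing values, drop repeated columns, clean the text column."""
    cols = list(rows[0].keys())
    # Impute: replace None or empty strings with a placeholder (assumed strategy).
    filled = [
        {c: (r.get(c) if r.get(c) not in (None, "") else fill) for c in cols}
        for r in rows
    ]
    # Drop any column whose values exactly repeat an earlier column.
    seen, keep = set(), []
    for c in cols:
        signature = tuple(r[c] for r in filled)
        if signature not in seen:
            seen.add(signature)
            keep.append(c)
    out = [{c: r[c] for c in keep} for r in filled]
    for r in out:
        r[text_col] = clean_text(r[text_col])
    return out


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_predict(train, query, k=3):
    """Majority vote over the k training texts most similar to the query."""
    q = Counter(clean_text(query).split())
    ranked = sorted(
        train, key=lambda pair: cosine(Counter(pair[0].split()), q), reverse=True
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: "text_copy" repeats "text", and one row has no text.
rows = [
    {"text": "Stocks RALLY as markets rebound!", "text_copy": "Stocks RALLY as markets rebound!", "label": "business"},
    {"text": "Team wins the championship final", "text_copy": "Team wins the championship final", "label": "sports"},
    {"text": "Bank profits rise on strong markets", "text_copy": "Bank profits rise on strong markets", "label": "business"},
    {"text": "Striker scores twice in cup final", "text_copy": "Striker scores twice in cup final", "label": "sports"},
    {"text": None, "text_copy": None, "label": "sports"},
]

data = preprocess(rows, text_col="text")
train = [(r["text"], r["label"]) for r in data]
prediction = knn_predict(train, "Markets rebound as bank stocks rise", k=3)
print(prediction)
```

The column-deduplication step keeps the first of any set of identical columns, so the text column should precede its copies. In practice the placeholder imputation and raw term counts would likely be replaced by whatever imputation strategy and text representation suit the dataset at hand.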

Keywords: deep learning; machine learning; text categorization; text classification; text data.