Using Topic Modeling Methods for Short-Text Data: A Comparative Analysis
- PMID: 33733159
- PMCID: PMC7861298
- DOI: 10.3389/frai.2020.00042
Abstract
With the growth of online social network platforms and applications, large amounts of textual user-generated content are created daily in the form of comments, reviews, and short-text messages. As a result, users often find it challenging to discover useful information in such content or to learn more about the topic being discussed. Machine learning and natural language processing algorithms are used to analyze the massive amount of textual social media data available online, including topic modeling techniques, which have gained popularity in recent years. This paper surveys topic modeling and its common application areas, methods, and tools. We also examine and compare five frequently used topic modeling methods, as applied to short textual social data, to show their practical benefits in detecting important topics. These methods are latent semantic analysis, latent Dirichlet allocation, non-negative matrix factorization, random projection, and principal component analysis. Two textual datasets were selected to evaluate the performance of these methods based on topic quality and standard statistical evaluation metrics: recall, precision, F-score, and topic coherence. In this evaluation, latent Dirichlet allocation and non-negative matrix factorization delivered more meaningful extracted topics and obtained good results. The paper sheds light on some common topic modeling methods in a short-text context and provides direction for researchers who seek to apply these methods.
Keywords: natural language processing; online social networks; short text; topic modeling; user-generated content.
Copyright © 2020 Albalawi, Yeap and Benyoucef.
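The abstract names recall, precision, and F-score as evaluation metrics but does not say how they were computed in the study. As an illustration only, the sketch below applies the standard set-based definitions to a made-up example; the function name `precision_recall_f1` and the sample term lists are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(predicted, relevant):
    """Return (precision, recall, F1) for two collections of labels.

    Standard definitions, assumed here rather than taken from the paper:
      precision = |predicted ∩ relevant| / |predicted|
      recall    = |predicted ∩ relevant| / |relevant|
      F1        = harmonic mean of precision and recall
    """
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: terms a model assigned to a topic vs. a hand-labelled reference.
predicted_terms = ["flight", "airline", "delay", "pizza"]
reference_terms = ["flight", "airline", "delay", "airport", "ticket"]

p, r, f = precision_recall_f1(predicted_terms, reference_terms)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

Treating the inputs as sets ignores term order and duplicates; a ranked or weighted comparison would need a different formulation.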
