A high-resolution temporal and geospatial content analysis of Twitter posts related to the COVID-19 pandemic

J Comput Soc Sci. 2022;5(1):687-729. doi: 10.1007/s42001-021-00150-8. Epub 2021 Oct 20.

Abstract

The COVID-19 pandemic has deeply impacted all aspects of social, professional, and financial life, and the resulting concerns and responses have been readily published on online social media worldwide. This study employs probabilistic text mining techniques for a large-scale, high-resolution temporal and geospatial content analysis of COVID-19-related discussions on Twitter. The analysis considered 20,230,833 original English-language COVID-19-related tweets of global origin, retrieved between January 25, 2020 and April 30, 2020. Fine-grained topic analysis identified 91 meaningful topics. Most topics showed a temporal evolution with local maxima, underlining the short-lived character of discussions on Twitter. When compared with real-world events, temporal popularity curves showed a good correlation with, and a quick response to, real-world triggers. Geospatial analysis of topics showed that approximately 30% of original English-language tweets were contributed by USA-based users, while overall more than 60% of the English-language tweets were contributed by users from countries with an official language other than English. High-resolution temporal and geospatial analysis of Twitter content shows potential for political, economic, and social monitoring at the global and national levels.
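
As an illustration of the temporal popularity curves and local maxima mentioned above, the following minimal Python sketch computes each topic's daily share of tweets and flags days on which that share peaks. It is not the study's implementation: it assumes each tweet has already been assigned a single dominant topic (for example by a topic model), and the function names `daily_topic_share` and `local_maxima`, the topic labels, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date


def daily_topic_share(tweets):
    """Map each topic to a day-sorted list of (day, share of that day's tweets).

    `tweets` is an iterable of (day, topic) pairs; one topic per tweet is an
    assumption made for this sketch.
    """
    totals = Counter()                # tweets per day, all topics
    per_topic = defaultdict(Counter)  # tweets per day, per topic
    for day, topic in tweets:
        totals[day] += 1
        per_topic[topic][day] += 1
    days = sorted(totals)
    return {
        topic: [(d, counts[d] / totals[d]) for d in days]
        for topic, counts in per_topic.items()
    }


def local_maxima(curve):
    """Return the days whose share is strictly greater than both neighbours.

    The first and last days are never reported, since they have one neighbour.
    """
    return [
        curve[i][0]
        for i in range(1, len(curve) - 1)
        if curve[i - 1][1] < curve[i][1] > curve[i + 1][1]
    ]


# Hypothetical toy data: (day, dominant topic) for nine tweets over three days.
tweets = [
    (date(2020, 3, 1), "lockdown"), (date(2020, 3, 1), "masks"), (date(2020, 3, 1), "masks"),
    (date(2020, 3, 2), "lockdown"), (date(2020, 3, 2), "lockdown"), (date(2020, 3, 2), "masks"),
    (date(2020, 3, 3), "lockdown"), (date(2020, 3, 3), "masks"), (date(2020, 3, 3), "masks"),
]

curves = daily_topic_share(tweets)
peaks = {topic: local_maxima(curve) for topic, curve in curves.items()}
```

Normalizing by the daily total, rather than using raw counts, keeps a topic's curve from simply tracking overall tweet volume. On real data, a smoothing step or a minimum-prominence threshold would likely be needed before peaks could be compared with real-world events.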

Keywords: COVID-19; Geospatial analysis; Latent Dirichlet Allocation; Social media analysis; Topic modeling; Twitter.