Sensors (Basel). 2021 Apr 24;21(9):2993.
doi: 10.3390/s21092993.

Iktishaf+: A Big Data Tool with Automatic Labeling for Road Traffic Social Sensing and Event Detection Using Distributed Machine Learning

Free PMC article

Ebtesam Alomari et al. Sensors (Basel). 2021.

Abstract

Digital societies can be characterized by their increasing desire to express themselves and interact with others. This is being realized through digital platforms such as social media, which have increasingly become convenient and inexpensive sensors compared to physical sensors in many sectors of smart societies. One such major sector is road transportation, which is the backbone of modern economies and accounts globally for 1.25 million deaths and 50 million injuries annually. The state of the art in big data-enabled social media analytics for transportation-related studies is limited. This paper brings a range of technologies together to detect road traffic-related events using big data and distributed machine learning. The most specific contribution of this research is an automatic labeling method for machine learning-based detection of traffic-related events from Twitter data in the Arabic language. The proposed method has been implemented in a software tool called Iktishaf+ (an Arabic word meaning discovery) that detects traffic events automatically from Arabic tweets using distributed machine learning over Apache Spark. The tool is built from nine components and a range of technologies including Apache Spark, Parquet, and MongoDB. Iktishaf+ uses a light stemmer for the Arabic language developed by us. We also use a location extractor developed by us that allows us to extract and visualize spatio-temporal information about the detected events. The data used in this work comprise 33.5 million tweets collected from Saudi Arabia using the Twitter API. Using classifiers based on support vector machines, naïve Bayes, and logistic regression, we are able to detect and validate several real events in Saudi Arabia without prior knowledge of them, including a fire in Jeddah, rains in Makkah, and an accident in Riyadh. The findings show that Twitter data are effective for detecting important events with no prior knowledge about them.

Keywords: Arabic tweets; automatic labeling; big data; data analytics; distributed machine learning; event detection; road traffic; smart cities; social media; social media analytics.
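
The abstract does not spell out how the automatic labeling works, so the following is only a minimal sketch of the general idea, under stated assumptions: tweets are labeled by matching against small keyword lists, and the resulting labels train a classifier. The keyword lists, the English example tweets, and the names `EVENT_KEYWORDS`, `auto_label`, and `NaiveBayes` are all hypothetical. The sketch uses a single-machine naïve Bayes in plain Python rather than the distributed Spark classifiers the paper reports, and it omits the Arabic stemming and location extraction steps.

```python
import math
from collections import Counter, defaultdict

# Hypothetical keyword lists; the paper's actual dictionaries are not given here.
EVENT_KEYWORDS = {
    "fire": {"fire", "smoke", "burning"},
    "accident": {"accident", "crash", "collision"},
    "weather": {"rain", "flood", "storm"},
}


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real Arabic preprocessing)."""
    return text.lower().split()


def auto_label(text):
    """Return the event type with the most keyword hits, or None if none or tied."""
    tokens = tokenize(text)
    hits = {label: sum(tok in kws for tok in tokens)
            for label, kws in EVENT_KEYWORDS.items()}
    best = max(hits.values())
    winners = [label for label, count in hits.items() if count == best]
    if best == 0 or len(winners) != 1:
        return None
    return winners[0]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + vocab_size
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up example tweets (English for readability; the paper works on Arabic).
tweets = [
    "huge fire and smoke near the highway",
    "car crash on king fahd road",
    "heavy rain and flood in the tunnel",
    "good morning everyone",
    "collision between two trucks accident reported",
    "warehouse burning fire trucks on the way",
]

# Step 1: label automatically and drop tweets with no usable label.
labelled = [(t, auto_label(t)) for t in tweets]
train = [(t, label) for t, label in labelled if label is not None]

# Step 2: train a classifier on the automatically labeled tweets.
model = NaiveBayes().fit([t for t, _ in train], [label for _, label in train])
prediction = model.predict("smoke seen from a burning building")
```

In this toy setup the classifier can generalize beyond exact keyword matches because it also learns from the other words that co-occur with the keywords, which is the usual motivation for training on weak labels instead of using the keyword rules directly.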

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Iktishaf+: The proposed system architecture.
Figure 2. Spark Application Run on Yarn.
Figure 3. A Tweet Object.
Figure 4. Fire Event on 1 October 2018.
Figure 5. Intensity of Detected Fire Event in Riyadh (1 October 2018).
Figure 6. Fire Event on 29 September 2019.
Figure 7. Intensity of Detected Fire Event in Jeddah (29 September 2019).
Figure 8. Weather Event on 23 November 2018.
Figure 9. Intensity of Detected Weather Event in Makkah (23 November 2018).
Figure 10. Accident Event on 8 October 2018.
Figure 11. Intensity of Detected Accident Event in Riyadh (8 October 2018).
Figure 12. Number of Tweets Using Different Location Extraction Approaches.
Figure 13. The Number of Detected Events in Different Provinces.
Figure 14. Hourly Distribution of Tweets Divided by Provinces (Aggregated).
Figure 15. Numerical Evaluation (Tweets Filtering).
Figure 16. Numerical Evaluation (Events Classification).
