Dimensions management of traffic big data for short-term traffic prediction on suburban roadways

Sci Rep. 2024 Jan 17;14(1):1484. doi: 10.1038/s41598-024-51988-7.

Abstract

Since intelligent systems were developed to collect traffic data, such data can be collected at high volume, velocity, and variety, resulting in big traffic data. Previous studies have consistently discussed how to deal with the large volume of big traffic data. In this study, big traffic data were used to predict the traffic state on a section of the suburban road from Karaj to Chalous, located in the north of Iran. Because many and varied features were extracted, managing the data dimensions is necessary. This management was accomplished using principal component analysis to reduce the number of features, genetic algorithms to select the features influencing traffic states, and cyclic features to change the nature of the features. The data set obtained from each method was used as input to the models: long short-term memory, support vector machine, and random forest. The results show that using cyclic features can increase the accuracy of traffic state prediction compared with the model that includes all the initial features (base model). The long short-term memory model with 71 cyclic features offers the highest accuracy, 88.09%. Additionally, the reduced number of features in this model shortened the modelling execution time from 456 s (base model) to 201 s.
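
The abstract does not specify how the cyclic features were constructed. As a minimal sketch, assuming the common sine/cosine encoding of periodic variables such as hour of day, the idea can be illustrated as follows. The function names `cyclic_encode` and `circular_distance` are hypothetical and are not taken from the study.

```python
import math


def cyclic_encode(value, period):
    """Map a periodic value (e.g. hour of day) to a point on the unit circle.

    Assumption: sine/cosine encoding; the study's exact construction is not
    given in the abstract.
    """
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def circular_distance(a, b):
    """Euclidean distance between two encoded (sin, cos) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Hours 23 and 0 are adjacent in time but far apart as raw integers.
h23 = cyclic_encode(23, 24)
h0 = cyclic_encode(0, 24)
h12 = cyclic_encode(12, 24)

# After encoding, 23:00 lies closer to 00:00 than 12:00 does.
print(circular_distance(h23, h0) < circular_distance(h0, h12))
```

Under this assumed encoding, each periodic variable becomes a pair of values, so a model sees the end of one cycle and the start of the next as neighbours rather than as opposite ends of a numeric range.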