DTDNet: Dynamic Target Driven Network for pedestrian trajectory prediction

Front Neurosci. 2024 Apr 30:18:1346374. doi: 10.3389/fnins.2024.1346374. eCollection 2024.

Abstract

Predicting pedestrian trajectories is an important and difficult task in many applications, such as robot navigation and autonomous driving. Most existing methods assume that accurately predicting pedestrian intention improves prediction quality. These works tend to predict a fixed destination coordinate as the agent's intention and predict the future trajectory accordingly. However, while moving, a pedestrian's intention may be a definite location or only a general direction or area, and it may change dynamically as the surroundings change. Thus, treating the agent's intention as a fixed 2-D coordinate is insufficient for improving future trajectory prediction. To address this problem, we propose the Dynamic Target Driven Network for pedestrian trajectory prediction (DTDNet), which employs a multi-precision pedestrian intention analysis module to capture these dynamics. To ensure that the extracted feature contains comprehensive intention information, we design three sub-tasks: predicting the coarse-precision endpoint coordinate, predicting the fine-precision endpoint coordinate, and scoring scene sub-regions. In addition, we propose an original multi-precision trajectory data extraction method to achieve a multi-resolution representation of future intention and make it easier to extract local scene information. We compare our model with previous methods on two publicly available datasets (ETH-UCY and Stanford Drone Dataset). The experimental results show that DTDNet achieves better trajectory prediction performance and learns a better feature representation of pedestrian intention.
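
As a rough illustration only, and not the authors' implementation, the sketch below shows one way training targets for the three sub-tasks named in the abstract could be derived from a single ground-truth endpoint. It assumes the scene is divided into square grids at two resolutions; the function names, the grid sizes (4 and 16 cells per axis), and the one-hot sub-region scoring are all hypothetical choices made for this example and are not taken from the paper.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def quantize_endpoint(endpoint: Point, scene_min: Point, scene_size: Point,
                      cells_per_axis: int) -> Tuple[int, int]:
    """Map a continuous (x, y) endpoint to a (col, row) grid cell.

    Hypothetical helper: the grid layout is an assumption of this sketch.
    """
    indices = []
    for value, lo, size in zip(endpoint, scene_min, scene_size):
        frac = (value - lo) / size                  # position within the scene, nominally 0..1
        idx = math.floor(frac * cells_per_axis)     # cell index along this axis
        idx = min(max(idx, 0), cells_per_axis - 1)  # clamp points on or outside the border
        indices.append(idx)
    return indices[0], indices[1]


def region_score_target(cell: Tuple[int, int], cells_per_axis: int) -> List[float]:
    """One-hot score vector over sub-regions (assumed row-major ordering)."""
    col, row = cell
    target = [0.0] * (cells_per_axis * cells_per_axis)
    target[row * cells_per_axis + col] = 1.0
    return target


def multi_precision_targets(endpoint: Point, scene_min: Point, scene_size: Point,
                            coarse_cells: int = 4, fine_cells: int = 16) -> Dict[str, object]:
    """Build coarse, fine, and sub-region targets for one endpoint."""
    coarse = quantize_endpoint(endpoint, scene_min, scene_size, coarse_cells)
    fine = quantize_endpoint(endpoint, scene_min, scene_size, fine_cells)
    return {
        "coarse_cell": coarse,  # coarse-precision endpoint target
        "fine_cell": fine,      # fine-precision endpoint target
        "region_scores": region_score_target(coarse, coarse_cells),
    }


if __name__ == "__main__":
    # Endpoint (7.5, 2.0) in a 10 x 10 scene whose origin is (0, 0).
    print(multi_precision_targets((7.5, 2.0), (0.0, 0.0), (10.0, 10.0)))
```

In this sketch the coarse cell stands in for a general area of intent while the fine cell approximates a definite location, so a model supervised on both would see the same endpoint at two resolutions. A soft score spread over neighbouring sub-regions could replace the one-hot vector if a less rigid target were preferred.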

Keywords: multi-precision motion prediction; multi-task neural network; multimodal trajectory prediction; pedestrian intention prediction; trajectory endpoint prediction.

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported in part by the Major Program of National Natural Science Foundation of China under Grant 91938301, in part by the National Key Research and Development Program of China under Grant 2020YFB1710400, in part by the Youth Program of National Natural Science Foundation of China under Grant 62002345, and in part by the Innovation Program of Institute of Computing Technology Chinese Academy of Sciences under Grant E261070.