Multiple Traffic Target Tracking with Spatial-Temporal Affinity Network

Comput Intell Neurosci. 2022 May 23:2022:9693767. doi: 10.1155/2022/9693767. eCollection 2022.

Abstract

Traffic target tracking is a core task in intelligent transportation systems because it supports scene understanding and autonomous driving. Most state-of-the-art (SOTA) multiple object tracking (MOT) methods adopt a two-step procedure: object detection followed by data association. Object detection has advanced considerably with the development of deep learning. Data association, however, still depends heavily on handcrafted constraints such as appearance, shape, and motion, which must be carefully tuned for specific object types. In this study, a spatial-temporal encoder-decoder affinity network is proposed for multiple traffic target tracking. It uses deep learning to learn robust spatial-temporal affinity features between detections and tracklets for data association. The proposed network contains a two-stage transformer encoder module that encodes the features of the detections and the tracked targets at the image level and at the tracklet level, in order to capture spatial correlations and temporal history information. A spatial transformer decoder module then computes the association affinity. The outputs of the two-stage transformer encoder module are fed into this decoder so that the spatial and temporal information from the detections and from the tracklets of the tracked targets is fully captured and encoded. The resulting affinities can be computed efficiently and used to perform data association in online tracking. To validate the effectiveness of the proposed method, three popular multiple traffic target tracking datasets, KITTI, UA-DETRAC, and VisDrone, are used for evaluation. On the KITTI dataset, the proposed method is compared with 15 SOTA methods and achieves 86.9% multiple object tracking accuracy (MOTA) and 85.71% multiple object tracking precision (MOTP).
On the UA-DETRAC dataset, the proposed method is compared with 12 SOTA methods and achieves 20.82% MOTA and 35.65% MOTP. On the VisDrone dataset, it is compared with 10 SOTA trackers and achieves 40.5% MOTA and 74.1% MOTP. These experimental results show that the proposed method is competitive with state-of-the-art methods, obtaining superior tracking performance.
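
The online association step summarized above (affinities between detections and tracklets, then matching) can be illustrated with a minimal sketch. This is not the authors' network. The learned two-stage transformer encoder and spatial transformer decoder are replaced here by a single parameter-free scaled dot-product attention over each tracklet's feature history, followed by cosine similarity. The function names `affinity_matrix` and `greedy_associate`, the input shapes, and the 0.5 threshold are all hypothetical. The abstract does not say which assignment algorithm is used, so greedy matching is assumed.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def affinity_matrix(det_feats, trk_feats, eps=1e-8):
    """Hypothetical spatial-temporal affinity between detections and tracklets.

    det_feats: (N, d) features of the current-frame detections.
    trk_feats: (M, T, d) feature history of each tracklet over its last T frames.
    Returns an (N, M) matrix of cosine similarities in [-1, 1].
    Stand-in for a learned transformer: no trainable weights, single head.
    """
    d = det_feats.shape[1]
    # Scaled dot-product scores of every detection against every history step: (N, M, T).
    scores = np.einsum("nd,mtd->nmt", det_feats, trk_feats) / np.sqrt(d)
    # Attention weights over each tracklet's temporal history.
    weights = softmax(scores, axis=-1)
    # Detection-specific summary of each tracklet: (N, M, d).
    pooled = np.einsum("nmt,mtd->nmd", weights, trk_feats)
    # Cosine similarity between each detection and its attended tracklet summary.
    det_n = det_feats / (np.linalg.norm(det_feats, axis=1, keepdims=True) + eps)
    pooled_n = pooled / (np.linalg.norm(pooled, axis=2, keepdims=True) + eps)
    return np.einsum("nd,nmd->nm", det_n, pooled_n)


def greedy_associate(aff, threshold=0.5):
    """Greedily match detections (rows) to tracklets (columns) by descending affinity."""
    matches, used_dets, used_trks = [], set(), set()
    # axis=None sorts the flattened matrix; negation gives descending order.
    for flat in np.argsort(-aff, axis=None):
        i, j = (int(k) for k in np.unravel_index(flat, aff.shape))
        if aff[i, j] < threshold:
            break  # all remaining pairs have lower affinity
        if i in used_dets or j in used_trks:
            continue  # detection or tracklet already matched
        matches.append((i, j))
        used_dets.add(i)
        used_trks.add(j)
    unmatched_dets = [i for i in range(aff.shape[0]) if i not in used_dets]
    unmatched_trks = [j for j in range(aff.shape[1]) if j not in used_trks]
    return matches, unmatched_dets, unmatched_trks


if __name__ == "__main__":
    eye = np.eye(4)
    # Two tracklets with constant (toy) appearance over 3 frames: shape (2, 3, 4).
    trk = np.stack([np.tile(eye[0], (3, 1)), np.tile(eye[1], (3, 1))])
    # Three detections: the two existing targets (in swapped order) and a new one.
    dets = eye[[1, 0, 2]]
    matches, new_dets, lost_trks = greedy_associate(affinity_matrix(dets, trk))
    print(sorted(matches), new_dets, lost_trks)  # [(0, 1), (1, 0)] [2] []
```

Greedy matching keeps the sketch dependent only on NumPy. The Hungarian algorithm (for example, `scipy.optimize.linear_sum_assignment` applied to the negated affinity matrix) is a common alternative that maximizes total affinity. In a typical online tracker, unmatched detections would start new tracklets and unmatched tracklets would be kept for a few frames before being terminated.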

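The MOTA and MOTP figures quoted above are usually defined by the CLEAR MOT metrics. The abstract does not spell out the exact evaluation protocol, and individual benchmarks may apply their own variants, so the following is the standard form only. Here, for frame $t$, $\mathrm{FN}_t$, $\mathrm{FP}_t$, $\mathrm{IDSW}_t$, and $\mathrm{GT}_t$ are the false negatives, false positives, identity switches, and ground-truth objects; $c_t$ is the number of matched pairs; and $d_t^{i}$ is the overlap (for bounding boxes, typically IoU) of matched pair $i$. The worked numbers use purely hypothetical counts.

```latex
\mathrm{MOTA} = 1 - \frac{\sum_t \left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t\right)}{\sum_t \mathrm{GT}_t},
\qquad
\mathrm{MOTP} = \frac{\sum_{i,t} d_t^{i}}{\sum_t c_t}

% Hypothetical example: 50 FN, 30 FP, 20 IDSW over 1000 ground-truth objects
\mathrm{MOTA} = 1 - \frac{50 + 30 + 20}{1000} = 0.90 = 90\%
```

Because the error terms in MOTA are not bounded by the number of ground-truth objects, MOTA can fall below zero, which is worth keeping in mind when comparing scores across datasets of very different difficulty.
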
MeSH terms

  • Data Collection
  • Motion*