In this paper, we propose a fast and accurate object tracking method based on deep networks. It combines feature representation, template tracking and foreground detection in a single framework for robust tracking. The framework consists of a backbone network that feeds two parallel branches: TmpNet for template tracking and FgNet for foreground detection. The backbone is a pre-trained, modified VGG network, and only a few of its parameters are fine-tuned to adapt to the tracked object. FgNet is a fully convolutional network that separates the foreground from the background in a pixel-to-pixel manner. The parameter of TmpNet is a learned channel-wise target template, which is initialized in the first frame and used for fast template tracking in the test frames. To make the components work closely together, we train the whole framework end-to-end with a multi-task loss. During online tracking, we combine the score maps from TmpNet and FgNet to find the optimal tracking result. Experimental results on object tracking benchmarks show that our approach compares favorably with state-of-the-art trackers in accuracy while running in real time at 38 fps.
Keywords: convolutional neural network; foreground detection; object tracking; template matching.
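The abstract states that the TmpNet and FgNet score maps are combined during online tracking but does not give the combination rule. The following is a minimal sketch under stated assumptions, not the authors' implementation. Plain channel-wise cross-correlation stands in for TmpNet. The pixel-wise foreground map is averaged over each template-sized window so the two maps align. The maps are then fused by a weighted sum with a hypothetical weight `alpha`. All function names (`template_score`, `box_foreground_score`, `fuse_and_locate`) are hypothetical.

```python
import numpy as np


def template_score(features, template):
    """Channel-wise template matching (hypothetical stand-in for TmpNet).

    Slides a (C, h, w) template over (C, H, W) search features and sums the
    element-wise products over all channels at each valid offset.
    """
    _, H, W = features.shape
    _, h, w = template.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            out[i, j] = np.sum(features[:, i:i + h, j:j + w] * template)
    return out


def box_foreground_score(fg_prob, h, w):
    """Average a pixel-wise (H, W) foreground map over every h x w window so
    that it aligns with the template score map (assumed pooling rule)."""
    H, W = fg_prob.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            out[i, j] = fg_prob[i:i + h, j:j + w].mean()
    return out


def fuse_and_locate(features, template, fg_prob, alpha=0.5):
    """Fuse the two score maps with a weighted sum (alpha is a hypothetical
    weight) and return the top-left corner of the best box and the fused map."""
    tmp = template_score(features, template)
    tmp = (tmp - tmp.min()) / (tmp.max() - tmp.min() + 1e-12)  # min-max to [0, 1]
    fg = box_foreground_score(fg_prob, template.shape[1], template.shape[2])
    fused = alpha * tmp + (1.0 - alpha) * fg
    row, col = np.unravel_index(np.argmax(fused), fused.shape)
    return (int(row), int(col)), fused


# Toy example: a positive-valued template embedded at (3, 5) in empty features,
# with a foreground mask covering the same region.
rng = np.random.default_rng(0)
template = rng.uniform(0.1, 1.0, size=(4, 4, 4))
features = np.zeros((4, 16, 16))
features[:, 3:7, 5:9] = template
fg_prob = np.zeros((16, 16))
fg_prob[3:7, 5:9] = 1.0
pos, fused = fuse_and_locate(features, template, fg_prob)
print(pos)  # (3, 5)
```

In this toy case both cues peak at the same location, so the fused map does too. The combination helps when one cue is ambiguous, for example when template matching responds to a similar-looking distractor that the foreground map scores low.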