Semi-Supervised Crowd Counting via Multiple Representation Learning

IEEE Trans Image Process. 2023;32:5220-5230. doi: 10.1109/TIP.2023.3313490. Epub 2023 Sep 20.

Abstract

There has been growing interest in recent years in counting crowds with computer vision and machine learning techniques. Although significant progress has been made, most existing methods rely heavily on fully supervised learning and require large amounts of labeled data. To reduce this reliance, we focus on the semi-supervised learning paradigm. Crowd counting is usually converted to a density estimation problem: the model is trained to predict a density map, and the total count is obtained by summing the densities over all locations. In particular, we find that a given image can have multiple density map representations that differ in the form of their probability distributions but agree on the total count. We therefore propose multiple representation learning, which trains several models. Each model focuses on a specific density representation, and the count consistency between models is used to supervise unlabeled data. To bypass explicit density regression, which makes a strong parametric assumption about the underlying density distribution, we propose an implicit density representation method based on kernel mean embedding. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art semi-supervised methods.
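
The count-consistency idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernels, the function names `gaussian_density_map` and `count_consistency_loss`, the annotation points, and the image size are all assumptions chosen for illustration, and the implicit kernel mean embedding representation is not reproduced here. The sketch only shows that two density maps built from the same annotations can differ in shape yet sum to the same count, and that the difference between total counts can serve as a supervision signal.

```python
import numpy as np

def gaussian_density_map(points, shape, sigma):
    """Build a density map with one unit-mass Gaussian per annotated head.

    Illustrative assumption: each kernel is normalized inside the image,
    so every person contributes exactly 1 to the total.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros(shape, dtype=np.float64)
    for py, px in points:
        kernel = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2.0 * sigma ** 2))
        density += kernel / kernel.sum()
    return density

def count_consistency_loss(density_a, density_b):
    """Absolute difference between the total counts of two density maps."""
    return abs(density_a.sum() - density_b.sum())

# Hypothetical head annotations given as (row, col) pixel coordinates.
points = [(10, 12), (20, 40), (35, 25)]

# Two representations of the same image: a sharp one and a smooth one.
sharp = gaussian_density_map(points, (48, 64), sigma=1.5)
smooth = gaussian_density_map(points, (48, 64), sigma=6.0)

print("counts:", sharp.sum(), smooth.sum())
print("same map:", np.allclose(sharp, smooth))
print("consistency loss:", count_consistency_loss(sharp, smooth))
```

In a semi-supervised setting, `sharp` and `smooth` would be replaced by the predictions of two models on an unlabeled image, and the loss would penalize disagreement between their total counts.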