Crowd Density Estimation
- Date: 06/09/17
- Team: Yi Xu, Zan Shen, Yiming Li
- Goal: To achieve more accurate crowd counting in complex and dense crowd scenes.
Brief Introduction
Crowd counting, or crowd density estimation, is a challenging computer vision task because of large scale variations, perspective distortion, and severe occlusion. Existing methods generally suffer from two inherent algorithmic drawbacks. On one hand, these models are optimized with only the traditional Euclidean loss, which is known to be sensitive to outliers and to produce blurry outputs. In particular, although convolutional kernels of different sizes are used to extract multi-scale features, each sub-network path minimizes the regression loss independently (i.e., multi-scale model competition) and tries to predict the correct density map for patches containing people at all scales. On the other hand, most existing approaches do not exploit the coherence between density maps estimated at different scales. Namely, the sum of the crowd counts from local patches (i.e., small scale) does NOT necessarily match the overall count of their region union (i.e., large scale).
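To make the baseline concrete, here is a minimal NumPy sketch of the two quantities discussed above: the crowd count read off a density map (the sum of its values) and a pixel-wise Euclidean loss. The function names `crowd_count` and `euclidean_loss`, and the choice to average the squared error over pixels, are assumptions made for illustration and are not taken from the project code.

```python
import numpy as np


def crowd_count(density_map):
    """Estimated head count: the sum of all density values."""
    return float(np.sum(density_map))


def euclidean_loss(pred, target):
    """Mean squared pixel-wise difference between two density maps.

    Assumption: averaged over pixels; other normalizations are common too.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("density maps must have the same shape")
    return float(np.mean((pred - target) ** 2))


# Toy ground truth: one person located entirely in the top-left pixel.
ground_truth = np.array([[1.0, 0.0],
                         [0.0, 0.0]])

# A prediction that spreads the same mass uniformly over the patch.
spread_out = np.full((2, 2), 0.25)

count_gt = crowd_count(ground_truth)
count_pred = crowd_count(spread_out)
loss = euclidean_loss(spread_out, ground_truth)
```

In this toy case the two maps have the same count, so the loss measures only how the density is distributed across pixels. Because errors are squared, a single large per-pixel error also contributes far more than many small ones, which is one way to read the outlier sensitivity mentioned above.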
To address these issues, we propose a novel crowd counting framework called Adversarial Cross-Scale Consistency Pursuit Networks (ACSCP). On one hand, inspired by the recent success of GANs in image translation, we propose a patch-to-density generation network trained with an adversarial loss, which mitigates the blurring caused by optimizing over the traditional Euclidean loss alone. Further, the proposed multi-scale U-net generator performs a pixel-wise translation from every crowd image pixel to its corresponding density value, which yields high-resolution, high-quality density map estimates. On the other hand, we propose a new regularizer that enforces cross-scale model calibration and encourages the different scale paths to work collaboratively. In particular, our model consists of two complementary density map generators: one takes large-scale patches as input, and the other takes small-scale patches. We enforce that the sum of the crowd counts from local patches (i.e., small scale) is coherent with the overall count of their region union (i.e., large scale). These objectives are integrated via a joint training scheme, so that their collaboration further improves density estimation performance.
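The cross-scale consistency idea can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the project's implementation: it assumes the large-scale patch is split into four equal quadrants, that the penalty is a mean squared difference between the large-scale map and the stitched small-scale maps, that the adversarial term uses the common non-saturating generator loss, and that the objectives are combined as a weighted sum. The names `stitch_quadrants`, `cross_scale_consistency`, `generator_adversarial_loss`, `joint_generator_loss`, and the weights `lambda_e` and `lambda_c` are all hypothetical.

```python
import numpy as np


def stitch_quadrants(child_maps):
    """Tile four child density maps into one map.

    Order assumed: top-left, top-right, bottom-left, bottom-right.
    """
    tl, tr, bl, br = child_maps
    top = np.concatenate([tl, tr], axis=1)
    bottom = np.concatenate([bl, br], axis=1)
    return np.concatenate([top, bottom], axis=0)


def cross_scale_consistency(parent_map, child_maps):
    """Return (pixel-wise penalty, absolute count gap) between scales."""
    stitched = stitch_quadrants(child_maps)
    if stitched.shape != parent_map.shape:
        raise ValueError("stitched child maps must match the parent map")
    penalty = float(np.mean((parent_map - stitched) ** 2))
    count_gap = float(abs(parent_map.sum() - stitched.sum()))
    return penalty, count_gap


def generator_adversarial_loss(disc_prob_fake, eps=1e-12):
    """Non-saturating generator loss: -mean(log D(G(x))) (an assumed form)."""
    probs = np.clip(np.asarray(disc_prob_fake, dtype=float), eps, 1.0)
    return float(-np.mean(np.log(probs)))


def joint_generator_loss(adversarial, euclidean, consistency,
                         lambda_e=1.0, lambda_c=1.0):
    """Weighted sum of the three terms; the weights are placeholders."""
    return adversarial + lambda_e * euclidean + lambda_c * consistency


# Toy example: a 4x4 large-scale map and its four 2x2 quadrants.
parent = np.full((4, 4), 0.5)
children = [parent[:2, :2], parent[:2, 2:], parent[2:, :2], parent[2:, 2:]]
consistent = cross_scale_consistency(parent, children)

# Perturb one small-scale map so the two scales disagree.
perturbed = [children[0] + 0.1] + children[1:]
inconsistent = cross_scale_consistency(parent, perturbed)
```

When the small-scale maps agree with the large-scale map, both the penalty and the count gap are zero. Once one quadrant over-counts, both become positive, which is the signal a regularizer of this kind would push back against during joint training.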
Extensive experiments on four benchmarks demonstrate the effectiveness of the proposed innovations as well as superior performance over prior art.