CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes
Yuhong Li, Xiaofan Zhang, Deming Chen
pp. 1091-1100
TLDR
CSRNet is composed of two major components: a convolutional neural network (CNN) as the front-end for 2D feature extraction and a dilated CNN as the back-end, which uses dilated kernels to deliver larger receptive fields and to replace pooling operations.
Abstract
We propose a network for Congested Scene Recognition called CSRNet to provide a data-driven, deep learning method that can understand highly congested scenes, perform accurate count estimation, and produce high-quality density maps. The proposed CSRNet is composed of two major components: a convolutional neural network (CNN) as the front-end for 2D feature extraction and a dilated CNN as the back-end, which uses dilated kernels to deliver larger receptive fields and to replace pooling operations. CSRNet is easy to train because of its pure convolutional structure. We demonstrate CSRNet on four datasets (the ShanghaiTech dataset, the UCF_CC_50 dataset, the WorldEXPO'10 dataset, and the UCSD dataset) and deliver state-of-the-art performance. On the ShanghaiTech Part_B dataset, CSRNet achieves 47.3% lower Mean Absolute Error (MAE) than the previous state-of-the-art method. We extend the targeted applications to counting other objects, such as vehicles in the TRANCOS dataset. Results show that CSRNet significantly improves the output quality, with 15.4% lower MAE than the previous state-of-the-art approach.
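The back-end idea described above can be illustrated concretely: a dilated kernel inserts gaps between the taps of a small kernel, so a 3x3 kernel with dilation rate 2 covers a 5x5 area, enlarging the receptive field without pooling and without adding parameters. Below is a minimal pure-Python sketch of this mechanism, not the authors' implementation (which uses standard deep learning layers):

```python
def effective_kernel_size(k, rate):
    """Side length of the area a k x k kernel covers at the given dilation rate."""
    return k + (k - 1) * (rate - 1)

def dilated_conv2d(x, w, rate):
    """Naive 'valid' 2D cross-correlation with a dilated kernel.

    x: H x W image as a list of lists; w: k x k kernel as a list of lists.
    Illustrative only: single channel, no padding, no stride.
    """
    k = len(w)
    k_eff = effective_kernel_size(k, rate)
    H, W = len(x), len(x[0])
    out = []
    for i in range(H - k_eff + 1):
        row = []
        for j in range(W - k_eff + 1):
            s = 0.0
            for a in range(k):
                for b in range(k):
                    # Dilation inserts (rate - 1) gaps between kernel taps,
                    # so tap (a, b) reads the input at offset (a*rate, b*rate).
                    s += w[a][b] * x[i + a * rate][j + b * rate]
            row.append(s)
        out.append(row)
    return out
```

With rate 1 this reduces to an ordinary convolution; with rate 2 each 3x3 kernel already sees a 5x5 neighborhood, which is why stacking such layers in the back-end grows the receptive field quickly while keeping the output resolution, unlike pooling.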
Citations
Book Chapter
Scale Aggregation Network for Accurate and Efficient Crowd Counting
TL;DR: A novel training loss combining Euclidean loss and local pattern consistency loss is proposed, which improves the model's performance in the authors' experiments and achieves performance superior to state-of-the-art methods with far fewer parameters.
Proceedings Article
Context-Aware Crowd Counting
TL;DR: In this article, an end-to-end trainable deep architecture that combines features obtained using multiple receptive field sizes and learns the importance of each such feature at each image location is proposed.
Proceedings Article
Learning From Synthetic Data for Crowd Counting in the Wild
TL;DR: A data collector and labeler is developed that can generate synthetic crowd scenes and annotate them automatically without any manpower, and a crowd counting method via domain adaptation is proposed, which frees humans from heavy data annotation.
Proceedings Article
ADCrowdNet: An Attention-Injective Deformable Convolutional Network for Crowd Understanding
TL;DR: An attention-injective deformable convolutional network for crowd understanding that addresses the accuracy degradation problem of highly congested noisy scenes, capturing crowd features more effectively and proving more resistant to various kinds of noise.
References
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: A deep convolutional neural network consisting of five convolutional layers, some followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, which achieved state-of-the-art performance on ImageNet classification.
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Journal Article
Image quality assessment: from error visibility to structural similarity
TL;DR: A structural similarity (SSIM) index is proposed for image quality assessment based on the degradation of structural information, and is validated against subjective ratings on a database of images compressed with JPEG and JPEG2000.
Journal Article
Dropout: a simple way to prevent neural networks from overfitting
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.