Open Access, Proceedings Article

CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

TLDR
CSRNet is composed of two major components: a convolutional neural network (CNN) as the front-end for 2D feature extraction and a dilated CNN for the back-end, which uses dilated kernels to deliver larger receptive fields and to replace pooling operations.
Abstract
We propose a network for Congested Scene Recognition, called CSRNet, that provides a data-driven deep learning method for understanding highly congested scenes, performing accurate count estimation, and producing high-quality density maps. The proposed CSRNet is composed of two major components: a convolutional neural network (CNN) as the front-end for 2D feature extraction and a dilated CNN for the back-end, which uses dilated kernels to deliver larger receptive fields and to replace pooling operations. CSRNet is easy to train because of its purely convolutional structure. We demonstrate CSRNet on four datasets (the ShanghaiTech dataset, the UCF_CC_50 dataset, the WorldEXPO'10 dataset, and the UCSD dataset) and achieve state-of-the-art performance. On the ShanghaiTech Part_B dataset, CSRNet achieves 47.3% lower Mean Absolute Error (MAE) than the previous state-of-the-art method. We extend the targeted applications to counting other objects, such as vehicles in the TRANCOS dataset. Results show that CSRNet significantly improves the output quality, with 15.4% lower MAE than the previous state-of-the-art approach.
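
The claim that dilated kernels enlarge the receptive field without pooling can be checked with a little arithmetic. The sketch below is only an illustration: the helper names `effective_kernel` and `receptive_field` and the three layer stacks are assumptions made for this example, not the CSRNet configuration. It uses the standard relation that a kernel of size k with dilation rate d spans k + (k - 1)(d - 1) input positions.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field and total stride of a stack of layers.

    Each layer is a (kernel_size, stride, dilation) tuple. Pooling is
    described the same way, e.g. (2, 2, 1) for 2x2 pooling with stride 2.
    """
    rf, jump = 1, 1  # jump = distance between adjacent outputs, in input pixels
    for kernel_size, stride, dilation in layers:
        rf += (effective_kernel(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf, jump


# Hypothetical stacks chosen for illustration only.
plain = [(3, 1, 1)] * 3    # three ordinary 3x3 convolutions
dilated = [(3, 1, 2)] * 3  # the same kernels with dilation rate 2
pooled = [(3, 1, 1), (2, 2, 1), (3, 1, 1), (2, 2, 1), (3, 1, 1)]

print(receptive_field(plain))    # (7, 1)
print(receptive_field(dilated))  # (13, 1)
print(receptive_field(pooled))   # (18, 4)
```

In this toy comparison the dilated stack nearly doubles the receptive field of the plain stack while keeping a total stride of 1, so the output keeps the input resolution. The pooled stack also sees more context, but at the cost of a 4x downsampled output, which is the trade-off that motivates replacing pooling with dilation when a dense density map is wanted.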


Citations
Book Chapter

Scale Aggregation Network for Accurate and Efficient Crowd Counting

TL;DR: A novel training loss combining Euclidean loss and local pattern consistency loss is proposed, which improves the performance of the model in the authors' experiments. The model achieves superior performance to state-of-the-art methods with far fewer parameters.
Proceedings Article

Context-Aware Crowd Counting

TL;DR: In this article, an end-to-end trainable deep architecture that combines features obtained using multiple receptive field sizes and learns the importance of each such feature at each image location is proposed.
Proceedings Article

Learning From Synthetic Data for Crowd Counting in the Wild

TL;DR: A data collector and labeler is developed which can generate the synthetic crowd scenes and simultaneously annotate them without any manpower, and a crowd counting method via domain adaptation is proposed, which can free humans from heavy data annotations.
Posted Content

Learning from Synthetic Data for Crowd Counting in the Wild

TL;DR: Wang et al. developed a data collector and labeler that generates synthetic crowd scenes and simultaneously annotates them without any manpower, which can boost the performance of crowd counting in the wild.
Proceedings Article

ADCrowdNet: An Attention-Injective Deformable Convolutional Network for Crowd Understanding

TL;DR: An attention-injective deformable convolutional network for crowd understanding is proposed that addresses the accuracy degradation problem in highly congested, noisy scenes. It captures crowd features more effectively and is more resistant to various kinds of noise.
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network is presented that consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax; it achieved state-of-the-art performance on ImageNet classification.
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Journal Article

Image quality assessment: from error visibility to structural similarity

TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and it is compared with both subjective ratings and objective methods on a database of images compressed with JPEG and JPEG2000.
Journal Article

Dropout: a simple way to prevent neural networks from overfitting

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.