Open Access · Proceedings ArticleDOI

Counting Everyday Objects in Everyday Scenes

TLDR
In this article, a divide-and-conquer model for counting the number of instances of object classes in natural, everyday images is proposed, inspired by the phenomenon of subitizing.
Abstract
We are interested in counting the number of instances of object classes in natural, everyday images. Previous counting approaches tackle the problem in restricted domains such as counting pedestrians in surveillance videos. Counts can also be estimated from outputs of other vision tasks like object detection. In this work, we build dedicated models for counting designed to tackle the large variance in counts, appearances, and scales of objects found in natural scenes. Our approach is inspired by the phenomenon of subitizing – the ability of humans to make quick assessments of counts given a perceptual signal, for small count values. Given a natural scene, we employ a divide-and-conquer strategy while incorporating context across the scene to adapt the subitizing idea to counting. Our approach offers consistent improvements over numerous baseline approaches for counting on the PASCAL VOC 2007 and COCO datasets. Subsequently, we study how counting can be used to improve object detection. We then show a proof-of-concept application of our counting methods to the task of Visual Question Answering, by studying the "how many?" questions in the VQA and COCO-QA datasets.
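The divide-and-conquer strategy above can be sketched as follows; the uniform grid partition and the per-cell subitizing-style predictor here are hypothetical simplifications for illustration, not the authors' exact architecture:

```python
import numpy as np

def aggregate_cell_counts(cell_counts):
    """Combine per-cell count estimates into one image-level count.

    cell_counts: 2-D grid of non-negative real-valued count predictions,
    one per image cell (a stand-in for the output of a small per-cell
    counting model; the real model also shares context across cells).
    """
    return int(round(float(np.sum(cell_counts))))

# Hypothetical 2x2 grid of per-cell predictions for one object class:
grid = np.array([[0.0, 1.2],
                 [0.8, 0.0]])
total = aggregate_cell_counts(grid)  # sums to 2.0, rounds to 2
```

Each cell sees only a small, easily "subitizable" number of objects, so the per-cell problem stays simple even when the image-level count is large.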



Citations
Posted Content

Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning

TL;DR: This work investigates the ability to automatically, accurately, and inexpensively collect such data, which could help catalyze the transformation of many fields of ecology, wildlife biology, zoology, conservation biology, and animal behavior into "big data" sciences.
Proceedings Article

Compositional Attention Networks for Machine Reasoning

TL;DR: The MAC network is presented, a novel fully differentiable neural network architecture designed to facilitate explicit and expressive reasoning that is computationally and data efficient, in particular requiring 5x less data than existing models to achieve strong results.
Proceedings ArticleDOI

Representation Learning by Learning to Count

TL;DR: This paper uses two image transformations, scaling and tiling, in the context of counting to train a neural network with a contrastive loss, producing representations that perform on par with or exceed the state of the art on transfer-learning benchmarks.
Proceedings ArticleDOI

Context-Aware Crowd Counting

TL;DR: In this article, an end-to-end trainable deep architecture that combines features obtained using multiple receptive field sizes and learns the importance of each such feature at each image location is proposed.
Proceedings ArticleDOI

Bayesian Loss for Crowd Count Estimation With Point Supervision

TL;DR: This work proposes Bayesian loss, a novel loss function which constructs a density contribution probability model from the point annotations, and outperforms previous best approaches by a large margin on the latest and largest UCF-QNRF dataset.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
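The "adaptive estimates of lower-order moments" in the Adam summary can be made concrete with a minimal single-parameter sketch (hyperparameter values are the paper's defaults except the learning rate, which is enlarged here so the toy problem converges quickly):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update using bias-corrected first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 starting from x = 5.
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2.0 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
```

Because the update divides by the square root of the second-moment estimate, the effective step size is roughly bounded by `lr` regardless of the raw gradient magnitude.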
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieves state-of-the-art image classification performance.
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
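The normalization the Batch Normalization summary refers to can be sketched in a few lines; this is the training-time transform for fully-connected activations (the paper additionally tracks running statistics for inference, omitted here):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift.

    x: (batch, features) activations; gamma, beta: learned per-feature params.
    """
    mu = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                    # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta            # restore representational capacity

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 8))   # shifted, scaled activations
y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
```

With `gamma = 1` and `beta = 0`, each output feature has approximately zero mean and unit variance over the batch, which is what stabilizes the distribution of layer inputs during training.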