Spatial Memory for Context Reasoning in Object Detection
Xinlei Chen, Abhinav Gupta
- pp 4106-4116
TLDR
The Spatial Memory Network (SMN) proposed in this paper assembles object instances back into a pseudo-image representation that can easily be fed into another ConvNet for object-object context reasoning.
Abstract:
Modeling instance-level context and object-object relationships is extremely challenging. It requires reasoning about bounding boxes of different classes, locations, etc. Above all, instance-level spatial reasoning inherently requires modeling conditional distributions on previous detections. Unfortunately, our current object detection systems do not have any memory to remember what to condition on! State-of-the-art object detectors still detect all objects in parallel, followed by non-maximal suppression (NMS). While memory has been used for tasks such as captioning, such approaches mostly use image-level memory cells without capturing the spatial layout. Modeling object-object relationships, on the other hand, requires spatial reasoning: not only do we need a memory to store the spatial layout, but also an effective reasoning module to extract spatial patterns. This paper presents a conceptually simple yet powerful solution, the Spatial Memory Network (SMN), to model instance-level context efficiently and effectively. Our spatial memory essentially assembles object instances back into a pseudo "image" representation that can easily be fed into another ConvNet for object-object context reasoning. This leads to a new sequential reasoning architecture where image and memory are processed in parallel to obtain detections, which in turn update the memory. We show our SMN direction is promising, as it provides a 2.2% improvement over the baseline Faster RCNN on the COCO dataset with VGG16.
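The sequential loop described in the abstract (detect, write the detection into a spatial memory, then condition later detections on that memory) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the per-class memory layout, the element-wise max write rule, and the hand-written `rescore` function (a stand-in for the context-reasoning ConvNet) are all assumptions chosen to keep the example small, and the names `write_to_memory`, `rescore`, and `sequential_detect` are hypothetical.

```python
import numpy as np


def write_to_memory(memory, box, class_id, score):
    """Paste one detection into the spatial memory (in place).

    memory: float array of shape (num_classes, H, W), the pseudo-image.
    box: (x1, y1, x2, y2) in memory-cell coordinates, end-exclusive.
    """
    x1, y1, x2, y2 = box
    region = memory[class_id, y1:y2, x1:x2]  # a view into `memory`
    # Assumed update rule: keep the strongest evidence per cell.
    # The paper describes a learned memory update; this is a simplification.
    np.maximum(region, score, out=region)
    return memory


def rescore(candidate, memory):
    """Hypothetical stand-in for the context-reasoning ConvNet.

    Down-weights a candidate when the memory already holds strong
    same-class evidence inside its box (i.e. a likely duplicate).
    """
    x1, y1, x2, y2 = candidate["box"]
    region = memory[candidate["class_id"], y1:y2, x1:x2]
    occupied = float(region.mean()) if region.size else 0.0
    return candidate["score"] * (1.0 - occupied)


def sequential_detect(candidates, num_classes, height, width, steps):
    """Pick detections one at a time, conditioning each on the memory."""
    memory = np.zeros((num_classes, height, width), dtype=np.float32)
    remaining = list(candidates)
    detections = []
    for _ in range(min(steps, len(remaining))):
        scores = [rescore(c, memory) for c in remaining]
        best = remaining.pop(int(np.argmax(scores)))
        write_to_memory(memory, best["box"], best["class_id"], best["score"])
        detections.append(best)
    return detections, memory


if __name__ == "__main__":
    candidates = [
        {"box": (0, 0, 4, 4), "class_id": 0, "score": 0.9},
        {"box": (0, 0, 4, 4), "class_id": 0, "score": 0.8},  # duplicate
        {"box": (4, 4, 8, 8), "class_id": 1, "score": 0.5},
    ]
    dets, mem = sequential_detect(candidates, 2, 8, 8, steps=2)
    print([d["class_id"] for d in dets])
```

In this toy setting the duplicate box is passed over in the second step because the memory already records a same-class detection at that location, which is the kind of conditioning on previous detections that parallel detection followed by NMS cannot express directly.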
Citations
Journal ArticleDOI
Object Detection Using Deep Learning, CNNs and Vision Transformers: A Review
TL;DR: This paper presents a literature review of state-of-the-art object detection algorithms and the concepts underlying them, classifying the methods into three main groups: anchor-based, anchor-free, and transformer-based detectors.
Journal ArticleDOI
Object detection based on knowledge graph network
Book ChapterDOI
Boosting the Performance of Object Detection CNNs with Context-Based Anomaly Detection
TL;DR: In this article, an autoencoder network is used to detect anomalous objects in images, i.e. objects that do not fit with the rest of the current observations in the scene.
Book ChapterDOI
Video Activity Recognition Based on Objects Detection Using Recurrent Neural Networks
TL;DR: In this paper, an end-to-end multitask model that jointly learns object-action relationships was proposed to detect visual relationships between objects to recognize activities in videos of the MSR Daily Activity Dataset.
Posted Content
Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN
TL;DR: This paper proposes a universal object detector called Universal-RCNN that incorporates graph transfer learning to propagate relevant semantic information across multiple datasets and reach semantic coherency.
References
Proceedings ArticleDOI
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the approach won 1st place in the ILSVRC 2015 classification task.
Journal ArticleDOI
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Journal ArticleDOI
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images, which has been run annually from 2010 to present, attracting participation from more than fifty institutions.
Book ChapterDOI
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, C. Lawrence Zitnick
TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.