Open Access, Proceedings Article

Spatial Memory for Context Reasoning in Object Detection

TLDR
The Spatial Memory Network (SMN) proposed in this paper assembles object instances back into a pseudo-image representation that can easily be fed into another ConvNet for object-object context reasoning.
Abstract
Modeling instance-level context and object-object relationships is extremely challenging. It requires reasoning about bounding boxes of different classes, locations, etc. Above all, instance-level spatial reasoning inherently requires modeling conditional distributions on previous detections. Unfortunately, current object detection systems do not have any memory to remember what to condition on! State-of-the-art object detectors still detect all objects in parallel, followed by non-maximal suppression (NMS). While memory has been used for tasks such as captioning, those approaches mostly use image-level memory cells that do not capture the spatial layout. Modeling object-object relationships, on the other hand, requires spatial reasoning: we need not only a memory to store the spatial layout, but also an effective reasoning module to extract spatial patterns. This paper presents a conceptually simple yet powerful solution, the Spatial Memory Network (SMN), to model instance-level context efficiently and effectively. Our spatial memory essentially assembles object instances back into a pseudo “image” representation that is easy to feed into another ConvNet for object-object context reasoning. This leads to a new sequential reasoning architecture in which image and memory are processed in parallel to obtain detections, which in turn update the memory. We show that the SMN direction is promising, as it provides a 2.2% improvement over the baseline Faster RCNN on the COCO dataset with VGG16.
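To make the idea of assembling detections back into a pseudo-image more concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the function names `write_to_memory` and `build_memory`, the one-channel-per-class layout, and the max-based update are all assumptions chosen for illustration. The abstract does not specify how the memory is written or updated.

```python
import numpy as np


def write_to_memory(memory, box, class_id, score):
    """Write one detection into the spatial memory, in place.

    memory   : float array of shape (H, W, num_classes); assumed layout.
    box      : (x1, y1, x2, y2) in memory-cell coordinates, x2/y2 exclusive.
    class_id : channel index of the detected class.
    score    : detection confidence in [0, 1].
    """
    x1, y1, x2, y2 = box
    # Basic slicing returns a view, so updating `region` updates `memory`.
    region = memory[y1:y2, x1:x2, class_id]
    # Assumed update rule: keep the strongest evidence seen at each cell.
    np.maximum(region, score, out=region)
    return memory


def build_memory(detections, height, width, num_classes):
    """Assemble detections one at a time into a pseudo-image.

    `detections` is an iterable of (box, class_id, score) tuples. In a
    sequential detector, each new detection would be predicted from the
    image together with the current memory; here the detections are
    given up front purely to show the memory update.
    """
    memory = np.zeros((height, width, num_classes), dtype=np.float32)
    for box, class_id, score in detections:
        write_to_memory(memory, box, class_id, score)
    return memory


if __name__ == "__main__":
    dets = [((1, 1, 3, 3), 2, 0.9), ((2, 2, 5, 4), 0, 0.6)]
    mem = build_memory(dets, height=4, width=6, num_classes=3)
    print(mem.shape)  # the memory can be passed to a ConvNet like an image
```

Because the result is a dense `(H, W, C)` array that preserves where each object sits, a second convolutional network can process it in the same way it would process image features.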


Citations
Journal Article

Deep Learning for Generic Object Detection: A Survey

TL;DR: A comprehensive survey of the recent achievements in this field brought about by deep learning techniques, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.
Proceedings Article

Relation Networks for Object Detection

TL;DR: In this article, the authors propose an object relation module to model relations between objects, which is shown to be effective in improving the object recognition and duplicate removal steps of the modern object detection pipeline.
Journal Article

Recent Advances in Deep Learning for Object Detection

TL;DR: A comprehensive survey of recent advances in visual object detection with deep learning can be found in this article, where the authors systematically analyze the existing object detection frameworks and organize the survey into three major parts: detection components, learning strategies, and applications and benchmarks.
Posted Content

Relation Networks for Object Detection

TL;DR: An object relation module is proposed that processes a set of objects simultaneously through interaction between their appearance feature and geometry, thus allowing modeling of their relations, which gives rise to the first fully end-to-end object detector.
Journal Article

Recent advances in small object detection based on deep learning: A review

TL;DR: This work comprehensively reviews the existing small object detection methods based on deep learning from five aspects: multi-scale feature learning, data augmentation, training strategy, context-based detection, and GAN-based detection.
References
Proceedings Article

Where to Look: Focus Regions for Visual Question Answering

TL;DR: A method that learns to answer visual questions by selecting image regions relevant to the text-based query that exhibits significant improvements in answering questions such as "what color", where it is necessary to evaluate a specific location, and "what room," where it selectively identifies informative image regions.
Proceedings Article

End-to-End People Detection in Crowded Scenes

TL;DR: This work proposes a model based on decoding an image into a set of people detections: it takes an image as input and directly outputs a set of distinct detection hypotheses.
Proceedings Article

Active Object Localization with Deep Reinforcement Learning

TL;DR: In this paper, an active detection model is proposed for localizing objects in scenes, which allows an agent to focus attention on candidate regions for identifying the correct location of a target object.
Book Chapter

Grounding of Textual Phrases in Images by Reconstruction

TL;DR: A novel approach which learns grounding by reconstructing a given phrase using an attention mechanism, which can be either latent or optimized directly, and demonstrates the effectiveness on the Flickr 30k Entities and ReferItGame datasets.
Proceedings Article

Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes

TL;DR: This work presents a conditional random field for jointly solving the tasks of object detection and scene classification, and proposes to use the scene context as an extra source of (global) information, to help resolve local ambiguities.