Journal Article

3D attention-driven depth acquisition for object identification

TLDR
A 3D Attention Model is developed that selects the best views to scan from, as well as the most informative regions in each view to focus on, to achieve efficient object recognition. The region-level attention yields focus-driven features that are robust against object occlusion.
Abstract
We address the problem of autonomously exploring unknown objects in a scene by consecutive depth acquisitions. The goal is to reconstruct the scene while online identifying the objects from among a large collection of 3D shapes. Fine-grained shape identification demands a meticulous series of observations attending to varying views and parts of the object of interest. Inspired by the recent success of attention-based models for 2D recognition, we develop a 3D Attention Model that selects the best views to scan from, as well as the most informative regions in each view to focus on, to achieve efficient object recognition. The region-level attention leads to focus-driven features which are quite robust against object occlusion. The attention model, trained with the 3D shape collection, encodes the temporal dependencies among consecutive views with deep recurrent networks. This facilitates order-aware view planning accounting for robot movement cost. In achieving instance identification, the shape collection is organized into a hierarchy, associated with pre-trained hierarchical classifiers. The effectiveness of our method is demonstrated on an autonomous robot (PR) that explores a scene and identifies the objects to construct a 3D scene model.
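
The idea of order-aware view planning that accounts for robot movement cost can be illustrated with a toy greedy selector. This is only a sketch under stated assumptions and not the paper's recurrent attention model: the `View` record, the `gain` scores (standing in for whatever informativeness a trained model might predict), the positions, and the trade-off weight `move_weight` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """Hypothetical candidate viewpoint (not a structure from the paper)."""
    name: str
    position: tuple  # assumed (x, y, z) sensor position
    gain: float      # assumed informativeness score for this view


def next_view(current, candidates, visited, move_weight=0.5):
    """Return the unvisited view maximising gain minus weighted travel distance."""
    best, best_score = None, -math.inf
    for view in candidates:
        if view.name in visited:
            continue
        cost = math.dist(current, view.position)  # Euclidean movement cost
        score = view.gain - move_weight * cost
        if score > best_score:
            best, best_score = view, score
    return best


def plan_views(start, candidates, budget, move_weight=0.5):
    """Greedily build an ordered scan sequence of at most `budget` views."""
    order, visited, current = [], set(), start
    for _ in range(budget):
        view = next_view(current, candidates, visited, move_weight)
        if view is None:
            break
        order.append(view.name)
        visited.add(view.name)
        current = view.position  # the robot moves to the chosen view
    return order


candidates = [
    View("front", (1.0, 0.0, 0.0), 0.9),
    View("side", (0.0, 1.0, 0.0), 0.8),
    View("back", (-1.0, 0.0, 0.0), 0.95),
]
plan = plan_views(start=(1.0, 0.0, 0.0), candidates=candidates, budget=3)
```

With `move_weight=0.0` the selector ignores travel and simply visits views in order of decreasing gain; a positive weight makes the resulting order depend on where the robot currently stands, which is the sense in which the plan becomes order-aware.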


Citations

3D2SeqViews: Aggregating Sequential Views for 3D Global Feature Learning by CNN With Hierarchical Attention Aggregation

TL;DR: 3D to Sequential Views (3D2SeqViews) is proposed to more effectively aggregate the sequential views using convolutional neural networks with a novel hierarchical attention aggregation to resolve the discriminability of learned features.

Indoor Scene Understanding in 2.5/3D for Autonomous Agents: A Survey

TL;DR: This survey paper provides a comprehensive background to the developed techniques according to a taxonomy based on the scene understanding tasks, and summarizes the performance metrics used for evaluation in different tasks and a quantitative comparison among the recent state-of-the-art techniques.

A multi-view recurrent neural network for 3D mesh segmentation

TL;DR: A multi-view recurrent neural network (MV-RNN) approach for 3D mesh segmentation that combines the convolutional neural networks and a two-layer long short term memory to yield coherent segmentation of 3D shapes is introduced.

Language-driven synthesis of 3D scenes from scene databases

TL;DR: A novel framework for using natural language to generate and edit 3D indoor scenes, harnessing scene semantics and text-scene grounding knowledge learned from large annotated 3D scene databases is introduced.

View planning in robot active vision: A survey of systems, algorithms, and applications

TL;DR: Some basic concepts of active robot vision are summarized, and representative work on systems, algorithms, and applications is reviewed from four perspectives: object reconstruction, scene reconstruction, object recognition, and pose estimation.
References

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Gradient-based learning applied to document recognition

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.

ImageNet classification with deep convolutional neural networks

TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

Control of goal-directed and stimulus-driven attention in the brain

TL;DR: Evidence for partially segregated networks of brain areas that carry out different attentional functions is reviewed, finding that one system is involved in preparing and applying goal-directed selection for stimuli and responses, and the other is specialized for the detection of behaviourally relevant stimuli.

Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning

TL;DR: This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement, in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do so without explicitly computing gradient estimates.
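
As a rough illustration of the kind of update this summary describes, the sketch below applies a REINFORCE-style rule, weight change = learning rate × (reward − baseline) × ∂ ln Pr(output)/∂ weight, to a single Bernoulli-logistic unit. The toy task (reward 1 whenever the unit outputs 1), the running-average baseline, and all function and parameter names are assumptions made for illustration; nothing here is taken from the listed paper's experiments.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reinforce_step(weights, inputs, reward_fn, baseline, lr, rng):
    """One REINFORCE-style update for a single Bernoulli-logistic unit."""
    p = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    y = 1 if rng.random() < p else 0  # sample the stochastic output
    r = reward_fn(y)
    # For a Bernoulli-logistic unit, d ln Pr(y) / d w_i = (y - p) * x_i,
    # so no explicit estimate of the expected-reward gradient is needed.
    new_weights = [w + lr * (r - baseline) * (y - p) * x
                   for w, x in zip(weights, inputs)]
    return new_weights, y, r


def train(steps=2000, lr=0.5, seed=0):
    """Toy task (assumed): reward is 1 when the unit fires, else 0."""
    rng = random.Random(seed)
    weights, baseline = [0.0], 0.0
    for _ in range(steps):
        weights, _, r = reinforce_step(
            weights, [1.0], lambda y: float(y), baseline, lr, rng)
        baseline += 0.1 * (r - baseline)  # running-average reward baseline
    return sigmoid(weights[0])  # final probability of firing
```

In this toy setting every individual update is non-negative, so the firing probability can only rise over training; richer tasks would generally show noisy updates that follow the gradient of expected reward only on average.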