3D attention-driven depth acquisition for object identification

doi:10.1145/2980179.2980224

Journal Article

- Vol. 35, Iss: 6, pp 238

TLDR

A 3D Attention Model is developed that selects the best views to scan from, as well as the most informative regions in each view to focus on, for efficient object recognition; its region-level attention yields focus-driven features that are robust against object occlusion.

Abstract:

We address the problem of autonomously exploring unknown objects in a scene by consecutive depth acquisitions. The goal is to reconstruct the scene while online identifying the objects from among a large collection of 3D shapes. Fine-grained shape identification demands a meticulous series of observations attending to varying views and parts of the object of interest. Inspired by the recent success of attention-based models for 2D recognition, we develop a 3D Attention Model that selects the best views to scan from, as well as the most informative regions in each view to focus on, to achieve efficient object recognition. The region-level attention leads to focus-driven features which are quite robust against object occlusion. The attention model, trained with the 3D shape collection, encodes the temporal dependencies among consecutive views with deep recurrent networks. This facilitates order-aware view planning accounting for robot movement cost. In achieving instance identification, the shape collection is organized into a hierarchy, associated with pre-trained hierarchical classifiers. The effectiveness of our method is demonstrated on an autonomous robot (PR) that explores a scene and identifies the objects to construct a 3D scene model.

Citations

Journal Article

3D2SeqViews: Aggregating Sequential Views for 3D Global Feature Learning by CNN With Hierarchical Attention Aggregation

Zhizhong Han, +7 more

- 12 Mar 2019 -

IEEE Transactions on Image Processing

TL;DR: 3D to Sequential Views (3D2SeqViews) is proposed to aggregate sequential views more effectively, using convolutional neural networks with a novel hierarchical attention aggregation to improve the discriminability of the learned features.

Journal Article

Indoor Scene Understanding in 2.5/3D for Autonomous Agents: A Survey

Muhammad Muzammal Naseer, +2 more

- 10 Jan 2019 -

IEEE Access

TL;DR: This survey provides a comprehensive background to the developed techniques, organized by a taxonomy based on scene understanding tasks. It also summarizes the performance metrics used for evaluation in different tasks and presents a quantitative comparison of recent state-of-the-art techniques.

Journal Article

A multi-view recurrent neural network for 3D mesh segmentation

Truc Le, +2 more

- 01 Aug 2017 -

Computers & Graphics

TL;DR: A multi-view recurrent neural network (MV-RNN) approach for 3D mesh segmentation is introduced; it combines convolutional neural networks with a two-layer long short-term memory (LSTM) to yield coherent segmentation of 3D shapes.

Journal Article

Language-driven synthesis of 3D scenes from scene databases

Rui Ma, +9 more

- 04 Dec 2018 -

ACM Transactions on Graphics

TL;DR: A novel framework for using natural language to generate and edit 3D indoor scenes, harnessing scene semantics and text-scene grounding knowledge learned from large annotated 3D scene databases is introduced.

Journal Article

View planning in robot active vision: A survey of systems, algorithms, and applications

Rui Zeng, +3 more

- 30 Nov 2020 -

Computational Visual Media

TL;DR: Some basic concepts of active robot vision are summarized, and representative work on systems, algorithms, and applications is reviewed from four perspectives: object reconstruction, scene reconstruction, object recognition, and pose estimation.

References

Journal Article

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997 -

Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Journal Article

Gradient-based learning applied to document recognition

Yann LeCun, +3 more

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.

Journal Article

ImageNet classification with deep convolutional neural networks

Alex Krizhevsky, +2 more

- 24 May 2017 -

Communications of the ACM

TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

Journal Article

Control of goal-directed and stimulus-driven attention in the brain

Maurizio Corbetta, +1 more

- 01 Mar 2002 -

Nature Reviews Neuroscience

TL;DR: Evidence for partially segregated networks of brain areas that carry out different attentional functions is reviewed, finding that one system is involved in preparing and applying goal-directed selection for stimuli and responses, and the other is specialized for the detection of behaviourally relevant stimuli.

Journal Article

Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning

Ronald J. Williams

- 01 May 1992 -

Machine Learning

TL;DR: This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement, in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do so without explicitly computing gradient estimates.

Related Papers (5)

3D ShapeNets: A deep representation for volumetric shapes

Semantic Scene Completion from a Single Depth Image

Multi-view Convolutional Neural Networks for 3D Shape Recognition

Robust reconstruction of indoor scenes

KinectFusion: Real-time dense surface mapping and tracking