Posted Content (Open Access)

Object Goal Navigation using Goal-Oriented Semantic Exploration

TLDR
A modular system called 'Goal-Oriented Semantic Exploration' builds an episodic semantic map and uses it to explore the environment efficiently based on the goal object category, outperforming a wide range of baselines including end-to-end learning-based methods as well as modular map-based methods.
Abstract
This work studies the problem of object goal navigation, which involves navigating to an instance of the given object category in unseen environments. End-to-end learning-based navigation methods struggle at this task as they are ineffective at exploration and long-term planning. We propose a modular system called 'Goal-Oriented Semantic Exploration', which builds an episodic semantic map and uses it to explore the environment efficiently based on the goal object category. Empirical results in visually realistic simulation environments show that the proposed model outperforms a wide range of baselines, including end-to-end learning-based methods as well as modular map-based methods, and led to the winning entry of the CVPR-2020 Habitat ObjectNav Challenge. Ablation analysis indicates that the proposed model learns semantic priors of the relative arrangement of objects in a scene and uses them to explore efficiently. The domain-agnostic module design allows us to transfer our model to a mobile robot platform and achieve similar performance for object goal navigation in the real world.
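To make the modular design concrete, here is a minimal sketch of the episodic-map-plus-goal-selection loop the abstract describes. Everything in it is an illustrative assumption: the map layout, the function names, and the frontier-style fallback; the actual system selects long-term goals with a learned policy rather than this heuristic.

```python
import numpy as np

# Illustrative sizes; the real system's map resolution and channel
# layout are implementation details not shown here.
MAP_SIZE, NUM_CATEGORIES = 240, 16

def update_semantic_map(sem_map, observed_cells):
    """Write observed (row, col, category) cells into the episodic map."""
    for r, c, cat in observed_cells:
        sem_map[cat, r, c] = 1.0   # per-category channel
        sem_map[-1, r, c] = 1.0    # 'explored' channel
    return sem_map

def select_long_term_goal(sem_map, goal_cat):
    """Go to the goal category if it is already on the map; otherwise
    fall back to an unexplored cell (a crude stand-in for the paper's
    learned goal-selection policy)."""
    detections = np.argwhere(sem_map[goal_cat] > 0)
    if len(detections) > 0:
        return tuple(detections[0])
    unexplored = np.argwhere(sem_map[-1] == 0)
    return tuple(unexplored[np.random.randint(len(unexplored))])

# One step of the episodic loop.
sem_map = np.zeros((NUM_CATEGORIES + 1, MAP_SIZE, MAP_SIZE))
sem_map = update_semantic_map(sem_map, [(120, 100, 3)])
goal = select_long_term_goal(sem_map, goal_cat=3)
```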


Citations
Proceedings Article

PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning

TL;DR: A network that predicts two complementary potential functions conditioned on a semantic map and uses them to decide where to look for an unseen object is proposed: a modular approach that disentangles the skills of 'where to look?' for an object and 'how to navigate to (x, y)?'.
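As a rough illustration of how two predicted potential functions could be combined into a single 'where to look' decision, the snippet below sums two stand-in maps and heads for the peak. The combination rule and the random stand-ins are assumptions; PONI predicts both potentials with a network conditioned on the semantic map.

```python
import numpy as np

# Random stand-ins for the two predicted potentials.
area_potential = np.random.rand(64, 64)    # gain from uncovering area
object_potential = np.random.rand(64, 64)  # closeness to the goal object

# Assumed combination rule: sum the potentials and take the argmax
# as the long-term navigation goal.
combined = area_potential + object_potential
goal_cell = np.unravel_index(np.argmax(combined), combined.shape)
```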
Journal Article

CLIP on Wheels: Zero-Shot Object Navigation as Object Localization and Exploration

TL;DR: This paper translates the success of zero-shot vision models to the popular embodied AI task of object navigation, and finds that a straightforward CoW, with CLIP-based object localization plus classical exploration, and no additional training, often outperforms learnable approaches in terms of success, efficiency, and robustness to dataset distribution shift.
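A hedged sketch of the CLIP-based localization half of a CoW-style agent: score the current frame against a text prompt for the goal category and report the object as visible above a threshold. The prompt template, the threshold, and the `object_visible` helper are illustrative assumptions; the paper explores several localization variants beyond plain image-text similarity.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def object_visible(image: Image.Image, category: str, thresh: float = 0.3) -> bool:
    """Return True if the frame matches the goal prompt above a threshold.
    The prompt template and threshold value are illustrative choices."""
    img = preprocess(image).unsqueeze(0).to(device)
    txt = clip.tokenize([f"a photo of a {category}"]).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(img)
        txt_feat = model.encode_text(txt)
    return torch.cosine_similarity(img_feat, txt_feat).item() > thresh
```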
Proceedings Article

Semantic Exploration from Language Abstractions and Pretrained Representations

TL;DR: This work evaluates vision-language representations, pretrained on natural image captioning datasets, and shows that these pretrained representations drive meaningful, task-relevant exploration and improve performance in 3D simulated environments.
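One generic way such pretrained representations can drive exploration is as an intrinsic novelty bonus in embedding space, sketched below. This k-nearest-neighbor scheme and the `novelty_bonus` name are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def novelty_bonus(embedding: torch.Tensor, memory: list, k: int = 5) -> float:
    """Intrinsic reward: mean distance to the k nearest embeddings seen
    so far, so states that look semantically new score higher."""
    if not memory:
        return 1.0  # everything is novel at the start of an episode
    dists = torch.cdist(embedding[None], torch.stack(memory))[0]
    return dists.topk(min(k, len(memory)), largest=False).values.mean().item()

# Usage: embed each observation with the pretrained encoder, then
# reward = novelty_bonus(feat, seen); seen.append(feat)
```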
Proceedings Article

Open-vocabulary Queryable Scene Representations for Real World Planning

TL;DR: NLMap is developed, an open-vocabulary and queryable scene representation that allows robots to operate without a fixed list of objects or executable options, enabling real robot operation unachievable by previous methods.
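A minimal sketch of what an open-vocabulary, queryable scene representation can look like: region embeddings stored with positions and matched against a text embedding at query time. The `QueryableSceneMap` class and its methods are hypothetical stand-ins, not NLMap's actual interface.

```python
import torch

class QueryableSceneMap:
    """Region embeddings keyed by position; queried with a text embedding."""

    def __init__(self):
        self.positions, self.features = [], []

    def add(self, xyz, feat):
        # Store a unit-normalized visual embedding for a scene region.
        self.positions.append(xyz)
        self.features.append(feat / feat.norm())

    def query(self, text_feat, top_k=1):
        # Rank stored regions by cosine similarity to the text query.
        sims = torch.stack(self.features) @ (text_feat / text_feat.norm())
        return [self.positions[int(i)] for i in sims.topk(top_k).indices]
```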
Proceedings Article

Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale

TL;DR: A large-scale study of imitating human demonstrations on tasks that require a virtual robot to search for objects in new environments (ObjectGoal Navigation and Pick&Place) finds that the IL-trained agent learns efficient object-search behavior from humans.
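The core training signal in imitation learning of this kind is a behavior-cloning step: predict the human's action from the observation and minimize cross-entropy. The sketch below is a generic version with stand-in features; the actual Habitat-Web models, observation space, and training recipe differ.

```python
import torch
import torch.nn as nn

# Generic behavior cloning on human demonstrations; sizes are stand-ins.
policy = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 6))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

obs = torch.randn(32, 512)             # stand-in observation features
actions = torch.randint(0, 6, (32,))   # human-chosen discrete actions

loss = loss_fn(policy(obs), actions)   # predict the human's action
optimizer.zero_grad()
loss.backward()
optimizer.step()
```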
References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won 1st place in the ILSVRC 2015 classification task.
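The central idea is easy to show in code: each block learns a residual F(x) and outputs F(x) + x through an identity shortcut. Below is a minimal basic block in PyTorch, omitting the projection shortcuts and downsampling used in the full architecture.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic block: the layers fit a residual F(x) and the identity
    shortcut makes the output F(x) + x."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # identity shortcut
```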
Book Chapter

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset that aims to advance the state of the art in object recognition by placing it in the context of the broader question of scene understanding, built by gathering images of complex everyday scenes containing common objects in their natural context.
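For readers who want to inspect the dataset, the official pycocotools API loads annotations by category; the annotation-file path below is an assumed local path.

```python
from pycocotools.coco import COCO

# Assumed local path to the downloaded annotation file.
coco = COCO("annotations/instances_val2017.json")

cat_ids = coco.getCatIds(catNms=["chair"])
img_ids = coco.getImgIds(catIds=cat_ids)
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_ids[0], catIds=cat_ids))
print(f"{len(anns)} chair annotations in image {img_ids[0]}")
```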
Proceedings Article

Feature Pyramid Networks for Object Detection

TL;DR: This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.
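The core mechanism, a top-down pathway merged with lateral 1x1 connections, fits in a few lines. The sketch below handles just two backbone stages and is an illustration of the idea, not the paper's full multi-level pyramid.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    """Two-stage feature pyramid: lateral 1x1 convs bring backbone maps
    to a common width, and the top-down path upsamples and merges them."""

    def __init__(self, c_low: int, c_high: int, d: int = 256):
        super().__init__()
        self.lat_low = nn.Conv2d(c_low, d, 1)
        self.lat_high = nn.Conv2d(c_high, d, 1)
        self.smooth = nn.Conv2d(d, d, 3, padding=1)  # post-merge smoothing

    def forward(self, feat_low, feat_high):
        p_high = self.lat_high(feat_high)
        up = F.interpolate(p_high, size=feat_low.shape[-2:], mode="nearest")
        p_low = self.smooth(self.lat_low(feat_low) + up)
        return p_low, p_high
```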
Proceedings Article

Mask R-CNN

TL;DR: This work presents a conceptually simple, flexible, and general framework for object instance segmentation, which extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.
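torchvision ships an off-the-shelf implementation, which makes the box-plus-mask output structure easy to see. The weights-loading argument varies across torchvision versions, so treat the call below as a sketch.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# weights="DEFAULT" assumes torchvision >= 0.13; older releases use
# pretrained=True instead.
model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

image = torch.rand(3, 480, 640)      # one RGB image scaled to [0, 1]
with torch.no_grad():
    out = model([image])[0]          # dict of boxes, labels, scores, masks
print(out["masks"].shape)            # (N, 1, H, W) per-instance masks
```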

Automatic differentiation in PyTorch

TL;DR: The automatic differentiation module of PyTorch is described: a library designed to enable rapid research on machine learning models, which differentiates purely imperative programs with a focus on extensibility and low overhead.
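Because differentiation is defined over ordinary imperative code, normal Python control flow just works; a tiny example:

```python
import torch

# Autograd differentiates ordinary imperative code: build y with normal
# Python control flow, then ask for dy/dx.
x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum() if x.sum() > 0 else -x.sum()
y.backward()
print(x.grad)  # tensor([4., 6.]), i.e. 2 * x
```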