Posted Content (Open Access)

Object Goal Navigation using Goal-Oriented Semantic Exploration

TLDR
A modular system called 'Goal-Oriented Semantic Exploration' builds an episodic semantic map and uses it to explore the environment efficiently based on the goal object category, outperforming a wide range of baselines including end-to-end learning-based methods as well as modular map-based methods.
Abstract
This work studies the problem of object goal navigation, which involves navigating to an instance of the given object category in unseen environments. End-to-end learning-based navigation methods struggle at this task as they are ineffective at exploration and long-term planning. We propose a modular system called 'Goal-Oriented Semantic Exploration', which builds an episodic semantic map and uses it to explore the environment efficiently based on the goal object category. Empirical results in visually realistic simulation environments show that the proposed model outperforms a wide range of baselines, including end-to-end learning-based methods as well as modular map-based methods, and led to the winning entry of the CVPR-2020 Habitat ObjectNav Challenge. Ablation analysis indicates that the proposed model learns semantic priors of the relative arrangement of objects in a scene and uses them to explore efficiently. The domain-agnostic module design allows us to transfer our model to a mobile robot platform and achieve similar performance for object goal navigation in the real world.
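To make the modular design concrete, here is a minimal sketch of the episodic-map-plus-goal-selection loop the abstract describes. Everything in it is an illustrative assumption: the map layout, the function names, and the frontier-style fallback; the actual system selects long-term goals with a learned policy rather than this heuristic.

```python
import numpy as np

# Illustrative sizes; the real system's map resolution and channel
# layout are implementation details not shown here.
MAP_SIZE, NUM_CATEGORIES = 240, 16

def update_semantic_map(sem_map, observed_cells):
    """Write observed (row, col, category) cells into the episodic map."""
    for r, c, cat in observed_cells:
        sem_map[cat, r, c] = 1.0   # per-category channel
        sem_map[-1, r, c] = 1.0    # 'explored' channel
    return sem_map

def select_long_term_goal(sem_map, goal_cat):
    """Go to the goal category if it is already on the map; otherwise
    fall back to an unexplored cell (a crude stand-in for the paper's
    learned goal-selection policy)."""
    detections = np.argwhere(sem_map[goal_cat] > 0)
    if len(detections) > 0:
        return tuple(detections[0])
    unexplored = np.argwhere(sem_map[-1] == 0)
    return tuple(unexplored[np.random.randint(len(unexplored))])

# One step of the episodic loop.
sem_map = np.zeros((NUM_CATEGORIES + 1, MAP_SIZE, MAP_SIZE))
sem_map = update_semantic_map(sem_map, [(120, 100, 3)])
goal = select_long_term_goal(sem_map, goal_cat=3)
```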


Citations
Proceedings Article

PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning

TL;DR: A network that predicts two complementary potential functions conditioned on a semantic map and uses them to decide where to look for an unseen object is proposed: a modular approach that disentangles the skills of 'where to look?' for an object and 'how to navigate to (x, y)?'.
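As a rough illustration of how two predicted potential functions could be combined into a single 'where to look' decision, the snippet below sums two stand-in maps and heads for the peak. The combination rule and the random stand-ins are assumptions; PONI predicts both potentials with a network conditioned on the semantic map.

```python
import numpy as np

# Random stand-ins for the two predicted potentials.
area_potential = np.random.rand(64, 64)    # gain from uncovering area
object_potential = np.random.rand(64, 64)  # closeness to the goal object

# Assumed combination rule: sum the potentials and take the argmax
# as the long-term navigation goal.
combined = area_potential + object_potential
goal_cell = np.unravel_index(np.argmax(combined), combined.shape)
```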
Journal Article

CLIP on Wheels: Zero-Shot Object Navigation as Object Localization and Exploration

TL;DR: This paper translates the success of zero-shot vision models to the popular embodied AI task of object navigation, and finds that a straightforward CoW, with CLIP-based object localization plus classical exploration, and no additional training, often outperforms learnable approaches in terms of success, efficiency, and robustness to dataset distribution shift.
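A hedged sketch of the CLIP-based localization half of a CoW-style agent: score the current frame against a text prompt for the goal category and report the object as visible above a threshold. The prompt template, the threshold, and the `object_visible` helper are illustrative assumptions; the paper explores several localization variants beyond plain image-text similarity.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def object_visible(image: Image.Image, category: str, thresh: float = 0.3) -> bool:
    """Return True if the frame matches the goal prompt above a threshold.
    The prompt template and threshold value are illustrative choices."""
    img = preprocess(image).unsqueeze(0).to(device)
    txt = clip.tokenize([f"a photo of a {category}"]).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(img)
        txt_feat = model.encode_text(txt)
    return torch.cosine_similarity(img_feat, txt_feat).item() > thresh
```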
Proceedings Article

Semantic Exploration from Language Abstractions and Pretrained Representations

TL;DR: This work evaluates vision-language representations, pretrained on natural image captioning datasets, and shows that these pretrained representations drive meaningful, task-relevant exploration and improve performance in 3D simulated environments.
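One generic way such pretrained representations can drive exploration is as an intrinsic novelty bonus in embedding space, sketched below. This k-nearest-neighbor scheme and the `novelty_bonus` name are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def novelty_bonus(embedding: torch.Tensor, memory: list, k: int = 5) -> float:
    """Intrinsic reward: mean distance to the k nearest embeddings seen
    so far, so states that look semantically new score higher."""
    if not memory:
        return 1.0  # everything is novel at the start of an episode
    dists = torch.cdist(embedding[None], torch.stack(memory))[0]
    return dists.topk(min(k, len(memory)), largest=False).values.mean().item()

# Usage: embed each observation with the pretrained encoder, then
# reward = novelty_bonus(feat, seen); seen.append(feat)
```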
Proceedings Article

Open-vocabulary Queryable Scene Representations for Real World Planning

TL;DR: NLMap is developed, an open-vocabulary and queryable scene representation that allows robots to operate without a fixed list of objects or executable options, enabling real robot operation unachievable by previous methods.
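A minimal sketch of what an open-vocabulary, queryable scene representation can look like: region embeddings stored with positions and matched against a text embedding at query time. The `QueryableSceneMap` class and its methods are hypothetical stand-ins, not NLMap's actual interface.

```python
import torch

class QueryableSceneMap:
    """Region embeddings keyed by position; queried with a text embedding."""

    def __init__(self):
        self.positions, self.features = [], []

    def add(self, xyz, feat):
        # Store a unit-normalized visual embedding for a scene region.
        self.positions.append(xyz)
        self.features.append(feat / feat.norm())

    def query(self, text_feat, top_k=1):
        # Rank stored regions by cosine similarity to the text query.
        sims = torch.stack(self.features) @ (text_feat / text_feat.norm())
        return [self.positions[int(i)] for i in sims.topk(top_k).indices]
```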
Proceedings Article

Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale

TL;DR: A large-scale study of imitating human demonstrations on tasks that require a virtual robot to search for objects in new environments (ObjectGoal Navigation and Pick&Place) finds that the IL-trained agent learns efficient object-search behavior from humans.
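The core training signal in imitation learning of this kind is a behavior-cloning step: predict the human's action from the observation and minimize cross-entropy. The sketch below is a generic version with stand-in features; the actual Habitat-Web models, observation space, and training recipe differ.

```python
import torch
import torch.nn as nn

# Generic behavior cloning on human demonstrations; sizes are stand-ins.
policy = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 6))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

obs = torch.randn(32, 512)             # stand-in observation features
actions = torch.randint(0, 6, (32,))   # human-chosen discrete actions

loss = loss_fn(policy(obs), actions)   # predict the human's action
optimizer.zero_grad()
loss.backward()
optimizer.step()
```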
References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won 1st place in the ILSVRC 2015 classification task.
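The central idea is easy to show in code: each block learns a residual F(x) and outputs F(x) + x through an identity shortcut. Below is a minimal basic block in PyTorch, omitting the projection shortcuts and downsampling used in the full architecture.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic block: the layers fit a residual F(x) and the identity
    shortcut makes the output F(x) + x."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # identity shortcut
```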
Book Chapter

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset that aims to advance the state of the art in object recognition by placing it in the context of the broader question of scene understanding, built by gathering images of complex everyday scenes containing common objects in their natural context.
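For readers who want to inspect the dataset, the official pycocotools API loads annotations by category; the annotation-file path below is an assumed local path.

```python
from pycocotools.coco import COCO

# Assumed local path to the downloaded annotation file.
coco = COCO("annotations/instances_val2017.json")

cat_ids = coco.getCatIds(catNms=["chair"])
img_ids = coco.getImgIds(catIds=cat_ids)
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_ids[0], catIds=cat_ids))
print(f"{len(anns)} chair annotations in image {img_ids[0]}")
```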
Proceedings Article

Feature Pyramid Networks for Object Detection

TL;DR: This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.
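The core mechanism, a top-down pathway merged with lateral 1x1 connections, fits in a few lines. The sketch below handles just two backbone stages and is an illustration of the idea, not the paper's full multi-level pyramid.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    """Two-stage feature pyramid: lateral 1x1 convs bring backbone maps
    to a common width, and the top-down path upsamples and merges them."""

    def __init__(self, c_low: int, c_high: int, d: int = 256):
        super().__init__()
        self.lat_low = nn.Conv2d(c_low, d, 1)
        self.lat_high = nn.Conv2d(c_high, d, 1)
        self.smooth = nn.Conv2d(d, d, 3, padding=1)  # post-merge smoothing

    def forward(self, feat_low, feat_high):
        p_high = self.lat_high(feat_high)
        up = F.interpolate(p_high, size=feat_low.shape[-2:], mode="nearest")
        p_low = self.smooth(self.lat_low(feat_low) + up)
        return p_low, p_high
```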
Proceedings Article

Mask R-CNN

TL;DR: This work presents a conceptually simple, flexible, and general framework for object instance segmentation, which extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.
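torchvision ships an off-the-shelf implementation, which makes the box-plus-mask output structure easy to see. The weights-loading argument varies across torchvision versions, so treat the call below as a sketch.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# weights="DEFAULT" assumes torchvision >= 0.13; older releases use
# pretrained=True instead.
model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

image = torch.rand(3, 480, 640)      # one RGB image scaled to [0, 1]
with torch.no_grad():
    out = model([image])[0]          # dict of boxes, labels, scores, masks
print(out["masks"].shape)            # (N, 1, H, W) per-instance masks
```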

Automatic differentiation in PyTorch

TL;DR: The automatic differentiation module of PyTorch is described: a library designed to enable rapid research on machine learning models, which differentiates purely imperative programs with a focus on extensibility and low overhead.
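Because differentiation is defined over ordinary imperative code, normal Python control flow just works; a tiny example:

```python
import torch

# Autograd differentiates ordinary imperative code: build y with normal
# Python control flow, then ask for dy/dx.
x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum() if x.sum() > 0 else -x.sum()
y.backward()
print(x.grad)  # tensor([4., 6.]), i.e. 2 * x
```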