Building Generalizable Agents with a Realistic and Rich 3D Environment

Open Access, Posted Content

Yi Wu, +3 more

- 07 Jan 2018 -

arXiv: Learning

TLDR

This article proposes House3D, a rich, extensible and efficient environment that contains 45,622 human-designed 3D scenes of visually realistic houses, ranging from single-room studios to multi-storied houses, equipped with a diverse set of fully labeled objects, textures and scene layouts.

Abstract:

Teaching an agent to navigate in an unseen 3D environment is a challenging task, even in simulated environments. To generalize to unseen environments, an agent needs to be robust to low-level variations (e.g. color, texture, object changes) as well as high-level variations (e.g. layout changes of the environment). To improve overall generalization, all types of variation in the environment have to be taken into consideration via different levels of data augmentation. To this end, we propose House3D, a rich, extensible and efficient environment that contains 45,622 human-designed 3D scenes of visually realistic houses, ranging from single-room studios to multi-storied houses, equipped with a diverse set of fully labeled 3D objects, textures and scene layouts, based on the SUNCG dataset (Song et al.). The diversity in House3D opens the door to scene-level augmentation, while the label-rich nature of House3D enables us to inject pixel-level and task-level augmentations such as domain randomization (Tobin et al.) and multi-task training. Using a subset of houses in House3D, we show that reinforcement learning agents trained with a combination of different levels of augmentation perform much better in unseen environments than our baselines with raw RGB input, by over 8% in terms of navigation success rate. House3D is publicly available at this http URL.
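The pixel-level augmentation mentioned in the abstract relies on per-object labels. As a rough illustration only, and not the authors' implementation, the sketch below recolors every labeled object in a frame with one random tint per label, which is one simple form of domain randomization. The function name `randomize_colors`, the nested-list frame format, and the tint range of 0.5 to 1.5 are all assumptions made for this example.

```python
import random


def randomize_colors(rgb, labels, rng):
    """Recolor each labeled object with one random tint per label.

    rgb:    H x W nested list of (r, g, b) tuples, values in 0..255
    labels: H x W nested list of integer object labels (hypothetical format)
    rng:    a random.Random instance, passed in for reproducibility
    """
    tints = {}
    out = []
    for rgb_row, label_row in zip(rgb, labels):
        new_row = []
        for (r, g, b), label in zip(rgb_row, label_row):
            if label not in tints:
                # One multiplicative tint per object label, shared by all
                # pixels of that object (the 0.5..1.5 range is an assumption).
                tints[label] = tuple(rng.uniform(0.5, 1.5) for _ in range(3))
            tr, tg, tb = tints[label]
            new_row.append((
                min(255, int(r * tr)),
                min(255, int(g * tg)),
                min(255, int(b * tb)),
            ))
        out.append(new_row)
    return out


# Tiny 2 x 2 frame: label 0 is the background, label 1 is a single object.
frame = [[(100, 100, 100), (100, 100, 100)],
         [(200, 50, 50), (100, 100, 100)]]
mask = [[0, 0],
        [1, 0]]
augmented = randomize_colors(frame, mask, random.Random(0))
```

Because the tint is keyed on the label rather than on the pixel, an object keeps a consistent appearance within a frame while its color changes from one randomized frame to the next.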

Citations

Proceedings Article

Habitat: A Platform for Embodied AI Research

Manolis Savva, +11 more

TL;DR: The comparison between learning and SLAM approaches from two recent works is revisited, and evidence is found that learning outperforms SLAM when scaled to an order of magnitude more experience than in previous investigations; the first cross-dataset generalization experiments are also conducted.

Proceedings Article

Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments

Peter Anderson, +8 more

TL;DR: The Room-to-Room (R2R) dataset introduced in this paper supports visually-grounded natural language navigation in real buildings, and is built on a large-scale reinforcement learning environment based on real imagery.

Posted Content

AI2-THOR: An Interactive 3D Environment for Visual AI

Eric Kolve, +5 more

- 14 Dec 2017 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: AI2-THOR consists of near photo-realistic 3D indoor scenes in which AI agents can navigate and interact with objects to perform tasks; its goal is to facilitate building visually intelligent models.

Proceedings Article

Gibson Env: Real-World Perception for Embodied Agents

Fei Xia, +5 more

TL;DR: Gibson is a real-world perception environment in which active agents learn visual perception tasks in real time. It is based on virtualizing real spaces rather than artificially designed ones, and currently includes over 1400 floor spaces from 572 full buildings.

Posted Content

On Evaluation of Embodied Navigation Agents

Peter Anderson, +10 more

- 18 Jul 2018 -

arXiv: Artificial Intelligence

TL;DR: The present document summarizes the consensus recommendations of a working group studying empirical methodology in navigation research. It discusses different problem statements and the role of generalization, presents evaluation measures, and provides standard scenarios that can be used for benchmarking.

Related Papers (5)

Deep Residual Learning for Image Recognition

Target-driven visual navigation in indoor scenes using deep reinforcement learning

Semantic Scene Completion from a Single Depth Image

Asynchronous methods for deep reinforcement learning

Matterport3D: Learning from RGB-D Data in Indoor Environments