scispace - formally typeset
Open AccessDissertationDOI

Continually improving grounded natural language understanding through human-robot dialog

Reads0
Chats0
TLDR
This work presents an end-to-end pipeline for translating natural language commands to discrete robot actions, and uses clarification dialogs to jointly improve language parsing and concept grounding.
Abstract
Natural language understanding for robotics can require substantial domainand platform-specific engineering. For example, for mobile robots to pick-and-place objects in an environment to satisfy human commands, we can specify the language humans use to issue such commands, and connect concept words like red can to physical object properties. One way to alleviate this engineering for a new domain is to enable robots in human environments to adapt dynamically— continually learning new language constructions and perceptual concepts. In this work, we present an end-to-end pipeline for translating natural language commands to discrete robot actions, and use clarification dialogs to jointly improve language parsing and concept grounding. We train and evaluate this agent in a virtual setting on Amazon Mechanical Turk, and we transfer the learned agent to a physical robot platform to demonstrate it in the real world.

read more

Citations
More filters

Vision-and-Dialog Navigation

TL;DR: In this paper, the authors introduce Cooperative Vision-and-Dialog Navigation (CVDN), a dataset of over 2k embodied, human-human dialogs situated in simulated, photorealistic home environments.
Journal ArticleDOI

Just Ask:An Interactive Learning Framework for Vision and Language Navigation

TL;DR: This work proposes an interactive learning framework to endow the agent with the ability to ask for users' help in ambiguous situations and designs a continual learning strategy, which can be viewed as a data augmentation method, for the agent to improve further utilizing its interaction history with a human.
Journal ArticleDOI

Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog

TL;DR: Methods for using human-robot dialog to improve language understanding for a mobile robot agent that parses natural language to underlying semantic meanings and uses robotic sensors to create multi-modal models of perceptual concepts like red and heavy are presented.
Proceedings ArticleDOI

Enabling Robots to Understand Incomplete Natural Language Instructions Using Commonsense Reasoning

TL;DR: Language-Model-based Commonsense Reasoning (LMCR), a new method which enables a robot to listen to a natural language instruction from a human, observe the environment around it, and automatically fill in information missing from the instruction using environmental context and a new commonsense reasoning approach.
Posted Content

Spatio-Temporal Scene Graphs for Video Dialog.

TL;DR: A novel spatio-temporal scene graph representation (STSGR) modeling fine-grained information flows within videos and produces the correct answer to a question about a given video recursively using a novel semantics-controlled multi-head shuffled transformer.
References
More filters
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Proceedings ArticleDOI

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
Journal ArticleDOI

WordNet : an electronic lexical database

Christiane Fellbaum
- 01 Sep 2000 - 
TL;DR: The lexical database: nouns in WordNet, Katherine J. Miller a semantic network of English verbs, and applications of WordNet: building semantic concordances are presented.
Related Papers (5)