Continually improving grounded natural language understanding through human-robot dialog

doi:10.15781/T2902011J

Open Access Dissertation

Jesse Thomason

TLDR

This work presents an end-to-end pipeline for translating natural language commands to discrete robot actions, and uses clarification dialogs to jointly improve language parsing and concept grounding.

Abstract:

Natural language understanding for robotics can require substantial domain- and platform-specific engineering. For example, for mobile robots to pick-and-place objects in an environment to satisfy human commands, we can specify the language humans use to issue such commands, and connect concept words like "red can" to physical object properties. One way to alleviate this engineering for a new domain is to enable robots in human environments to adapt dynamically, continually learning new language constructions and perceptual concepts. In this work, we present an end-to-end pipeline for translating natural language commands to discrete robot actions, and use clarification dialogs to jointly improve language parsing and concept grounding. We train and evaluate this agent in a virtual setting on Amazon Mechanical Turk, and we transfer the learned agent to a physical robot platform to demonstrate it in the real world.

Citations

Vision-and-Dialog Navigation

Jesse Thomason, +3 more

TL;DR: In this paper, the authors introduce Cooperative Vision-and-Dialog Navigation (CVDN), a dataset of over 2k embodied, human-human dialogs situated in simulated, photorealistic home environments.

Journal Article

Just Ask: An Interactive Learning Framework for Vision and Language Navigation

Ta-Chung Chi, +4 more

TL;DR: This work proposes an interactive learning framework to endow the agent with the ability to ask for users' help in ambiguous situations and designs a continual learning strategy, which can be viewed as a data augmentation method, for the agent to improve further utilizing its interaction history with a human.

Journal Article

Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog

Jesse Thomason, +8 more

- 26 Feb 2020 -

Journal of Artificial Intelligence Resea...

TL;DR: Methods for using human-robot dialog to improve language understanding for a mobile robot agent that parses natural language to underlying semantic meanings and uses robotic sensors to create multi-modal models of perceptual concepts like red and heavy are presented.

Proceedings Article

Enabling Robots to Understand Incomplete Natural Language Instructions Using Commonsense Reasoning

Haonan Chen, +4 more

TL;DR: This work introduces Language-Model-based Commonsense Reasoning (LMCR), a new method that enables a robot to listen to a natural language instruction from a human, observe the environment around it, and automatically fill in information missing from the instruction using environmental context and a new commonsense reasoning approach.

Posted Content

Spatio-Temporal Scene Graphs for Video Dialog.

Shijie Geng, +4 more

- 08 Jul 2020 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work proposes a novel spatio-temporal scene graph representation (STSGR) that models fine-grained information flows within videos, and generates the answer to a question about a given video recursively using a novel semantics-controlled multi-head shuffled transformer.

References

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: The authors train a large, deep convolutional neural network, consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, that achieved state-of-the-art ImageNet classification results.

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

Proceedings Article

ImageNet: A large-scale hierarchical image database

Jia Deng, +5 more

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, +4 more

TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.

Journal Article

WordNet: An Electronic Lexical Database

Christiane Fellbaum

- 01 Sep 2000 -

Language

TL;DR: This volume presents the WordNet lexical database, including nouns in WordNet, a semantic network of English verbs, and applications of WordNet such as building semantic concordances.

arXiv: Computation and Language

Situated Open World Reference Resolution for Human-Robot Dialogue

Tom Williams, +3 more

Conversational Interfaces: Past and Present

Michael F. McTear, +2 more

Related Papers (5)

Natural Interaction with Robots, Knowbots and Smartphones: Putting Spoken Dialog Systems into Practice

Integrating Pointing Gestures into a Spanish-spoken Dialog System for Conversational Service Robots.

End-to-end Conversation Modeling Track in DSTC6

Situated Open World Reference Resolution for Human-Robot Dialogue

Conversational Interfaces: Past and Present