scispace - formally typeset
Open AccessProceedings Article

Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data.

TLDR
The authors developed a model that adapts to new tasks without language level supervision by factorizing intention and language, which minimizes linguistic drift after fine-tuning for new tasks and maintains language quality.
Abstract
Can we develop visually grounded dialog agents that can efficiently adapt to new tasks without forgetting how to talk to people? Such agents could leverage a larger variety of existing data to generalize to new tasks, minimizing expensive data collection and annotation. In this work, we study a setting we call "Dialog without Dialog", which requires agents to develop visually grounded dialog models that can adapt to new tasks without language level supervision. By factorizing intention and language, our model minimizes linguistic drift after fine-tuning for new tasks. We present qualitative results, automated metrics, and human studies that all show our model can adapt to new tasks and maintain language quality. Baselines either fail to perform well at new tasks or experience language drift, becoming unintelligible to humans. Code has been made available at this https URL

read more

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI

Convention: a philosophical study

Robert Kirk
- 01 May 1970 - 
Proceedings Article

On Emergent Communication in Competitive Multi-Agent Teams

TL;DR: In this paper, the authors investigate whether competition for performance from an external, similar agent team could act as a social influence that encourages multi-agent populations to develop better communication protocols for improved performance, compositionality, and convergence speed.
References
More filters
Book ChapterDOI

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.
Journal ArticleDOI

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals and further merge RPN and Fast R-CNN into a single network by sharing their convolutionAL features.
Posted Content

Auto-Encoding Variational Bayes

TL;DR: In this paper, a stochastic variational inference and learning algorithm was proposed for directed probabilistic models with intractable posterior distributions and large datasets, which scales to large datasets.

The Caltech-UCSD Birds-200-2011 Dataset

TL;DR: CUB-200-2011 as mentioned in this paper is an extended version of CUB200, which roughly doubles the number of images per category and adds new part localization annotations, annotated with bounding boxes, part locations, and at-ribute labels.
Proceedings ArticleDOI

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

TL;DR: In this paper, a bottom-up and top-down attention mechanism was proposed to enable attention to be calculated at the level of objects and other salient image regions, which achieved state-of-the-art results on the MSCOCO test server.
Related Papers (5)