Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data.

Proceedings Article (Open Access)

- Vol. 33, pp 19988-19999

TLDR

The authors developed a model that adapts to new tasks without language-level supervision by factorizing intention and language; this minimizes linguistic drift after fine-tuning for new tasks and preserves language quality.

Abstract:

Can we develop visually grounded dialog agents that can efficiently adapt to new tasks without forgetting how to talk to people? Such agents could leverage a larger variety of existing data to generalize to new tasks, minimizing expensive data collection and annotation. In this work, we study a setting we call "Dialog without Dialog", which requires agents to develop visually grounded dialog models that can adapt to new tasks without language-level supervision. By factorizing intention and language, our model minimizes linguistic drift after fine-tuning for new tasks. We present qualitative results, automated metrics, and human studies that all show our model can adapt to new tasks and maintain language quality. Baselines either fail to perform well at new tasks or experience language drift, becoming unintelligible to humans. Code has been made available at this https URL

Citations

Journal Article

Convention: a philosophical study

Robert Kirk

- 01 May 1970 -

Philosophical Books

Proceedings Article

On Emergent Communication in Competitive Multi-Agent Teams

Paul Pu Liang, +4 more

TL;DR: In this paper, the authors investigate whether competition for performance from an external, similar agent team could act as a social influence that encourages multi-agent populations to develop better communication protocols for improved performance, compositionality, and convergence speed.

References

Book Chapter

Microsoft COCO: Common Objects in Context

Tsung-Yi Lin, +7 more

TL;DR: A new dataset aimed at advancing the state of the art in object recognition by placing it in the broader context of scene understanding, built by gathering images of complex everyday scenes containing common objects in their natural context.

Journal Article

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, +3 more

- 01 Jun 2017 -

IEEE Transactions on Pattern Analysis an...

TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals, and further merges the RPN and Fast R-CNN into a single network by sharing their convolutional features.

Posted Content

Auto-Encoding Variational Bayes

Diederik P. Kingma, +1 more

- 20 Dec 2013 -

arXiv: Machine Learning

TL;DR: In this paper, a stochastic variational inference and learning algorithm is proposed for directed probabilistic models with intractable posterior distributions; the algorithm scales to large datasets.

The Caltech-UCSD Birds-200-2011 Dataset

Catherine Wah, +4 more

TL;DR: CUB-200-2011 is an extended version of CUB-200 that roughly doubles the number of images per category and adds new part localization annotations; images are annotated with bounding boxes, part locations, and attribute labels.

Proceedings Article

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Peter Anderson, +6 more

TL;DR: In this paper, a bottom-up and top-down attention mechanism was proposed to enable attention to be calculated at the level of objects and other salient image regions, which achieved state-of-the-art results on the MSCOCO test server.

...read moreread less