Dhruv Batra

Researcher at Georgia Institute of Technology

Publications -  272
Citations -  43803

Dhruv Batra is an academic researcher from Georgia Institute of Technology. The author has contributed to research in topics: Question answering & Dialog box. The author has an h-index of 69 and has co-authored 272 publications receiving 29938 citations. Previous affiliations of Dhruv Batra include Facebook & Toyota Technological Institute at Chicago.

Papers
Posted Content

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

TL;DR: VLN-BERT, a visiolinguistic transformer-based model for scoring the compatibility between an instruction ('...stop at the brown sofa') and a sequence of panoramic RGB images captured by the agent, is developed.
Posted Content

Talk the Walk: Navigating New York City through Grounded Dialogue

TL;DR: This work focuses on the task of tourist localization and develops the novel Masked Attention for Spatial Convolutions (MASC) mechanism that allows for grounding tourist utterances into the guide's map, and shows it yields significant improvements for both emergent and natural language communication.
Proceedings ArticleDOI

End-to-end Audio Visual Scene-aware Dialog Using Multimodal Attention-based Video Features

TL;DR: This paper introduces a new data set of dialogs about videos of human behaviors, as well as an end-to-end Audio Visual Scene-Aware Dialog (AVSD) model, trained using this new data set, that generates responses in a dialog about a video.
Proceedings ArticleDOI

Audio Visual Scene-Aware Dialog

TL;DR: The authors introduce the Audio Visual Scene-Aware Dialog (AVSD) dataset, in which each video is paired with a dialog about that video plus a final summary of the video written by one of the dialog participants.
Posted Content

Rearrangement: A Challenge for Embodied AI

TL;DR: A framework for research and evaluation in Embodied AI is described, based on a canonical task: Rearrangement, that can focus the development of new techniques and serve as a source of trained models that can be transferred to other settings.