
Dhruv Batra

Researcher at Georgia Institute of Technology

Publications: 272
Citations: 43,803

Dhruv Batra is an academic researcher at the Georgia Institute of Technology. He has contributed to research topics including question answering and visual dialog. He has an h-index of 69 and has co-authored 272 publications receiving 29,938 citations. His previous affiliations include Facebook and the Toyota Technological Institute at Chicago.

Papers
Proceedings Article

Embodied Question Answering in Photorealistic Environments With Point Cloud Perception

TL;DR: A large-scale navigation task for embodied question answering in photorealistic environments (Matterport 3D) is presented, in which agents perceive their surroundings through 3D point clouds, RGB images, or a combination of the two.
Proceedings Article

A Systematic Exploration of Diversity in Machine Translation

TL;DR: It is found that encouraging diversity in the produced translations can improve performance on downstream tasks, especially for sentences that are difficult for MT.
Proceedings Article

Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model

TL;DR: An end-to-end trainable generative visual dialog model is proposed, in which the generator G receives gradients from a discriminative dialog model D as a perceptual (not adversarial) loss on sequences sampled from G. A stronger encoder for visual dialog is also introduced, and a self-attention mechanism for answer encoding, together with a metric-learning loss, helps D better capture semantic similarities among answer responses.
Posted Content

Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments

TL;DR: A language-guided navigation task set in a continuous 3D environment is developed, in which agents must execute low-level actions to follow natural-language navigation directions; the results suggest that performance in prior 'navigation-graph' settings may be inflated by the strong implicit assumptions those settings make.
Proceedings Article

Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?

TL;DR: The VQA-HAT (Human ATtention) dataset is introduced to evaluate attention maps generated by state-of-the-art VQA models against human attention.
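As a rough illustration of the kind of comparison described in this TL;DR, the sketch below rank-correlates a model-generated spatial attention map with a human attention map after resizing them to a common resolution. This is a minimal hypothetical example, not the authors' released evaluation code; the map shapes, the function name `rank_correlation`, and the use of SciPy's `zoom` and `spearmanr` are assumptions made here for illustration.

```python
# Hypothetical sketch: comparing a model attention map to a human attention map
# via Spearman rank correlation. Shapes and resizing choices are illustrative.
import numpy as np
from scipy.ndimage import zoom
from scipy.stats import spearmanr


def rank_correlation(model_att: np.ndarray, human_att: np.ndarray) -> float:
    """Spearman rank correlation between two 2-D spatial attention maps."""
    # Resize the model map to the human map's resolution (assumption: linear
    # interpolation via scipy.ndimage.zoom is an acceptable resampling step).
    scale = (human_att.shape[0] / model_att.shape[0],
             human_att.shape[1] / model_att.shape[1])
    model_resized = zoom(model_att, scale, order=1)
    # Rank-correlate the flattened maps.
    rho, _ = spearmanr(model_resized.ravel(), human_att.ravel())
    return float(rho)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model_map = rng.random((14, 14))    # e.g. a 14x14 CNN attention grid (illustrative)
    human_map = rng.random((448, 448))  # e.g. a full-resolution human attention mask (illustrative)
    print(f"rank correlation: {rank_correlation(model_map, human_map):+.3f}")
```

A correlation near +1 would indicate that the model and humans attend to similar regions, while a value near 0 would indicate little agreement.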