
Dhruv Batra

Researcher at Georgia Institute of Technology

Publications: 272
Citations: 43,803

Dhruv Batra is an academic researcher from the Georgia Institute of Technology. The author has contributed to research in the topics of Question answering & Dialog box. He has an h-index of 69 and has co-authored 272 publications receiving 29,938 citations. Previous affiliations of Dhruv Batra include Facebook and the Toyota Technological Institute at Chicago.

Papers
Proceedings ArticleDOI

Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization

TL;DR: This work combines existing fine-grained visualizations to create a high-resolution class-discriminative visualization, Guided Grad-CAM, and applies it to image classification, image captioning, and visual question answering (VQA) models, including ResNet-based architectures.
Proceedings ArticleDOI

VQA: Visual Question Answering

TL;DR: The task of free-form and open-ended Visual Question Answering (VQA) is proposed: given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
Proceedings ArticleDOI

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

TL;DR: The authors balance the VQA dataset by collecting complementary images such that every question in the balanced dataset is associated with not just a single image, but rather a pair of similar images that result in two different answers to the same question.
Journal ArticleDOI

Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization

TL;DR: Grad-CAM uses the gradients of any target concept (e.g., a dog in a classification network or a sequence of words in a captioning network) flowing into the final convolutional layer to produce a coarse localization map that highlights the important regions in the image for predicting the concept.
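The core computation the TL;DR describes can be sketched in a few lines: the per-channel gradients are global-average-pooled into importance weights, the activation maps are combined with those weights, and a ReLU keeps only regions with positive influence on the target concept. This is a minimal NumPy sketch on synthetic arrays; in a real setting, `activations` and `gradients` would come from a trained network's final convolutional layer via backpropagation, and the input names here are illustrative, not from the paper's code.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Toy Grad-CAM on arrays of shape (K, H, W) for one image.

    activations: feature maps from the final conv layer (hypothetical input).
    gradients: gradients of the target-concept score w.r.t. those maps.
    """
    # Neuron-importance weights: global-average-pool each channel's gradients.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum of activation maps across channels.
    cam = (weights[:, None, None] * activations).sum(axis=0)
    # ReLU: keep only regions with positive influence on the concept.
    return np.maximum(cam, 0.0)

# Tiny synthetic example: 2 channels with 2x2 feature maps.
A = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 2.0], [2.0, 0.0]]])
G = np.array([[[1.0, 1.0], [1.0, 1.0]],      # mean grad = 1  -> weight +1
              [[-1.0, -1.0], [-1.0, -1.0]]]) # mean grad = -1 -> weight -1
print(grad_cam(A, G))  # -> [[1. 0.] [0. 1.]]
```

The coarse map is typically upsampled to the input-image resolution; Guided Grad-CAM (the variant in the proceedings entry above) then multiplies it pointwise with a guided-backpropagation visualization to recover fine detail.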