Dhruv Batra
Researcher at Georgia Institute of Technology
Publications - 272
Citations - 43803
Dhruv Batra is an academic researcher at Georgia Institute of Technology. The author has contributed to research on topics including Question answering and Dialog box. The author has an h-index of 69 and has co-authored 272 publications receiving 29938 citations. Previous affiliations of Dhruv Batra include Facebook and Toyota Technological Institute at Chicago.
Papers
Proceedings ArticleDOI
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra +5 more
TL;DR: This work combines existing fine-grained visualizations to create a high-resolution class-discriminative visualization, Guided Grad-CAM, and applies it to image classification, image captioning, and visual question answering (VQA) models, including ResNet-based architectures.
Proceedings ArticleDOI
VQA: Visual Question Answering
Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh +6 more
TL;DR: This work proposes the task of free-form and open-ended Visual Question Answering (VQA): given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
Posted Content
VQA: Visual Question Answering
Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh +6 more
TL;DR: This work proposes the task of free-form and open-ended Visual Question Answering (VQA): given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
Proceedings ArticleDOI
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
TL;DR: The authors balance the VQA dataset by collecting complementary images such that every question in the balanced dataset is associated with not just a single image, but rather a pair of similar images that result in two different answers to the same question.
Journal ArticleDOI
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra +7 more
TL;DR: Grad-CAM uses the gradients of any target concept (e.g., a dog in a classification network or a sequence of words in a captioning network) flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept.
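The localization map described in this summary can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the final convolutional layer's feature maps and the gradients of the target score with respect to them have already been extracted from a network, and the function name `grad_cam_map` and its argument names are hypothetical. The final rescaling to [0, 1] is a display convenience added here, not part of the summary above.

```python
import numpy as np


def grad_cam_map(activations, gradients):
    """Coarse localization map from final-conv-layer activations and gradients.

    Both inputs are assumed to have shape (K, H, W): K feature maps of
    spatial size H x W, with gradients taken with respect to the score of
    the target concept. Extracting them from a real network is out of scope.
    """
    activations = np.asarray(activations, dtype=float)
    gradients = np.asarray(gradients, dtype=float)
    if activations.ndim != 3 or activations.shape != gradients.shape:
        raise ValueError("expected two arrays of identical shape (K, H, W)")

    # One importance weight per feature map: the spatially averaged gradient.
    weights = gradients.mean(axis=(1, 2))               # shape (K,)

    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)    # shape (H, W)

    # Keep only regions with a positive influence on the target score.
    cam = np.maximum(cam, 0.0)

    # Rescale to [0, 1] for visualization (an assumption of this sketch).
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return cam


# Toy input: channel 0 supports the target, channel 1 opposes it.
toy_activations = np.array([[[1.0, 0.0], [0.0, 0.0]],
                            [[0.0, 0.0], [0.0, 1.0]]])
toy_gradients = np.array([np.ones((2, 2)), -np.ones((2, 2))])
toy_cam = grad_cam_map(toy_activations, toy_gradients)
```

In the toy input, only the location activated by the positively weighted channel survives the rectification step. In practice the resulting map is upsampled to the input image size before being overlaid as a heatmap.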