Dhruv Batra

Researcher at Georgia Institute of Technology

Publications - 272

Citations - 43803

Dhruv Batra is an academic researcher from Georgia Institute of Technology. The author has contributed to research in topics: Question answering & Dialog box. The author has an h-index of 69 and has co-authored 272 publications receiving 29938 citations. Previous affiliations of Dhruv Batra include Facebook & Toyota Technological Institute at Chicago.

Papers

Posted Content

Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning

Jyoti Aneja, +3 more

- 22 Aug 2019 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work proposes Seq-CVAE, which learns a latent space for every word and encourages this temporal latent space to capture the 'intention' about how to complete the sentence by mimicking a representation that summarizes the future.

Proceedings Article

Resolving Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution in Captioned Scenes

Gordon Christie, +6 more

TL;DR: This paper presents an approach that simultaneously performs semantic segmentation and prepositional phrase attachment resolution for captioned images. It significantly outperforms the Stanford Parser (De Marneffe et al., 2006) by 17.91% (28.69% relative) and 12.83% (25.28% relative) in two different experiments.

Posted Content

CloudCV: Large Scale Distributed Computer Vision as a Cloud Service

Harsh Agrawal, +7 more

- 12 Jun 2015 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: CloudCV is a comprehensive system that provides access to state-of-the-art distributed computer vision algorithms as a cloud service through a web interface and APIs. It aims to spare researchers from repeatedly solving the same algorithmic, logistical, and infrastructural problems.

Proceedings Article

Embodied Multimodal Multitask Learning

Devendra Singh Chaplot, +4 more

TL;DR: This paper proposes a multitask model that facilitates knowledge transfer across tasks by disentangling the knowledge of words and visual attributes in the intermediate representations. It shows that this disentanglement makes the model modular and interpretable, which allows transfer to instructions containing new concepts.

Posted Content

Multi-Target Embodied Question Answering

Licheng Yu, +5 more

- 09 Apr 2019 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This article proposes a modular architecture composed of a program generator, a controller, a navigator, and a VQA module to answer questions that involve multiple targets, such as "Is the dresser in the bedroom bigger than the oven in the kitchen?".
