Dhruv Batra

Researcher at Georgia Institute of Technology

Publications -  272
Citations -  43803

Dhruv Batra is an academic researcher from Georgia Institute of Technology. The author has contributed to research in topics: Question answering & Dialog box. The author has an h-index of 69 and has co-authored 272 publications receiving 29938 citations. Previous affiliations of Dhruv Batra include Facebook & Toyota Technological Institute at Chicago.

Papers
Posted Content

Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning

TL;DR: This work proposes Seq-CVAE, which learns a latent space for every word and encourages this temporal latent space to capture the 'intention' about how to complete the sentence by mimicking a representation that summarizes the future.
Proceedings Article

Resolving Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution in Captioned Scenes

TL;DR: This paper presents an approach that simultaneously performs semantic segmentation and prepositional phrase attachment resolution for captioned images, and that significantly outperforms the Stanford Parser (De Marneffe et al., 2006) by 17.91% (28.69% relative) and 12.83% (25.28% relative) in two different experiments.
Posted Content

CloudCV: Large Scale Distributed Computer Vision as a Cloud Service

TL;DR: CloudCV is a comprehensive system that provides access to state-of-the-art distributed computer vision algorithms as a cloud service through a Web Interface and APIs, so that researchers scaling to large datasets do not have to repeatedly solve the same algorithmic, logistical, and infrastructural problems.
Proceedings Article

Embodied Multimodal Multitask Learning

TL;DR: This paper proposes a multitask model that facilitates knowledge transfer across tasks by disentangling the knowledge of words and visual attributes in the intermediate representations, and shows that this disentanglement makes the model modular and interpretable, which allows for transfer to instructions containing new concepts.
Posted Content

Multi-Target Embodied Question Answering

TL;DR: This work proposes a modular architecture composed of a program generator, a controller, a navigator, and a VQA module to answer questions that involve multiple targets, such as "Is the dresser in the bedroom bigger than the oven in the kitchen?".