Dhruv Batra
Researcher at Georgia Institute of Technology
Publications - 272
Citations - 43803
Dhruv Batra is an academic researcher from Georgia Institute of Technology. The author has contributed to research in topics: Question answering & Dialog box. The author has an h-index of 69 and has co-authored 272 publications receiving 29938 citations. Previous affiliations of Dhruv Batra include Facebook & Toyota Technological Institute at Chicago.
Papers
Posted Content
Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning
TL;DR: This work proposes Seq-CVAE, which learns a latent space for every word and encourages this temporal latent space to capture the 'intention' about how to complete the sentence by mimicking a representation that summarizes the future.
Proceedings ArticleDOI
Resolving Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution in Captioned Scenes
Gordon Christie,Ankit Laddha,Aishwarya Agrawal,Stanislaw Antol,Yash Goyal,Kevin Kochersberger,Dhruv Batra +6 more
TL;DR: This paper presents an approach to simultaneously perform semantic segmentation and prepositional phrase attachment resolution for captioned images; the approach significantly outperforms the Stanford Parser (De Marneffe et al., 2006) by 17.91% (28.69% relative) and 12.83% (25.28% relative) in two different experiments.
Posted Content
CloudCV: Large Scale Distributed Computer Vision as a Cloud Service
Harsh Agrawal,Clint Solomon Mathialagan,Yash Goyal,Neelima Chavali,Prakriti Banik,Akrit Mohapatra,Ahmed A. A. Osman,Dhruv Batra +7 more
TL;DR: CloudCV is a comprehensive system that provides access to state-of-the-art distributed computer vision algorithms as a cloud service through a Web Interface and APIs, so that researchers scaling to large datasets need not repeatedly solve the same algorithmic, logistical, and infrastructural problems.
Proceedings ArticleDOI
Embodied Multimodal Multitask Learning
TL;DR: This paper proposes a multitask model that facilitates knowledge transfer across tasks by disentangling the knowledge of words and visual attributes in the intermediate representations, and shows that this disentanglement makes the model modular and interpretable, which allows for transfer to instructions containing new concepts.
Posted Content
Multi-Target Embodied Question Answering
TL;DR: In this article, a modular architecture composed of a program generator, a controller, a navigator, and a VQA module is proposed to answer questions with multiple targets in them, such as "Is the dresser in the bedroom bigger than the oven in the kitchen?".