Dhruv Batra
Researcher at Georgia Institute of Technology
Publications - 272
Citations - 43803
Dhruv Batra is an academic researcher from Georgia Institute of Technology. The author has contributed to research in topics: Question answering & Dialog box. The author has an h-index of 69 and has co-authored 272 publications receiving 29938 citations. Previous affiliations of Dhruv Batra include Facebook & Toyota Technological Institute at Chicago.
Papers
Proceedings Article
Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning
TL;DR: This paper proposes Seq-CVAE, which learns a latent space for every word to capture the "intention" about how to complete the sentence by mimicking a representation that summarizes the future.
Proceedings Article
Inference for order reduction in Markov random fields
TL;DR: A new algorithm called Order Reduction Inference (ORI) is introduced that searches over a space of order reduction methods to minimize the difficulty of the resultant pairwise inference problem.
Book Chapter
Spatially Aware Multimodal Transformers for TextVQA
Yash Kant,Dhruv Batra,Peter Anderson,Alexander G. Schwing,Devi Parikh,Jiasen Lu,Harsh Agrawal +8 more
TL;DR: The authors propose a spatially aware self-attention layer in which each visual entity attends only to neighboring entities defined by a spatial graph, and each head in the multi-head self-attention layer focuses on a different subset of relations.
Posted Content
Sim-to-Real Transfer for Vision-and-Language Navigation
Peter Anderson,Ayush Shrivastava,Joanne Truong,Arjun Majumdar,Devi Parikh,Dhruv Batra,Stefan Lee +6 more
TL;DR: To bridge the gap between the high-level discrete action space learned by the VLN agent and the robot's low-level continuous action space, a subgoal model is proposed to identify nearby waypoints, and domain randomization is used to mitigate visual domain differences.
Proceedings Article
Embodied Amodal Recognition: Learning to Move to Perceive Objects
TL;DR: Experimental results show that agents with embodiment (movement) achieve better visual recognition performance than passive ones, and that, to improve their visual recognition abilities, agents can learn strategic paths that differ from shortest paths.