
Dhruv Batra

Researcher at Georgia Institute of Technology

Publications - 272
Citations - 43803

Dhruv Batra is an academic researcher at the Georgia Institute of Technology. His research covers topics including Question answering and Dialog box. He has an h-index of 69 and has co-authored 272 publications, which have received 29938 citations. His previous affiliations include Facebook and the Toyota Technological Institute at Chicago.

Papers
Posted Content

Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions

TL;DR: These approaches, based on LSTM-RNNs, VQA model uncertainty, and caption-question similarity, are able to outperform strong baselines on both relevance tasks and are shown to be more intelligent, reasonable, and human-like than previous approaches.
Posted Content

EvalAI: Towards Better Evaluation Systems for AI Agents

TL;DR: EvalAI is built to give the research community a scalable way to meet the critical need of evaluating machine learning models and agents acting in an environment, either against annotations or with a human in the loop.
Posted Content

SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation

TL;DR: This work proposes SplitNet, a method for decoupling visual perception and policy learning. It incorporates auxiliary tasks and selectively learns portions of the model, explicitly decomposing the learning objectives for visual navigation into perceiving the world and acting on that perception.
Proceedings Article

Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning

TL;DR: This work develops the first approximate inference algorithm for 1-Best (and M-Best) decoding in bidirectional neural sequence models by extending Beam Search to reason about both forward and backward time dependencies. It also introduces a novel Fill-in-the-Blank Image Captioning task, which requires reasoning about both past and future sentence structure to reconstruct sensible image descriptions.
Book Chapter

Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments

TL;DR: This paper develops a language-guided navigation task set in a continuous 3D environment, where agents must execute low-level actions to follow natural language navigation directions. This lifts a number of assumptions implicit in prior work, which represents environments as a sparse graph of panoramas with edges corresponding to navigability.