Dhruv Batra

Researcher at Georgia Institute of Technology

Publications - 272

Citations - 43803

Dhruv Batra is an academic researcher from Georgia Institute of Technology. The author has contributed to research in topics: Question answering & Dialog box. The author has an h-index of 69 and has co-authored 272 publications receiving 29938 citations. Previous affiliations of Dhruv Batra include Facebook & Toyota Technological Institute at Chicago.

Papers

Posted Content

Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions

Arijit Ray, +4 more

- 21 Jun 2016 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: The proposed approaches, based on LSTM-RNNs, VQA model uncertainty, and caption-question similarity, outperform strong baselines on both relevance tasks and are shown to be more intelligent, reasonable, and human-like than previous approaches.

Posted Content

EvalAI: Towards Better Evaluation Systems for AI Agents

Deshraj Yadav, +8 more

- 10 Feb 2019 -

arXiv: Artificial Intelligence

TL;DR: EvalAI is built to provide a scalable solution to the research community to fulfill the critical need of evaluating machine learning models and agents acting in an environment against annotations or with a human-in-the-loop.

Posted Content

SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation

Daniel Gordon, +4 more

- 18 May 2019 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: SplitNet is a method for decoupling visual perception from policy learning. It incorporates auxiliary tasks and selective learning of portions of the model to explicitly decompose the learning objectives for visual navigation into perceiving the world and acting on that perception.

Proceedings Article

Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning

Qing Sun, +2 more

TL;DR: This work develops the first approximate inference algorithm for 1-Best (and M-Best) decoding in bidirectional neural sequence models by extending Beam Search to reason about both forward and backward time dependencies, and introduces a novel Fill-in-the-Blank Image Captioning task which requires reasoning about both past and future sentence structure to reconstruct sensible image descriptions.

Book Chapter

Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments

Jacob Krantz, +4 more

TL;DR: This paper develops a language-guided navigation task set in a continuous 3D environment, where agents must execute low-level actions to follow natural language navigation directions. This lifts a number of assumptions implicit in prior work, which represents environments as a sparse graph of panoramas with edges corresponding to navigability.
