Top 11 papers published by Dzmitry Bahdanau from McGill University in 2018

Proceedings Article

BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning

Maxime Chevalier-Boisvert¹, Dzmitry Bahdanau¹, Salem Lahlou, Lucas Willems, Chitwan Saharia², Thien Huu Nguyen³, Yoshua Bengio¹

Université de Montréal¹, Indian Institute of Technology Bombay², University of Oregon³

27 Sep 2018

Abstract: Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and scientific reasons, but given the poor data efficiency of the current learning methods, this goal may require substantial research efforts. Here, we introduce the BabyAI research platform to support investigations towards including humans in the loop for grounded language learning. The BabyAI platform comprises an extensible suite of 19 levels of increasing difficulty. The levels gradually lead the agent towards acquiring a combinatorially rich synthetic language which is a proper subset of English. The platform also provides a heuristic expert agent for the purpose of simulating a human teacher. We report baseline results and estimate the amount of human involvement that would be required to train a neural network-based agent on some of the BabyAI levels. We put forward strong evidence that current deep learning methods are not yet sufficiently sample efficient when it comes to learning a language with compositional properties.

140 citations

Proceedings Article

Learning to Understand Goal Specifications by Modelling Reward

Dzmitry Bahdanau¹, Felix Hill², Jan Leike³, Edward Hughes⁴, Arian Hosseini, Pushmeet Kohli⁴, Edward Grefenstette⁵

Université de Montréal¹, University of Washington², Australian National University³, Google⁴, Facebook⁵

27 Sep 2018

TL;DR: This article proposed a framework within which instruction-conditional RL agents are trained using rewards obtained not from the environment, but from reward models which are jointly trained from expert examples, effectively separating the representation of what instructions require from how they can be executed.

Abstract: Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards. However, this places on environment designers the onus of designing language-conditional reward functions which may not be easily or tractably implemented as the complexity of the environment and the language scales. To overcome this limitation, we present a framework within which instruction-conditional RL agents are trained using rewards obtained not from the environment, but from reward models which are jointly trained from expert examples. As reward models improve, they learn to accurately reward agents for completing tasks for environment configurations---and for instructions---not present amongst the expert data. This framework effectively separates the representation of what instructions require from how they can be executed. In a simple grid world, it enables an agent to learn a range of commands requiring interaction with blocks and understanding of spatial relations and underspecified abstract arrangements. We further show the method allows our agent to adapt to changes in the environment without requiring new expert examples.

107 citations

Posted Content

BabyAI: First Steps Towards Grounded Language Learning With a Human In the Loop.

Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, Yoshua Bengio

18 Oct 2018

TL;DR: This work introduces the BabyAI research platform to support investigations towards including humans in the loop for grounded language learning, and puts forward strong evidence that current deep learning methods are not yet sufficiently sample efficient when it comes to learning a language with compositional properties.

Abstract: Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and scientific reasons, but given the poor data efficiency of the current learning methods, this goal may require substantial research efforts. Here, we introduce the BabyAI research platform to support investigations towards including humans in the loop for grounded language learning. The BabyAI platform comprises an extensible suite of 19 levels of increasing difficulty. The levels gradually lead the agent towards acquiring a combinatorially rich synthetic language which is a proper subset of English. The platform also provides a heuristic expert agent for the purpose of simulating a human teacher. We report baseline results and estimate the amount of human involvement that would be required to train a neural network-based agent on some of the BabyAI levels. We put forward strong evidence that current deep learning methods are not yet sufficiently sample efficient when it comes to learning a language with compositional properties.

107 citations

Proceedings Article

Systematic Generalization: What Is Required and Can It Be Learned?

Dzmitry Bahdanau¹, Shikhar Murty², Michael Noukhovitch, Thien Huu Nguyen³, Harm de Vries¹, Aaron Courville⁴

Université de Montréal¹, Indian Institutes of Technology², University of Oregon³, Canadian Institute for Advanced Research⁴

27 Sep 2018

TL;DR: This article showed that the generalization of modular models is highly sensitive to the module layout, i.e. to how exactly the modules are connected, and furthermore investigated if modular models that generalize well could be made more end-to-end by learning their layout and parametrization.

Abstract: Numerous models for grounded language understanding have been recently proposed, including (i) generic models that can be easily adapted to any given task and (ii) intuitively appealing modular models that require background knowledge to be instantiated. We compare both types of models in how much they lend themselves to a particular form of systematic generalization. Using a synthetic VQA test, we evaluate which models are capable of reasoning about all possible object pairs after training on only a small subset of them. Our findings show that the generalization of modular models is much more systematic and that it is highly sensitive to the module layout, i.e. to how exactly the modules are connected. We furthermore investigate if modular models that generalize well could be made more end-to-end by learning their layout and parametrization. We find that end-to-end methods from prior work often learn inappropriate layouts or parametrizations that do not facilitate systematic generalization. Our results suggest that, in addition to modularity, systematic generalization in language understanding may require explicit regularizers or priors.

81 citations

Posted Content

Systematic Generalization: What Is Required and Can It Be Learned?

Dzmitry Bahdanau¹, Shikhar Murty², Michael Noukhovitch, Thien Huu Nguyen³, Harm de Vries¹, Aaron Courville⁴

Université de Montréal¹, Indian Institutes of Technology², University of Oregon³, Canadian Institute for Advanced Research⁴

30 Nov 2018-arXiv: Computation and Language

TL;DR: The findings show that the generalization of modular models is much more systematic and that it is highly sensitive to the module layout, i.e. to how exactly the modules are connected. The results also suggest that, in addition to modularity, systematic generalization in language understanding may require explicit regularizers or priors.

Abstract: Numerous models for grounded language understanding have been recently proposed, including (i) generic models that can be easily adapted to any given task and (ii) intuitively appealing modular models that require background knowledge to be instantiated. We compare both types of models in how much they lend themselves to a particular form of systematic generalization. Using a synthetic VQA test, we evaluate which models are capable of reasoning about all possible object pairs after training on only a small subset of them. Our findings show that the generalization of modular models is much more systematic and that it is highly sensitive to the module layout, i.e. to how exactly the modules are connected. We furthermore investigate if modular models that generalize well could be made more end-to-end by learning their layout and parametrization. We find that end-to-end methods from prior work often learn inappropriate layouts or parametrizations that do not facilitate systematic generalization. Our results suggest that, in addition to modularity, systematic generalization in language understanding may require explicit regularizers or priors.

72 citations

Posted Content

Learning to Understand Goal Specifications by Modelling Reward

Dzmitry Bahdanau¹, Felix Hill², Jan Leike³, Edward Hughes⁴, Arian Hosseini, Pushmeet Kohli⁴, Edward Grefenstette⁵

Université de Montréal¹, University of Washington², Australian National University³, Google⁴, Facebook⁵

05 Jun 2018-arXiv: Artificial Intelligence

TL;DR: This work presents a framework within which instruction-conditional RL agents are trained using rewards obtained not from the environment, but from reward models which are jointly trained from expert examples. The method allows an agent to adapt to changes in the environment without requiring new expert examples.

Abstract: Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards. However, this places on environment designers the onus of designing language-conditional reward functions which may not be easily or tractably implemented as the complexity of the environment and the language scales. To overcome this limitation, we present a framework within which instruction-conditional RL agents are trained using rewards obtained not from the environment, but from reward models which are jointly trained from expert examples. As reward models improve, they learn to accurately reward agents for completing tasks for environment configurations---and for instructions---not present amongst the expert data. This framework effectively separates the representation of what instructions require from how they can be executed. In a simple grid world, it enables an agent to learn a range of commands requiring interaction with blocks and understanding of spatial relations and underspecified abstract arrangements. We further show the method allows our agent to adapt to changes in the environment without requiring new expert examples.

62 citations

Proceedings Article

Commonsense mining as knowledge base completion? A study on the impact of novelty

Stanisław Jastrzębski, Dzmitry Bahdanau, Seyedarian Hosseini, Michael Noukhovitch, Yoshua Bengio, Jackie Chi Kit Cheung

01 Jun 2018

TL;DR: This paper proposes novelty of predicted triples with respect to the training set as an important factor in interpreting results, critically analyzes the difficulty of mining novel commonsense knowledge, and shows that a simple baseline method outperforms the previous state of the art on predicting more novel triples.

Abstract: Commonsense knowledge bases such as ConceptNet represent knowledge in the form of relational triples. Inspired by recent work by Li et al., we analyse if knowledge base completion models can be used to mine commonsense knowledge from raw text. We propose novelty of predicted triples with respect to the training set as an important factor in interpreting results. We critically analyse the difficulty of mining novel commonsense knowledge, and show that a simple baseline method outperforms the previous state of the art on predicting more novel triples.

27 citations

Posted Content

BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning

Maxime Chevalier-Boisvert¹, Dzmitry Bahdanau¹, Salem Lahlou, Lucas Willems, Chitwan Saharia², Thien Huu Nguyen³, Yoshua Bengio¹

Université de Montréal¹, Indian Institute of Technology Bombay², University of Oregon³

18 Oct 2018-arXiv: Artificial Intelligence

TL;DR: This work introduces the BabyAI research platform to support investigations towards including humans in the loop for grounded language learning, and puts forward strong evidence that current deep learning methods are not yet sufficiently sample efficient when it comes to learning a language with compositional properties.

Abstract: Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and scientific reasons, but given the poor data efficiency of the current learning methods, this goal may require substantial research efforts. Here, we introduce the BabyAI research platform to support investigations towards including humans in the loop for grounded language learning. The BabyAI platform comprises an extensible suite of 19 levels of increasing difficulty. The levels gradually lead the agent towards acquiring a combinatorially rich synthetic language which is a proper subset of English. The platform also provides a heuristic expert agent for the purpose of simulating a human teacher. We report baseline results and estimate the amount of human involvement that would be required to train a neural network-based agent on some of the BabyAI levels. We put forward strong evidence that current deep learning methods are not yet sufficiently sample efficient when it comes to learning a language with compositional properties.

25 citations

Posted Content

Learning to Follow Language Instructions with Adversarial Reward Induction

Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Pushmeet Kohli, Edward Grefenstette

05 Jun 2018

TL;DR: This work presents a method for learning to follow commands from a training set of instructions and corresponding example goal-states, rather than an explicit reward function, and shows the method allows an agent to adapt to changes in the environment without requiring new training examples.

Abstract: Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards. However, for many real-world natural language commands that involve a degree of underspecification or ambiguity, such as tidy the room, it would be challenging or impossible to program an appropriate reward function. To overcome this, we present a method for learning to follow commands from a training set of instructions and corresponding example goal-states, rather than an explicit reward function. Importantly, the example goal-states are not seen at test time. The approach effectively separates the representation of what instructions require from how they can be executed. In a simple grid world, the method enables an agent to learn a range of commands requiring interaction with blocks and understanding of spatial relations and underspecified abstract arrangements. We further show the method allows our agent to adapt to changes in the environment without requiring new training examples.

22 citations

Posted Content

Commonsense mining as knowledge base completion? A study on the impact of novelty

Stanisław Jastrzębski, Dzmitry Bahdanau, Seyedarian Hosseini, Michael Noukhovitch, Yoshua Bengio, Jackie Chi Kit Cheung

24 Apr 2018-arXiv: Computation and Language

TL;DR: It is shown that a simple baseline method outperforms the previous state of the art on predicting more novel commonsense knowledge, and novelty of predicted triples with respect to the training set is proposed as an important factor in interpreting results.

Abstract: Commonsense knowledge bases such as ConceptNet represent knowledge in the form of relational triples. Inspired by the recent work by Li et al., we analyse if knowledge base completion models can be used to mine commonsense knowledge from raw text. We propose novelty of predicted triples with respect to the training set as an important factor in interpreting results. We critically analyse the difficulty of mining novel commonsense knowledge, and show that a simple baseline method outperforms the previous state of the art on predicting more novel triples.

14 citations

Proceedings Article

Jointly Learning "What" and "How" from Instructions and Goal-States.
