Showing papers by "Dzmitry Bahdanau" published in 2021

Open Access

Proceedings Article

Understanding by Understanding Not: Modeling Negation in Language Models

Arian Hosseini, Siva Reddy¹, Dzmitry Bahdanau², R Devon Hjelm³, Alessandro Sordoni³, Aaron Courville⁴

University of Copenhagen¹, McGill University², Microsoft³, Université de Montréal⁴

07 May 2021

TL;DR: This work augments the language modeling objective with an unlikelihood objective based on negated generic sentences from a raw text corpus; training BERT with the combined objective reduces the mean top-1 error rate to 4% on the negated LAMA dataset.

Abstract: Negation is a core construction in natural language. Despite being very successful on many tasks, state-of-the-art pre-trained language models often handle negation incorrectly. To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus. By training BERT with the resulting combined objective we reduce the mean top 1 error rate to 4% on the negated LAMA dataset. We also see some improvements on the negated NLI benchmarks.
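
The combined objective described in this abstract can be illustrated with a minimal sketch. The listing gives no formula, so the code below assumes the commonly used unlikelihood form, -log(1 - p), added to the usual -log p language modeling term. The weight `alpha`, the function names, and the toy vocabulary are all hypothetical and are not taken from the paper.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def likelihood_loss(probs: List[float], target: int) -> float:
    """Language modeling term: -log p(target)."""
    return -math.log(probs[target])


def unlikelihood_loss(probs: List[float], unwanted: int, eps: float = 1e-12) -> float:
    """Unlikelihood term (assumed form): -log(1 - p(unwanted))."""
    return -math.log(max(1.0 - probs[unwanted], eps))


def combined_loss(pos_probs: List[float], target: int,
                  neg_probs: List[float], unwanted: int,
                  alpha: float = 1.0) -> float:
    """Sum of both terms; alpha is a hypothetical mixing weight."""
    return (likelihood_loss(pos_probs, target)
            + alpha * unlikelihood_loss(neg_probs, unwanted))


# Toy vocabulary (hypothetical): index 0 = "fly", 1 = "swim", 2 = "bark".
pos_probs = softmax([2.0, 0.5, 0.1])  # predictions for "birds can [MASK]"
neg_probs = softmax([2.0, 0.5, 0.1])  # predictions for "birds cannot [MASK]"
print(combined_loss(pos_probs, 0, neg_probs, 0, alpha=1.0))
```

In this toy setup the model assigns the same distribution to the positive and the negated sentence, so the unlikelihood term is large: it penalizes the probability mass that the negated sentence still places on "fly".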

41 citations

Proceedings Article

PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models

Torsten Scholak¹, Nathan Schucher, Dzmitry Bahdanau²

University of Toronto¹, McGill University²

10 Sep 2021

TL;DR: PICARD constrains the auto-regressive decoders of language models through incremental parsing, finding valid output sequences by rejecting inadmissible tokens at each decoding step.

Abstract: Large pre-trained language models for textual data have an unconstrained output space; at each decoding step, they can produce any of 10,000s of sub-word tokens. When fine-tuned to target constrained formal languages like SQL, these models often generate invalid code, rendering it unusable. We propose PICARD (code available at https://github.com/ElementAI/picard), a method for constraining auto-regressive decoders of language models through incremental parsing. PICARD helps to find valid output sequences by rejecting inadmissible tokens at each decoding step. On the challenging Spider and CoSQL text-to-SQL translation tasks, we show that PICARD transforms fine-tuned T5 models with passable performance into state-of-the-art solutions.
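
The idea of rejecting inadmissible tokens at each decoding step can be sketched with a toy example. This is not the PICARD implementation: it assumes greedy decoding, a hypothetical fixed template standing in for an incremental SQL parser, and a made-up scoring function standing in for a fine-tuned language model.

```python
from typing import Callable, Dict, List

# Hypothetical toy grammar: SELECT <column> FROM <table> <eos>.
TEMPLATE = [{"select"}, {"name", "age"}, {"from"}, {"users", "orders"}, {"<eos>"}]
VOCAB = ["select", "name", "age", "from", "users", "orders", "<eos>"]


def is_admissible(prefix: List[str]) -> bool:
    """Stand-in for an incremental parser: accept prefixes that fit the template."""
    if len(prefix) > len(TEMPLATE):
        return False
    return all(tok in allowed for tok, allowed in zip(prefix, TEMPLATE))


def toy_scores(prefix: List[str]) -> Dict[str, float]:
    """Made-up 'model' that ignores the prefix and prefers an invalid first token."""
    return {"from": 5.0, "select": 4.0, "name": 3.0, "users": 2.0,
            "age": 1.0, "orders": 0.5, "<eos>": 0.0}


def constrained_greedy_decode(score_fn: Callable[[List[str]], Dict[str, float]],
                              vocab: List[str],
                              admissible: Callable[[List[str]], bool],
                              max_len: int = 10) -> List[str]:
    """Pick the best-scoring token among those the checker admits at each step."""
    prefix: List[str] = []
    for _ in range(max_len):
        scores = score_fn(prefix)
        candidates = [tok for tok in vocab if admissible(prefix + [tok])]
        if not candidates:
            break  # no admissible continuation exists
        best = max(candidates, key=lambda tok: scores[tok])
        prefix.append(best)
        if best == "<eos>":
            break
    return prefix


print(constrained_greedy_decode(toy_scores, VOCAB, is_admissible))
```

Without the admissibility check, the toy model would emit "from" at every step. With it, the highest-scoring token is still chosen, but only among continuations the checker accepts.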

22 citations

Proceedings Article

DuoRAT: Towards Simpler Text-to-SQL Models

Torsten Scholak¹, Raymond Li², Dzmitry Bahdanau³, Harm de Vries⁴, Chris Pal⁵

University of Toronto¹, University of British Columbia², McGill University³, Université de Montréal⁴, École Polytechnique de Montréal⁵

01 Jun 2021

TL;DR: DuoRAT is a re-implementation of the state-of-the-art RAT-SQL model that, unlike RAT-SQL, uses only relation-aware or vanilla transformers as building blocks; ablations with it confirm the usefulness of some techniques and the redundancy of others.

Abstract: Recent neural text-to-SQL models can effectively translate natural language questions to corresponding SQL queries on unseen databases. Working mostly on the Spider dataset, researchers have proposed increasingly sophisticated solutions to the problem. Contrary to this trend, in this paper we focus on simplifications. We begin by building DuoRAT, a re-implementation of the state-of-the-art RAT-SQL model that unlike RAT-SQL is using only relation-aware or vanilla transformers as the building blocks. We perform several ablation experiments using DuoRAT as the baseline model. Our experiments confirm the usefulness of some techniques and point out the redundancy of others, including structural SQL features and features that link the question with the schema.

16 citations

Posted Content

Compositional Generalization in Dependency Parsing

Emily Goodwin¹, Siva Reddy, Timothy J. O'Donnell, Dzmitry Bahdanau

McGill University¹

13 Oct 2021-arXiv: Computation and Language

TL;DR: This work introduces a gold-standard set of dependency parses for CFQ and uses it to analyze the behavior of a state-of-the-art dependency parser (Qi et al., 2020) on the CFQ dataset.

Abstract: Compositionality, or the ability to combine familiar units like words into novel phrases and sentences, has been the focus of intense interest in artificial intelligence in recent years. To test compositional generalization in semantic parsing, Keysers et al. (2020) introduced Compositional Freebase Queries (CFQ). This dataset maximizes the similarity between the test and train distributions over primitive units, like words, while maximizing the compound divergence: the dissimilarity between test and train distributions over larger structures, like phrases. Dependency parsing, however, lacks a compositional generalization benchmark. In this work, we introduce a gold-standard set of dependency parses for CFQ, and use this to analyze the behavior of a state-of-the-art dependency parser (Qi et al., 2020) on the CFQ dataset. We find that increasing compound divergence degrades dependency parsing performance, although not as dramatically as semantic parsing performance. Additionally, we find the performance of the dependency parser does not uniformly degrade relative to compound divergence, and the parser performs differently on different splits with the same compound divergence. We explore a number of hypotheses for what causes the non-uniform degradation in dependency parsing performance, and identify a number of syntactic structures that drive the dependency parser's lower performance on the most challenging splits.
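
The notion of a divergence between train and test distributions can be made concrete with a small sketch. The abstract does not state the formula, so the code assumes a Chernoff-coefficient form, one minus the sum over items of p^alpha times q^(1 - alpha), which is how such divergences are commonly defined. The value of `alpha`, the argument order, and the toy "compounds" are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def normalize(items: Iterable[str]) -> Dict[str, float]:
    """Turn a list of observed items into a frequency distribution."""
    counts = Counter(items)
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def chernoff_divergence(p: Dict[str, float], q: Dict[str, float], alpha: float) -> float:
    """Assumed form: 1 - sum_k p_k**alpha * q_k**(1 - alpha)."""
    keys = set(p) | set(q)
    coefficient = sum(
        (p.get(k, 0.0) ** alpha) * (q.get(k, 0.0) ** (1.0 - alpha)) for k in keys
    )
    return 1.0 - coefficient


# Toy "compounds" (hypothetical phrase types) seen in train and test.
train = normalize(["who directed", "who directed", "who produced"])
test = normalize(["who produced", "who edited"])

print(chernoff_divergence(train, test, alpha=0.1))
```

Identical distributions give a divergence of 0 and distributions with no shared items give 1, so a split with high compound divergence is one where test phrases are mostly unlike training phrases.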

1 citation

Posted Content

Understanding by Understanding Not: Modeling Negation in Language Models

Arian Hosseini, Siva Reddy¹, Dzmitry Bahdanau², R Devon Hjelm³, Alessandro Sordoni³, Aaron Courville⁴

University of Copenhagen¹, McGill University², Microsoft³, Université de Montréal⁴

07 May 2021-arXiv: Computation and Language

TL;DR: This paper augments the language modeling objective with an unlikelihood objective based on negated generic sentences from a raw text corpus, which reduces the mean top-1 error rate of BERT to 4% on the negated LAMA dataset.

Abstract: Negation is a core construction in natural language. Despite being very successful on many tasks, state-of-the-art pre-trained language models often handle negation incorrectly. To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus. By training BERT with the resulting combined objective we reduce the mean top-1 error rate to 4% on the negated LAMA dataset. We also see some improvements on the negated NLI benchmarks.

1 citation

Posted Content

PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models

Torsten Scholak¹, Nathan Schucher, Dzmitry Bahdanau²

University of Toronto¹, McGill University²

10 Sep 2021-arXiv: Computation and Language

TL;DR: PICARD constrains auto-regressive decoders of language models through incremental parsing, which helps find valid output sequences by rejecting inadmissible tokens at each decoding step.

Abstract: Large pre-trained language models for textual data have an unconstrained output space; at each decoding step, they can produce any of 10,000s of sub-word tokens. When fine-tuned to target constrained formal languages like SQL, these models often generate invalid code, rendering it unusable. We propose PICARD (code and trained models available at this https URL), a method for constraining auto-regressive decoders of language models through incremental parsing. PICARD helps to find valid output sequences by rejecting inadmissible tokens at each decoding step. On the challenging Spider and CoSQL text-to-SQL translation tasks, we show that PICARD transforms fine-tuned T5 models with passable performance into state-of-the-art solutions.

Posted Content

LAGr: Labeling Aligned Graphs for Improving Systematic Generalization in Semantic Parsing

Dora Jambor¹, Dzmitry Bahdanau

McGill University¹

14 Oct 2021-arXiv: Computation and Language

TL;DR: The Labeling Aligned Graphs (LAGr) algorithm produces semantic parses by predicting node and edge labels for a complete multi-layer input-aligned graph.

Abstract: Semantic parsing is the task of producing a structured meaning representation for natural language utterances or questions. Recent research has pointed out that the commonly-used sequence-to-sequence (seq2seq) semantic parsers struggle to generalize systematically, i.e. to handle examples that require recombining known knowledge in novel settings. In this work, we show that better systematic generalization can be achieved by producing the meaning representation (MR) directly as a graph and not as a sequence. To this end we propose LAGr, the Labeling Aligned Graphs algorithm that produces semantic parses by predicting node and edge labels for a complete multi-layer input-aligned graph. The strongly-supervised LAGr algorithm requires aligned graphs as inputs, whereas weakly-supervised LAGr infers alignments for originally unaligned target graphs using an approximate MAP inference procedure. On the COGS and CFQ compositional generalization benchmarks the strongly- and weakly- supervised LAGr algorithms achieve significant improvements upon the baseline seq2seq parsers.

Proceedings Article

Combating False Negatives in Adversarial Imitation Learning

Konrad Zolna¹, Chitwan Saharia², Leonard Boussioux³, David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau⁴, Yoshua Bengio⁵

Jagiellonian University¹, Indian Institute of Technology Bombay², CentraleSupélec³, McGill University⁴, Canadian Institute for Advanced Research⁵

18 Jul 2021

TL;DR: This work shows that "false negatives" (successful agent episodes that the discriminator is still trained to score low) significantly hinder adversarial imitation learning, and proposes a method to alleviate their impact that improves sample efficiency over the baselines by at least an order of magnitude on the BabyAI environment.

Abstract: In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the desired behavior. However, as the trained policy learns to be more successful, the negative examples (the ones produced by the agent) become increasingly similar to expert ones. Despite the fact that the task is successfully accomplished in some of the agent's trajectories, the discriminator is trained to output low values for them. We hypothesize that this inconsistent training signal for the discriminator can impede its learning, and consequently leads to worse overall performance of the agent. We show experimental evidence for this hypothesis and that the ‘False Negatives’ (i.e. successful agent episodes) significantly hinder adversarial imitation learning, which is the first contribution of this paper. Then, we propose a method to alleviate the impact of false negatives and test it on the BabyAI environment. This method consistently improves sample efficiency over the baselines by at least an order of magnitude.

Posted Content

Jointly Learning Truth-Conditional Denotations and Groundings using Parallel Attention

Leon Bergen¹, Dzmitry Bahdanau, Timothy J. O'Donnell²

University of California, San Diego¹, McGill University²

14 Apr 2021-arXiv: Computation and Language

TL;DR: The authors proposed a model that jointly learns the denotations of words together with their groundings using a truth-conditional semantics and achieved state-of-the-art performance on visual question answering.

Abstract: We present a model that jointly learns the denotations of words together with their groundings using a truth-conditional semantics. Our model builds on the neurosymbolic approach of Mao et al. (2019), learning to ground objects in the CLEVR dataset (Johnson et al., 2017) using a novel parallel attention mechanism. The model achieves state of the art performance on visual question answering, learning to detect and ground objects with question performance as the only training signal. We also show that the model is able to learn flexible non-canonical groundings just by adjusting answers to questions in the training set.

Posted Content

Systematic Generalization with Edge Transformers

Leon Bergen¹, Timothy J. O'Donnell², Dzmitry Bahdanau²

University of California, San Diego¹, McGill University²

01 Dec 2021-arXiv: Computation and Language

TL;DR: In this article, the authors propose Edge Transformers, a new model that combines inspiration from Transformers and rule-based symbolic AI to tackle the challenge of systematic generalization in natural language understanding.

Abstract: Recent research suggests that systematic generalization in natural language understanding remains a challenge for state-of-the-art neural models such as Transformers and Graph Neural Networks. To tackle this challenge, we propose Edge Transformer, a new model that combines inspiration from Transformers and rule-based symbolic AI. The first key idea in Edge Transformers is to associate vector states with every edge, that is, with every pair of input nodes -- as opposed to just every node, as it is done in the Transformer model. The second major innovation is a triangular attention mechanism that updates edge representations in a way that is inspired by unification from logic programming. We evaluate Edge Transformer on compositional generalization benchmarks in relational reasoning, semantic parsing, and dependency parsing. In all three settings, the Edge Transformer outperforms Relation-aware, Universal and classical Transformer baselines.
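
The two ideas named in this abstract, a vector state per edge and an attention update over triangles of edges, can be sketched as follows. This is one plausible reading, not the paper's implementation: it assumes that edge (i, j) attends over intermediate nodes l and combines the states of edges (i, l) and (l, j) by elementwise product, and it replaces all learned projections with the identity to stay short.

```python
import math
from typing import List

Vector = List[float]
EdgeStates = List[List[Vector]]  # x[i][j] is the state of edge (i, j)


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def triangular_attention(x: EdgeStates) -> EdgeStates:
    """Update every edge (i, j) from the pairs of edges (i, l) and (l, j).

    Assumption: identity maps stand in for learned query/key/value projections.
    """
    n = len(x)
    d = len(x[0][0])
    out = [[[0.0] * d for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # One score per intermediate node l, comparing edge (i, l) with (l, j).
            scores = [dot(x[i][l], x[l][j]) / math.sqrt(d) for l in range(n)]
            weights = softmax(scores)
            for l, weight in enumerate(weights):
                for k in range(d):
                    # Value for the triangle (i, l, j): elementwise product.
                    out[i][j][k] += weight * x[i][l][k] * x[l][j][k]
    return out


# Three nodes, two-dimensional edge states (toy values).
states = [[[1.0, 0.5] for _ in range(3)] for _ in range(3)]
print(triangular_attention(states)[0][2])
```

The cost is cubic in the number of nodes because every edge looks at every intermediate node, in contrast to a node-based Transformer layer, which keeps one state per node.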
