Open Access · Posted Content

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension.

TLDR
BART is a denoising autoencoder for pretraining sequence-to-sequence models, trained by corrupting text with an arbitrary noising function and then learning a model to reconstruct the original text.
Abstract: 
We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and many other more recent pretraining schemes. We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token. BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE. BART also provides a 1.1 BLEU increase over a back-translation system for machine translation, with only target language pretraining. We also report ablation experiments that replicate other pretraining schemes within the BART framework, to better measure which factors most influence end-task performance.
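To make the pretraining objective concrete, here is a minimal Python sketch of the two noising steps the abstract singles out: randomly shuffling sentence order and text in-filling, where a span of text is replaced with a single mask token. The whitespace tokenization, the 30% masking budget, the Poisson span lengths, and all function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of BART-style noising: sentence permutation plus text
# in-filling. Tokenization, masking budget, and span-length distribution
# are simplifying assumptions, not the paper's exact recipe.
import random
import numpy as np

MASK = "<mask>"

def permute_sentences(sentences, rng=random):
    """Randomly shuffle the order of the original sentences."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled

def infill_spans(tokens, mask_ratio=0.3, poisson_lam=3.0, seed=None):
    """Replace contiguous spans with a single <mask> token until roughly
    mask_ratio of the tokens have been removed. Span lengths are drawn
    from a Poisson distribution; a length-0 draw just inserts a <mask>."""
    rng = np.random.default_rng(seed)
    tokens = list(tokens)
    budget = int(len(tokens) * mask_ratio)
    removed = 0
    while removed < budget and tokens:
        span_len = int(rng.poisson(poisson_lam))
        start = int(rng.integers(0, len(tokens)))
        end = min(start + span_len, len(tokens))
        tokens[start:end] = [MASK]      # the whole span becomes one mask token
        removed += max(end - start, 1)  # count zero-length spans as progress
    return tokens

if __name__ == "__main__":
    document = ["the cat sat on the mat .", "it was a sunny day .", "birds sang outside ."]
    corrupted = infill_spans(" ".join(permute_sentences(document)).split())
    # The corrupted token sequence is the encoder input; the original,
    # uncorrupted text is the reconstruction target for the decoder.
    print(" ".join(corrupted))
```

The sequence-to-sequence model is then trained to map the corrupted sequence back to the original text, which is what lets a single objective cover both shuffling- and masking-style corruptions.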


Citations
Posted Content

Longformer: The Long-Document Transformer

TL;DR: Following prior work on long-sequence transformers, Longformer is evaluated on character-level language modeling, achieving state-of-the-art results on text8 and enwik8; the model is also pretrained and finetuned on a variety of downstream tasks.
Posted Content

Dense Passage Retrieval for Open-Domain Question Answering

TL;DR: This work shows that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework.
Journal ArticleDOI

Multilingual Denoising Pre-training for Neural Machine Translation

TL;DR: This article proposed mBART, a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective.
Posted Content

CodeBERT: A Pre-Trained Model for Programming and Natural Languages

TL;DR: This work develops CodeBERT with a Transformer-based neural architecture and trains it with a hybrid objective function that incorporates the pre-training task of replaced token detection, i.e., detecting plausible alternatives sampled from generators.
Posted Content

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

TL;DR: This work proposes pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective, PEGASUS, and demonstrates it achieves state-of-the-art performance on all 12 downstream datasets measured by ROUGE scores.
References
Proceedings Article

Attention is All you Need

TL;DR: This paper proposed a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieved state-of-the-art performance on English-to-French translation.
Proceedings ArticleDOI

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, and can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Posted Content

Efficient Estimation of Word Representations in Vector Space

TL;DR: This paper proposed two novel model architectures for computing continuous vector representations of words from very large data sets; the quality of these representations is measured on a word similarity task, and the results are compared to the previously best-performing techniques based on different types of neural networks.
Posted Content

RoBERTa: A Robustly Optimized BERT Pretraining Approach

TL;DR: It is found that BERT was significantly undertrained and can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE, and SQuAD.
Proceedings Article

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

TL;DR: This work introduces a Sentiment Treebank with fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences, which presents new challenges for sentiment compositionality, and proposes the Recursive Neural Tensor Network to address them.