BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

doi:10.18653/V1/2020.ACL-MAIN.703

Open Access, Proceedings Article

Michael Lewis, +7 more

- pp 7871-7880

TLDR

This paper presents BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART matches the performance of RoBERTa on GLUE and SQuAD and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.

Abstract:

We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and other recent pretraining schemes. We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token. BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 3.5 ROUGE. BART also provides a 1.1 BLEU increase over a back-translation system for machine translation, with only target language pretraining. We also replicate other pretraining schemes within the BART framework, to understand their effect on end-task performance.
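The two noising steps named in the abstract, sentence shuffling and span in-filling with a single mask token, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `<mask>` string, the naive sentence split on periods, the `start_prob` parameter, and the uniform span lengths are all assumptions made for illustration.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, chosen for this sketch


def permute_sentences(text, rng):
    """Shuffle sentence order; sentences are split naively on '.'."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    rng.shuffle(sentences)
    return ". ".join(sentences) + "."


def infill_spans(tokens, rng, start_prob=0.2, max_span=3):
    """Replace randomly chosen spans of tokens with ONE mask token each.

    start_prob is the chance that a span starts at a given position;
    span lengths are uniform in [1, max_span] (an assumption).
    """
    out = []
    i = 0
    while i < len(tokens):
        if rng.random() < start_prob:
            out.append(MASK)                  # whole span collapses to one mask
            i += rng.randint(1, max_span)     # skip the tokens in the span
        else:
            out.append(tokens[i])
            i += 1
    return out


def corrupt(text, rng):
    """Apply sentence shuffling, then span in-filling, to whitespace tokens."""
    shuffled = permute_sentences(text, rng)
    return " ".join(infill_spans(shuffled.split(), rng))


rng = random.Random(0)
source = "BART corrupts text. The model reconstructs it. Training uses both steps."
print(corrupt(source, rng))  # corrupted input; the training target would be `source`
```

Because each span collapses to a single mask token regardless of its length, a model trained to reconstruct the original must also infer how many tokens are missing.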

Citations

Posted Content

BET: A Backtranslation Approach for Easy Data Augmentation in Transformer-based Paraphrase Identification Context.

Jean-Philippe Corbeil, +1 more

- 25 Sep 2020 -

arXiv: Computation and Language

TL;DR: The results demonstrate that BET is a highly promising data augmentation technique: to push the current state-of-the-art of existing datasets and to bootstrap the utilization of deep learning architectures in the low-data regime of a hundred samples.

Journal ArticleDOI

Rethinking Search: Making Domain Experts out of Dilettantes.

Donald Metzler, +3 more

- 05 May 2021 -

arXiv: Information Retrieval

TL;DR: The authors examine how ideas from classical information retrieval and pre-trained language models can be synthesized and evolved into systems that truly deliver on the promise of domain expert advice. They note that pre-trained language models do not have a true understanding of the world, are prone to hallucinating, and are incapable of justifying their utterances by referring to supporting documents in the corpus they were trained over.

Posted Content

CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations.

Fuli Luo, +4 more

- 13 Oct 2020 -

arXiv: Computation and Language

TL;DR: Comprehensive empirical evidence on 11 natural language understanding and cross-modal tasks illustrates that CAPT is applicable for both language and vision-language tasks, and obtains surprisingly consistent improvement, including 0.6% absolute gain on GLUE benchmarks and 0.8% absolute increment on NLVR.

Journal ArticleDOI

Pre-Trained Language Models and Their Applications

Haifeng Wang, +4 more

- 01 Sep 2022 -

Engineering

TL;DR: Pre-trained language models have achieved striking success in natural language processing (NLP), leading to a paradigm shift from supervised learning to pre-training followed by fine-tuning.

Journal ArticleDOI

E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation

Qihuang Zhong, +4 more

- 30 May 2022 -

arXiv.org

TL;DR: E2S2 is proposed, which improves the seq2seq models via integrating more efficient self-supervised information into the encoders and can consistently boost the performance, including 1.0% averaged gain on GLUE benchmark and 1.75% F 0 .

References

Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.

Proceedings ArticleDOI

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, and can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Posted Content

Efficient Estimation of Word Representations in Vector Space

Tomas Mikolov, +3 more

- 16 Jan 2013 -

arXiv: Computation and Language

TL;DR: This paper proposed two novel model architectures for computing continuous vector representations of words from very large data sets, and the quality of these representations is measured in a word similarity task and the results are compared to the previously best performing techniques based on different types of neural networks.

Posted Content

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Yinhan Liu, +9 more

- 26 Jul 2019 -

arXiv: Computation and Language

TL;DR: It is found that BERT was significantly undertrained, and can match or exceed the performance of every model published after it, and the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.

Proceedings ArticleDOI

Deep contextualized word representations

Matthew E. Peters, +6 more

TL;DR: This paper introduced a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics), and how these uses vary across linguistic contexts (i.e., to model polysemy).

Bleu: a Method for Automatic Evaluation of Machine Translation

Kishore Papineni, +3 more

ROUGE: A Package for Automatic Evaluation of Summaries

Chin-Yew Lin
