Open Access · Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR: This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
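As a quick illustration of the text-to-text formulation, the sketch below feeds a few task-prefixed inputs through one of the released checkpoints. It assumes the Hugging Face `transformers` library and the public `t5-small` model, neither of which is specified in the abstract itself.

```python
# A minimal sketch of the text-to-text formulation, assuming the Hugging Face
# `transformers` library and the public "t5-small" checkpoint.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is expressed as "input text -> output text" via a task prefix.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    "cola sentence: The book fell of the table quickly happy.",  # linguistic acceptability (CoLA)
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because classification labels are also emitted as text (e.g. "acceptable"), the same model, loss, and decoding procedure serve every task.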
Citations
Proceedings Article
NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints
TL;DR: This work proposes NeuroLogic Decoding, a simple yet effective algorithm that enables neural language models, supervised or not, to generate fluent text while satisfying complex lexical constraints; the results suggest the limits of large-scale neural networks for fine-grained controllable generation and the promise of inference-time algorithms.
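A minimal sketch of the general idea, not the authors' full predicate-logic beam search: hypotheses are ranked by fluency plus how many constraint clauses (in conjunctive normal form) they satisfy. All names and numbers below are illustrative.

```python
# Simplified sketch: rank beam hypotheses by log-probability plus a bonus for
# each satisfied CNF clause (a clause is satisfied if ANY of its words appears).
def satisfied_clauses(text, cnf_constraints):
    """cnf_constraints: list of clauses, each a list of alternative words."""
    return sum(any(word in text for word in clause) for clause in cnf_constraints)

def rank_hypotheses(hypotheses, cnf_constraints, alpha=1.0):
    """hypotheses: list of (text, log_prob). Prefer fluent AND constraint-satisfying outputs."""
    return sorted(
        hypotheses,
        key=lambda h: h[1] + alpha * satisfied_clauses(h[0], cnf_constraints),
        reverse=True,
    )

# Example: the output must mention ("dog" or "puppy") and ("park").
beam = [("a dog runs in the park", -4.2), ("a cat sleeps indoors", -3.9)]
constraints = [["dog", "puppy"], ["park"]]
print(rank_hypotheses(beam, constraints)[0][0])  # -> "a dog runs in the park"
```

The published algorithm additionally tracks clause states during beam search and prunes hypotheses that can no longer satisfy the constraints; the sketch only shows the scoring intuition.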
Posted Content
Carbon Emissions and Large Neural Network Training.
David A. Patterson, Joseph E. Gonzalez, Quoc V. Le, Chen Liang, Lluís-Miquel Munguía, Daniel Rothchild, David R. So, Maud Texier, Jeffrey Dean
TL;DR: In this article, the authors calculate the energy use and carbon footprint of several recent large models, including T5, Meena, GShard, Switch Transformer, and GPT-3, and refine earlier estimates for the neural architecture search that found the Evolved Transformer.
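The footprint arithmetic being refined is roughly the following; this is a hedged sketch with placeholder numbers, not figures from the paper.

```python
# A minimal sketch of training-footprint arithmetic: energy from hours, chip count,
# average chip power, and datacenter PUE; emissions from grid carbon intensity.
# All numbers below are illustrative placeholders, not figures from the paper.
def training_footprint(hours, num_chips, avg_chip_power_w, pue, kg_co2e_per_kwh):
    """Return (energy in kWh, emissions in tCO2e) for one training run."""
    energy_kwh = hours * num_chips * (avg_chip_power_w / 1000.0) * pue
    tco2e = energy_kwh * kg_co2e_per_kwh / 1000.0
    return energy_kwh, tco2e

# Hypothetical run: 500 hours on 512 accelerators drawing 300 W each,
# datacenter PUE of 1.1, grid intensity of 0.4 kg CO2e per kWh.
energy, emissions = training_footprint(500, 512, 300, 1.1, 0.4)
print(f"{energy:,.0f} kWh, {emissions:.1f} tCO2e")
```

The paper's refinements come from measuring these factors (actual accelerator power, datacenter PUE, and local grid mix) rather than assuming worst-case defaults.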
Posted Content
Contrastive Learning with Adversarial Perturbations for Conditional Text Generation.
TL;DR: This work proposes a principled method for generating positive and negative samples for contrastive learning of sequence-to-sequence models, and empirically shows that the method significantly improves the generalization of seq2seq models on three text generation tasks: machine translation, text summarization, and question generation.
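A rough PyTorch sketch of the kind of contrastive objective involved. In the paper, positives and negatives come from adversarial perturbations of the target sequences; here they are simply passed in as tensors, which is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

# Sketch of a contrastive loss over pooled sequence representations: the anchor
# should be closer to its positive than to any of the negatives.
def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """anchor, positive: (batch, dim); negatives: (batch, num_neg, dim)."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos_sim = (anchor * positive).sum(-1, keepdim=True)        # (batch, 1)
    neg_sim = torch.einsum("bd,bnd->bn", anchor, negatives)    # (batch, num_neg)
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    labels = torch.zeros(anchor.size(0), dtype=torch.long)     # positive sits at index 0
    return F.cross_entropy(logits, labels)

loss = contrastive_loss(torch.randn(4, 256), torch.randn(4, 256), torch.randn(4, 8, 256))
```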
Proceedings Article
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
TL;DR: DeBERTa introduces a disentangled attention mechanism in which each word is represented by two vectors that encode its content and position, respectively, and attention weights among words are computed using disentangled matrices over their contents and relative positions.
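A simplified, single-head PyTorch sketch of what such disentangled attention scores look like, summing content-to-content, content-to-position, and position-to-content terms. Shapes and weight names are assumptions for illustration, not the authors' implementation.

```python
import torch

# Simplified sketch of disentangled attention scores for one head:
# content and relative-position embeddings are projected separately and combined.
def disentangled_scores(Hc, Pr, Wq_c, Wk_c, Wq_r, Wk_r, rel_idx):
    """Hc: (seq, d) content states; Pr: (2k, d) relative-position embeddings;
    rel_idx: (seq, seq) long tensor mapping each (i, j) pair to a bucket in Pr."""
    Qc, Kc = Hc @ Wq_c, Hc @ Wk_c                   # content projections
    Qr, Kr = Pr @ Wq_r, Pr @ Wk_r                   # relative-position projections
    c2c = Qc @ Kc.T                                  # content -> content
    c2p = torch.gather(Qc @ Kr.T, 1, rel_idx)        # content -> position
    p2c = torch.gather(Kc @ Qr.T, 1, rel_idx).T      # position -> content (reversed index)
    d = Hc.size(-1)
    return (c2c + c2p + p2c) / (3 * d) ** 0.5        # scale over the three terms

# Tiny illustrative example with relative distances clamped to [-k, k).
seq, d, k = 6, 16, 4
rel_idx = (torch.arange(seq)[:, None] - torch.arange(seq)[None, :]).clamp(-k, k - 1) + k
Hc, Pr = torch.randn(seq, d), torch.randn(2 * k, d)
W = [torch.randn(d, d) for _ in range(4)]
scores = disentangled_scores(Hc, Pr, *W, rel_idx)    # (seq, seq) attention logits
```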
Posted Content
Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents
TL;DR: This paper releases a Longformer-based pretrained language model, named Lawformer, for understanding long Chinese legal documents, and demonstrates that the model achieves promising improvements on tasks that take long documents as inputs.