Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TLDR
This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
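The unified format described in the abstract casts every task as mapping one string to another, distinguished only by a task prefix. The prefixes below (e.g. "summarize:", "translate English to German:") follow the convention described in the paper; the helper function itself is a minimal illustrative sketch, not the released codebase.

```python
# Sketch of the text-to-text format: every task becomes an
# (input string, target string) pair, distinguished by a task prefix.
def to_text_to_text(task: str, text: str, target: str) -> tuple[str, str]:
    """Cast a labeled example into the text-to-text format."""
    prefixes = {
        "summarization": "summarize: ",
        "translation_en_de": "translate English to German: ",
        "classification_cola": "cola sentence: ",
    }
    return prefixes[task] + text, target

# Even a classification label is emitted as literal text, not a class index:
inp, tgt = to_text_to_text("classification_cola",
                           "The course is jumping well.", "unacceptable")
print(inp)  # cola sentence: The course is jumping well.
print(tgt)  # unacceptable
```

Because inputs and targets are both plain text, one encoder-decoder model with one loss can be trained on every task in the mixture.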
Citations
More filters
Posted Content
Building Compact and Robust Deep Neural Networks with Toeplitz Matrices
TL;DR: In this article, the authors leverage the properties of structured matrices from the Toeplitz family to build neural networks that are not only accurate but also compact, easy to train, reliable, and robust to adversarial examples.
Posted Content
Conditional Generation of Temporally-ordered Event Sequences
TL;DR: The authors use a denoising autoencoder to predict new events that fit into an existing temporally-ordered sequence, capturing both temporality and common event co-occurrence.
Posted Content
Pretrained Language Models are Symbolic Mathematics Solvers too
TL;DR: This article proposes a sample-efficient way of solving symbolic tasks by first pretraining a transformer model on language translation and then fine-tuning it on the downstream task of symbolic mathematics.
Posted Content
Cross-Domain Reasoning via Template Filling.
TL;DR: This article explores the ability of sequence-to-sequence models to perform cross-domain reasoning and presents a prompt-template-filling approach that enables them to do so.
Proceedings Article
PermuteFormer: Efficient Relative Position Encoding for Long Sequences.
TL;DR: PermuteFormer applies a position-dependent transformation to queries and keys to encode positional information into the attention module; the transformation is carefully crafted so that the final output of self-attention is not affected by the absolute positions of tokens.
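A toy sketch of the idea behind such position-dependent transformations (not the paper's actual implementation): if each position applies its own permutation to the feature dimension, here a cyclic roll by the position index, the query-key dot product depends only on the relative offset between positions, not on their absolute values.

```python
import numpy as np

# Toy illustration: position-dependent cyclic permutation of features.
rng = np.random.default_rng(0)
d = 8
q, k = rng.normal(size=d), rng.normal(size=d)

def score(i: int, j: int) -> float:
    """Attention score between a query at position i and a key at position j,
    each permuted (rolled) by its own position index."""
    return float(np.roll(q, i) @ np.roll(k, j))

# Shifting both positions by the same amount leaves the score unchanged,
# i.e. the score depends only on the relative offset j - i:
print(np.isclose(score(2, 5), score(4, 7)))  # True
```

This shift-invariance is what lets a per-position transformation act as a relative position encoding without adding explicit bias terms to the attention matrix.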