Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduces a unified framework that converts all text-based language problems into a text-to-text format and systematically compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
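As a concrete illustration of the text-to-text framing, the sketch below routes several tasks through the released T5 checkpoints via the Hugging Face transformers port of the model (not the authors' original codebase). The task prefixes ("translate English to German:", "summarize:", "cola sentence:") follow the paper; the choice of the t5-small checkpoint and the example sentences are assumptions made for illustration.

```python
# Minimal sketch: every task becomes "input text -> output text" with a task prefix.
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

examples = [
    "translate English to German: The house is wonderful.",       # translation
    "summarize: state authorities dispatched emergency crews "
    "tuesday to survey the damage after an onslaught of storms.",  # summarization
    "cola sentence: The course is jumping well.",                  # acceptability classification
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because classification labels are emitted as literal strings (for example "acceptable" or "not acceptable"), the same maximum-likelihood training objective and decoding procedure cover every task.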



Citations
Posted Content

Building Compact and Robust Deep Neural Networks with Toeplitz Matrices

Alexandre Araujo
02 Sep 2021
TL;DR: In this article, the authors leverage the properties of structured matrices from the Toeplitz family to build neural networks that are not only accurate but also compact, easy to train, reliable, and robust to adversarial examples.
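The Toeplitz constraint is easy to sketch: an n x n Toeplitz matrix is fully determined by its first row and first column, so a linear layer can be parameterized by 2n - 1 values instead of n^2. The PyTorch module below is a minimal illustration of that parameterization, not the paper's implementation; the class name and initialization are invented for the example.

```python
import torch
import torch.nn as nn

class ToeplitzLinear(nn.Module):
    """Linear layer whose weight matrix is constrained to be Toeplitz,
    so an n x n map needs only 2n - 1 parameters instead of n^2."""
    def __init__(self, n: int):
        super().__init__()
        self.n = n
        self.coeffs = nn.Parameter(torch.randn(2 * n - 1) / n ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n = self.n
        idx = torch.arange(n)
        # Entry (i, j) depends only on i - j, which is the Toeplitz property.
        weight = self.coeffs[idx.unsqueeze(1) - idx.unsqueeze(0) + n - 1]
        return x @ weight.T

layer = ToeplitzLinear(4)
print(layer(torch.randn(2, 4)).shape)  # torch.Size([2, 4])
```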
Posted Content

Conditional Generation of Temporally-ordered Event Sequences

TL;DR: The authors use a denoising autoencoder to predict new events that fit into an existing temporally-ordered sequence, capturing both temporality and common event co-occurrence.
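A minimal sketch of the denoising setup, assuming a T5-style model whose sentinel tokens mark the event to be infilled. The event strings and the use of an off-the-shelf t5-small checkpoint are illustrative only; the checkpoint has not been trained on temporal event data, so its completion merely stands in for what the fine-tuned model of the cited work would produce.

```python
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Corrupt an ordered event sequence by masking one event; the denoiser is asked
# to generate an event that fits the surrounding temporal context.
corrupted = "wake up <extra_id_0> drive to work attend the morning meeting"
input_ids = tokenizer(corrupted, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=False))
```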
Posted Content

Pretrained Language Models are Symbolic Mathematics Solvers too

TL;DR: This article proposes a sample-efficient way of solving symbolic tasks by first pretraining a transformer model on language translation and then fine-tuning it on the downstream task of symbolic mathematics.
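A hedged sketch of the fine-tuning step of that recipe: symbolic problems are written as string-to-string pairs and a pretrained sequence-to-sequence model is tuned on them with ordinary maximum likelihood. The t5-small checkpoint, the prompt wording, and the two-example dataset are assumptions for illustration; the cited work starts from a model pretrained on language translation.

```python
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Symbolic mathematics posed as text-to-text pairs (toy examples).
pairs = [
    ("differentiate with respect to x: sin(x)**2", "2*sin(x)*cos(x)"),
    ("integrate with respect to x: 2*x", "x**2"),
]

model.train()
for source, target in pairs:
    inputs = tokenizer(source, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```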
Posted Content

Cross-Domain Reasoning via Template Filling.

TL;DR: This article explores the ability of sequence-to-sequence models to perform cross-domain reasoning and presents a prompt-template-filling approach that enables them to do so.
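The sketch below shows what prompt-template filling looks like in code; the template wording and slot names are hypothetical, not taken from the cited paper, and serve only to illustrate how structured slots are rendered into a single input string for a sequence-to-sequence model.

```python
# Hypothetical template; the slot names and wording are invented for illustration.
TEMPLATE = "premise: {premise} question: {question} answer:"

def build_prompt(premise: str, question: str) -> str:
    """Fill the template's slots to form the input string for a seq2seq model."""
    return TEMPLATE.format(premise=premise, question=question)

print(build_prompt(
    premise="A glass falls off the edge of a table.",
    question="What is likely to happen next?",
))
```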
Proceedings Article

PermuteFormer: Efficient Relative Position Encoding for Long Sequences.

TL;DR: PermuteFormer applies a position-dependent transformation to queries and keys to encode positional information into the attention module; the transformation is carefully crafted so that the final output of self-attention is not affected by the absolute positions of tokens.
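A toy sketch of such a position-dependent transformation: cyclically shifting the feature dimension of each query and key by its position index makes the dot product q_i . k_j depend only on (i - j) mod d, so shifting all tokens by the same offset leaves attention scores unchanged. The function name and the use of a plain cyclic shift are simplifications; the actual model applies per-head permutations inside a linear-attention module.

```python
import torch

def position_permute(x: torch.Tensor) -> torch.Tensor:
    """Cyclically shift the feature dimension of each position's vector by its
    position index. Applied to both queries and keys, the dot product
    q_i . k_j then depends only on (i - j) mod d, not on absolute positions."""
    seq_len = x.shape[-2]
    shifted = [torch.roll(x[..., pos, :], shifts=pos, dims=-1) for pos in range(seq_len)]
    return torch.stack(shifted, dim=-2)

q = torch.randn(2, 8, 16)   # (batch, sequence, head_dim)
k = torch.randn(2, 8, 16)
scores = position_permute(q) @ position_permute(k).transpose(-2, -1)
print(scores.shape)  # torch.Size([2, 8, 8])
```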
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.