Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduces a unified framework that converts all text-based language problems into a text-to-text format and systematically compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
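
To make the text-to-text format concrete, here is a minimal sketch assuming the Hugging Face transformers library and the publicly released t5-small checkpoint: every task, whether translation or classification, is phrased as an input string with a task prefix, and the answer is produced as output text.

```python
# Minimal sketch of the text-to-text format, assuming the Hugging Face
# "transformers" library and the released "t5-small" checkpoint
# (the T5 tokenizer also requires the "sentencepiece" package).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is phrased as "input text -> output text" via a task prefix;
# classification labels are simply decoded as words.
inputs = [
    "translate English to German: The house is wonderful.",
    "sst2 sentence: it confirms fincher's status as a film maker.",
]

for text in inputs:
    ids = tokenizer(text, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=32)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```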



Citations
Posted Content

How Optimal is Greedy Decoding for Extractive Question Answering?

TL;DR: The authors showed that greedy decoding quickly converges towards the performance of exact extractive decoding once a few training examples are introduced: the model becomes more extractive and increasingly likely to generate the most probable span as the training set grows.
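
As a toy illustration of the two decoding strategies being contrasted (the probability table below is made up, not taken from any trained model), greedy decoding picks the highest-probability token step by step, while exact extractive decoding scores every contiguous span of the passage and returns the best one.

```python
# Toy contrast between greedy decoding and exact extractive decoding for QA.
# The "model" here is a hand-written log-probability table, purely illustrative.
import math

passage = ["the", "cat", "sat", "on", "the", "mat"]

def step_logprob(prefix, token):
    # Stand-in for a seq2seq model's next-token log-probability.
    table = {(): {"the": -0.7, "mat": -1.2, "cat": -1.5},
             ("the",): {"mat": -0.3, "cat": -1.6},
             ("the", "mat"): {"</s>": -0.2}}
    return table.get(tuple(prefix), {}).get(token, -5.0)

def greedy_decode(vocab, max_len=3):
    # Pick the locally best token at every step.
    prefix = []
    for _ in range(max_len):
        token = max(vocab, key=lambda t: step_logprob(prefix, t))
        if token == "</s>":
            break
        prefix.append(token)
    return prefix

def exact_extractive_decode():
    # Score every contiguous span of the passage and return the best one.
    best, best_score = None, -math.inf
    for i in range(len(passage)):
        for j in range(i + 1, len(passage) + 1):
            span = passage[i:j]
            score = sum(step_logprob(span[:k], span[k]) for k in range(len(span)))
            score += step_logprob(span, "</s>")
            if score > best_score:
                best, best_score = span, score
    return best

vocab = set(passage) | {"</s>"}
print("greedy:    ", greedy_decode(vocab))
print("extractive:", exact_extractive_decode())
```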
Proceedings Article

BERT meets Cranfield: Uncovering the Properties of Full Ranking on Fully Labeled Data

TL;DR: In this article, the authors investigate the performance of BERT-based full rankers on the Cranfield collection, which comes with complete relevance judgments for every document, as opposed to a BERT-based re-ranker and BM25.
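
For reference, BM25 is the classical lexical baseline in that comparison; a minimal sketch with the standard k1/b parameterization and a toy stand-in corpus might look like this:

```python
# Minimal BM25 scorer (the lexical baseline referred to above), with the
# standard k1/b parameterization; the three documents are a toy stand-in.
import math
from collections import Counter

docs = [doc.split() for doc in [
    "boundary layer flow over a flat plate",
    "heat transfer in laminar boundary layers",
    "supersonic aircraft wing design",
]]

N = len(docs)
avgdl = sum(len(d) for d in docs) / N
df = Counter(term for d in docs for term in set(d))  # document frequencies

def bm25(query, doc, k1=1.2, b=0.75):
    tf = Counter(doc)
    score = 0.0
    for term in query.split():
        if term not in tf:
            continue
        idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
        norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        score += idf * norm
    return score

query = "boundary layer heat transfer"
ranked = sorted(range(N), key=lambda i: bm25(query, docs[i]), reverse=True)
print([(i, round(bm25(query, docs[i]), 3)) for i in ranked])
```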
Posted Content

Sequence-to-Sequence Piano Transcription with Transformers

TL;DR: In this paper, a generic encoder-decoder Transformer with standard decoding methods is used to translate spectrogram inputs directly to MIDI-like output events for several transcription tasks.
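
A hedged sketch of such a generic encoder-decoder setup, written here with PyTorch's nn.Transformer and random tensors standing in for spectrogram frames and MIDI-like event tokens; all sizes and the vocabulary are hypothetical placeholders, not the paper's configuration.

```python
# Sketch of an encoder-decoder Transformer mapping spectrogram frames to
# MIDI-like event tokens. Dimensions, vocab, and inputs are hypothetical;
# positional encodings are omitted to keep the sketch short.
import torch
import torch.nn as nn

N_MELS, D_MODEL, VOCAB = 128, 256, 512

class SpectrogramToEvents(nn.Module):
    def __init__(self):
        super().__init__()
        self.in_proj = nn.Linear(N_MELS, D_MODEL)    # spectrogram frame -> model dim
        self.embed = nn.Embedding(VOCAB, D_MODEL)    # event-token embeddings
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=4, num_encoder_layers=2,
            num_decoder_layers=2, batch_first=True)
        self.out_proj = nn.Linear(D_MODEL, VOCAB)    # decoder state -> event logits

    def forward(self, spec, events):
        src = self.in_proj(spec)                     # (batch, frames, d_model)
        tgt = self.embed(events)                     # (batch, steps, d_model)
        mask = self.transformer.generate_square_subsequent_mask(events.size(1))
        out = self.transformer(src, tgt, tgt_mask=mask)
        return self.out_proj(out)                    # (batch, steps, vocab)

model = SpectrogramToEvents()
spec = torch.randn(2, 400, N_MELS)                  # two clips, 400 frames each
events = torch.randint(0, VOCAB, (2, 50))           # teacher-forced event prefix
print(model(spec, events).shape)                    # torch.Size([2, 50, 512])
```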
Proceedings Article

Disentangled Sequence to Sequence Learning for Compositional Generalization

TL;DR: The authors propose an extension to sequence-to-sequence models that encourages disentanglement by adaptively re-encoding the source input at each time step, making it easier for the encoder to exploit specialized information for each prediction rather than capturing it all in a single forward pass.
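
A rough sketch of the adaptive re-encoding idea follows; the architecture, sizes, and toy inputs are placeholders, not the authors' exact model. At every decoding step the source is encoded again together with the current target prefix, so the representation can specialize to that particular prediction.

```python
# Hedged sketch of "re-encode the source at every decoding step".
# The model is untrained, so the decoded tokens are arbitrary; this only
# shows the control flow, not the authors' architecture.
import torch
import torch.nn as nn

VOCAB, D_MODEL = 100, 64

class AdaptiveReencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(D_MODEL, VOCAB)

    def decode(self, src_tokens, bos=1, eos=2, max_len=10):
        prefix = [bos]
        for _ in range(max_len):
            # Re-encode the source *together with* the current target prefix,
            # so the source representation is recomputed for this prediction.
            seq = torch.tensor([src_tokens + prefix])
            enc = self.encoder(self.embed(seq))
            logits = self.out(enc[:, -1])            # state at the last prefix position
            nxt = int(logits.argmax(dim=-1))
            if nxt == eos:
                break
            prefix.append(nxt)
        return prefix[1:]

model = AdaptiveReencoder()
print(model.decode([5, 7, 9, 11]))
```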
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.