Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
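The unified framework described in the abstract feeds the model a single input string with a task prefix and trains it to emit target text, so every task, classification included, shares one sequence-to-sequence interface. A minimal sketch of that input formatting (the prefixes follow the conventions reported in the paper; the helper function itself is ours):

```python
def to_text_to_text(task: str, **fields) -> str:
    """Format a task instance as one input string with a task prefix,
    mirroring the text-to-text convention described in the abstract."""
    if task == "translate_en_de":
        return f"translate English to German: {fields['text']}"
    if task == "summarize":
        return f"summarize: {fields['text']}"
    if task == "cola":  # single-sentence acceptability classification
        return f"cola sentence: {fields['text']}"
    raise ValueError(f"unknown task: {task}")

# Every task is reduced to the same (input text -> output text) shape:
print(to_text_to_text("summarize", text="Transfer learning works well."))
# -> summarize: Transfer learning works well.
```

Targets are likewise plain text (a translation, a summary, or a label word such as "acceptable"), which is what lets one model and one objective cover all of the tasks.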
Citations
Posted Content
How Optimal is Greedy Decoding for Extractive Question Answering?
TL;DR: The authors showed that greedy decoding quickly converges towards the performance of exact extractive decoding with the introduction of a few training examples, becoming more extractive and increasingly likely to generate the most probable span as the training set grows.
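Greedy decoding, compared above against exact extractive decoding, simply picks the highest-scoring token at each step and stops at end-of-sequence. A toy sketch (the per-step scores here are a stand-in, not the cited paper's model):

```python
def greedy_decode(step_logits, eos=0):
    """Pick the argmax token at every step until EOS is produced.
    step_logits is a list of per-step next-token score vectors,
    standing in for a real model's conditional distributions."""
    out = []
    for logits in step_logits:
        tok = max(range(len(logits)), key=lambda i: logits[i])
        if tok == eos:
            break
        out.append(tok)
    return out

# Toy scores over a 4-token vocabulary (token 0 = EOS):
scores = [[0.1, 2.0, 0.3, 0.4],
          [0.2, 0.1, 3.0, 0.5],
          [5.0, 0.1, 0.2, 0.3]]
print(greedy_decode(scores))  # -> [1, 2]
```

Exact extractive decoding, by contrast, would score every candidate span of the input and return the most probable one; the finding above is that the cheap greedy loop approaches that behavior as training data grows.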
Proceedings Article
BERT meets Cranfield: Uncovering the Properties of Full Ranking on Fully Labeled Data
Negin Ghasemi, Djoerd Hiemstra +1 more
TL;DR: In this article, the authors investigate the performance of BERT-based full rankers on the Cranfield collection, which comes with full relevance judgments on all documents in the collection, as opposed to BERT-based re-rankers and BM25.
Posted Content
Sequence-to-Sequence Piano Transcription with Transformers
TL;DR: In this paper, a generic encoder-decoder Transformer with standard decoding methods is used to translate spectrogram inputs directly to MIDI-like output events for several transcription tasks.
Journal Article
gobbli: A uniform interface to deep learning for text in Python
Jason Nance, Peter Baumgartner +1 more
Proceedings Article
Disentangled Sequence to Sequence Learning for Compositional Generalization
TL;DR: The authors proposed an extension to sequence-to-sequence models which encourages disentanglement by adaptively re-encoding (at each time step) the source input, which makes it easier for the encoder to exploit specialized information for each prediction rather than capturing it all in a single forward pass.