Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
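As a concrete illustration of the text-to-text framing described in the abstract, the sketch below shows how a pre-trained T5 checkpoint maps a task prefix plus input text to output text. It is a minimal sketch assuming the Hugging Face transformers library and the public "t5-small" checkpoint; the prefix convention ("translate English to German:") follows the paper.

```python
# Minimal sketch of the text-to-text framing: every task becomes
# "input text in, output text out"; only the task prefix changes.
# Assumes the Hugging Face `transformers` library and "t5-small".
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
)
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Swapping the prefix (e.g. "summarize:" or "cola sentence:") reuses the same model and decoding loop for summarization or classification, which is the unification the paper studies.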



Citations
Posted Content

CIDER: Commonsense Inference for Dialogue Explanation and Reasoning

TL;DR: The CIDER dataset contains dyadic dialogues annotated with explanations in the form of implicit and explicit knowledge triplets, inferred through contextual commonsense inference and categorized by the type of commonsense knowledge present.
Posted Content

Semantic Relations and Deep Learning

TL;DR: A new Chapter 5 of the book discusses relation classification and extraction in the deep-learning paradigm, which arose after the first edition appeared.
Proceedings ArticleDOI

Keyword Augmentation via Generative Methods

TL;DR: In this paper, the authors propose a keyword augmentation method based on a generative seq2seq model and a trie-based search mechanism, which can generate high-quality keywords for any product or product list; an illustrative sketch of the trie-constrained idea follows.
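The summary above names a trie-based search mechanism for constraining a seq2seq generator. The sketch below is a hypothetical illustration of that general idea (the Trie class and allowed_next helper are invented names, not the authors' code): decoding is restricted at each step to token continuations that stay inside a trie of acceptable keywords.

```python
# Hypothetical sketch: a trie over token-id sequences, used to restrict
# each decoding step to continuations of known-good keywords.
class Trie:
    def __init__(self):
        self.children = {}
        self.is_end = False

    def insert(self, token_ids):
        node = self
        for t in token_ids:
            node = node.children.setdefault(t, Trie())
        node.is_end = True

    def allowed_next(self, prefix_ids):
        """Token ids that may legally follow `prefix_ids`."""
        node = self
        for t in prefix_ids:
            if t not in node.children:
                return []  # prefix left the trie: no legal continuation
            node = node.children[t]
        return list(node.children)

# Usage: build the trie from tokenized candidate keywords, then plug
# `allowed_next` into beam search as a prefix constraint (e.g. via
# `prefix_allowed_tokens_fn` in Hugging Face `generate`).
trie = Trie()
trie.insert([101, 7, 42])   # hypothetical token ids for one keyword
trie.insert([101, 7, 99])
print(trie.allowed_next([101, 7]))  # -> [42, 99]
```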
Book ChapterDOI

Using Presentation Slides and Adjacent Utterances for Post-editing of Speech Recognition Results for Meeting Recordings

TL;DR: This article proposes a method for automatically post-editing ASR results using the presentation slides that meeting participants use and the utterances adjacent to a target utterance; the method can be applied to arbitrary speech recognition engines.

Dialogue response generation via contrastive latent representation learning

TL;DR: This paper proposes an utterance-level contrastive learning model that encodes predictive information in each context representation for its corresponding response, learning a representative latent space of the sentence distribution, which is otherwise hard to control during generation.
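For readers unfamiliar with the technique named above, the following is a generic InfoNCE-style utterance-level contrastive loss in PyTorch. It is an illustrative sketch of contrastive learning between paired context and response encodings (in-batch negatives), not the paper's exact objective.

```python
# Illustrative InfoNCE-style contrastive loss between paired context and
# response encodings; positives lie on the diagonal, in-batch others are
# negatives. Generic sketch, assuming PyTorch.
import torch
import torch.nn.functional as F

def utterance_contrastive_loss(context_emb, response_emb, temperature=0.1):
    """context_emb, response_emb: (batch, dim) encodings of paired turns."""
    c = F.normalize(context_emb, dim=-1)
    r = F.normalize(response_emb, dim=-1)
    logits = c @ r.t() / temperature       # (batch, batch) similarity matrix
    targets = torch.arange(c.size(0))      # positive pair is the diagonal
    return F.cross_entropy(logits, targets)

# Usage sketch with random stand-in encodings:
loss = utterance_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```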
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.