Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
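The unified framework described in the abstract casts every task as mapping an input string to an output string, with a task-specific prefix telling the model which task to perform; even classification targets become literal text labels. A minimal sketch of that framing (the helper function and prefix table here are illustrative, not the paper's released code):

```python
# Sketch of the text-to-text framing: every task becomes
# "task prefix + input text" -> "output text", so a single
# sequence-to-sequence model can handle all of them.
TASK_PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
    "cola": "cola sentence: ",  # grammatical-acceptability classification
}

def to_text_to_text(task: str, text: str) -> str:
    """Prepend the task's prefix to form the model's input string."""
    return TASK_PREFIXES[task] + text

# Translation: the target is the translated sentence itself.
print(to_text_to_text("translate_en_de", "That is good."))
# Classification: the target is a text label such as "acceptable",
# decoded as ordinary output tokens rather than a class index.
print(to_text_to_text("cola", "The course is jumping well."))
```

Because both inputs and targets are plain text, the same maximum-likelihood training objective and decoding procedure apply to every task; only the prefix and the target strings change.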
Citations
Posted Content
Relation-Guided Pre-Training for Open-Domain Question Answering.
TL;DR: Zhang et al. proposed a relation-guided pre-training (RGPT-QA) framework to improve generalization performance on questions involving long-tail relations.
Posted Content
Revisiting Linformer with a modified self-attention with linear complexity.
TL;DR: In this article, the authors proposed an alternative self-attention method with linear complexity in time and space that is independent of the projection mapping dimension and can be applied to images as well as audio.
Posted Content
More Than Reading Comprehension: A Survey on Datasets and Metrics of Textual Question Answering.
Yang Bai, Daisy Zhe Wang
TL;DR: In this article, the authors survey 47 textual QA benchmark datasets, propose a new taxonomy from an application point of view, and summarize 8 evaluation metrics for textual question answering tasks.
Proceedings ArticleDOI
COPER: a Query-adaptable Semantics-based Search Engine for Persian COVID-19 Articles
TL;DR: In this paper, a search engine consisting of a ranker and a re-ranker is used to sift through Persian COVID-19 articles and rank them given a user's query.
Posted Content
Analysis and Evaluation of Language Models for Word Sense Disambiguation
TL;DR: This paper provided an in-depth quantitative and qualitative analysis of the BERT model with respect to lexical ambiguity, finding that BERT can accurately capture high-level sense distinctions even when only a limited number of examples is available for each word sense.