Open Access Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
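The unified text-to-text format described in the abstract can be illustrated with a minimal sketch: every task becomes a pair of (input string, target string), distinguished only by a task prefix. The task prefixes below follow the conventions reported in the paper, but the `to_text_to_text` helper and the example data are illustrative assumptions, not code from the released implementation.

```python
def to_text_to_text(task: str, example: dict) -> tuple[str, str]:
    """Cast a task-specific example into a (source, target) string pair,
    so one sequence-to-sequence model can handle every task."""
    if task == "translation":
        return ("translate English to German: " + example["en"], example["de"])
    if task == "summarization":
        return ("summarize: " + example["document"], example["summary"])
    if task == "classification":
        # Even class labels are emitted as literal text (e.g. "acceptable"),
        # rather than as class indices.
        return ("cola sentence: " + example["sentence"], example["label"])
    raise ValueError(f"unknown task: {task}")

src, tgt = to_text_to_text(
    "translation",
    {"en": "The house is wonderful.", "de": "Das Haus ist wunderbar."},
)
print(src)  # translate English to German: The house is wonderful.
print(tgt)  # Das Haus ist wunderbar.
```

Because inputs and outputs are plain text for every task, the same model, loss, and decoding procedure apply uniformly, which is what makes the systematic comparisons in the paper possible.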
Citations
Posted Content
Beneath the Tip of the Iceberg: Current Challenges and New Directions in Sentiment Analysis Research
TL;DR: Sentiment analysis has come a long way since it was first introduced as a task nearly 20 years ago, and it now has widespread commercial applications in domains such as marketing, risk management, market research, and politics, as discussed by the authors.
Posted Content
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
Isaac Caswell, Julia Kreutzer, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Auguste Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoît Sagot, Clara E. Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Javier Ortiz Suárez, Iroro Orife, Kelechi Ogueji, Rubungo Andre Niyongabo, Toan Q. Nguyen, Mathias Müller, André Müller, Shamsuddeen Hassan Muhammad, Nanda Muhammad, Ayanda Mnyakeni, Jamshidbek Mirzakhalov, Tapiwanashe Matangira, Colin Leong, Nze Lawson, Sneha Kudugunta, Yacine Jernite, Mathias Jenny, Orhan Firat, Bonaventure F. P. Dossou, Sakhile Dlamini, Nisansa de Silva, Sakine Çabuk Ballı, Stella Biderman, Alessia Battisti, Ahmed Baruwa, Ankur Bapna, Pallavi Baljekar, Israel Abebe Azime, Ayodele Awokoya, Duygu Ataman, Orevaoghene Ahia, Oghenefego Ahia, Sweta Agrawal, Mofetoluwa Adeyemi
TL;DR: In this paper, the authors manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4) and audit the correctness of language codes in a sixth (JW300).
Posted Content
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou
TL;DR: LayoutLMv2, as discussed by the authors, pre-trains text, layout, and image in a multi-modal framework in which new model architectures and pre-training tasks are leveraged.
Proceedings ArticleDOI
Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction
Lu Xu, Yew Ken Chia, Lidong Bing
TL;DR: This paper proposed a dual-channel span pruning strategy that incorporates supervision from the Aspect Term Extraction (ATE) and Opinion Term Extraction (OTE) tasks, which not only improves computational efficiency but also distinguishes opinion and target spans more properly.
Posted Content
Neural Passage Retrieval with Improved Negative Contrast.
TL;DR: The effects of negative sampling in dual-encoder models used to retrieve passages for automatic question answering are explored, and a new state-of-the-art level of performance is established on two of the open-domain question answering datasets evaluated.