Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
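The text-to-text framing described in the abstract can be illustrated with a short sketch. The snippet below uses the Hugging Face transformers port of T5; the model name "t5-small" and the task prefixes ("translate English to German:", "summarize:", "cola sentence:") come from the public T5 release. This is a minimal usage sketch, not the authors' original training code.

```python
# Minimal sketch of T5's text-to-text framing via the Hugging Face port.
# Assumes `transformers` and `sentencepiece` are installed.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is cast as text in, text out: a task prefix tells the model
# which problem to solve, and the answer is decoded as plain text.
prompts = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has "
    "emerged as a powerful technique in natural language processing.",
    # Classification is also text generation: the model emits
    # "acceptable" or "unacceptable" as its output string.
    "cola sentence: The books was on the table.",
]

for prompt in prompts:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The task prefix is the design choice that makes the unification work: a single model, objective, and decoding procedure cover translation, summarization, and classification, with only the input string changing per task.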



Citations
Posted Content

Text-to-Text Multi-view Learning for Passage Re-ranking

TL;DR: This article proposed a text-to-text multi-view learning framework that incorporates an additional view, the text generation view, into a typical single-view passage ranking model, improving ranking performance over its single-view counterpart.
Posted Content

Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction

TL;DR: This paper proposed a dual-channel span pruning strategy that incorporates supervision from the Aspect Term Extraction (ATE) and Opinion Term Extraction (OTE) tasks, which not only improves computational efficiency but also distinguishes opinion and target spans more accurately.
Proceedings ArticleDOI

Improving Social Meaning Detection with Pragmatic Masking and Surrogate Fine-Tuning

TL;DR: The authors propose pragmatic masking and surrogate fine-tuning as two complementary strategies that exploit social cues to drive pre-trained representations toward a broad set of concepts useful for a wide range of social meaning tasks.
Proceedings Article

Automatic Text Evaluation through the Lens of Wasserstein Barycenters.

TL;DR: In this paper, a new metric, BaryScore, is proposed to evaluate text generation based on deep contextualized embeddings (e.g., BERT, RoBERTa, ELMo).
Proceedings Article

ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning.

TL;DR: This article presented ExplaGraphs, a new generative and structured commonsense-reasoning task (and an associated dataset) of explanation graph generation for stance prediction: given a belief and an argument, a model has to predict whether the argument supports or counters the belief, and also generate a commonsense-augmented graph that serves as a non-trivial, complete, and unambiguous explanation for the predicted stance.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.