Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors across dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
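As a minimal illustration of the text-to-text format described above, the sketch below feeds task-prefixed input text to one of the released T5 checkpoints and decodes the generated output text. It assumes the Hugging Face transformers library and the public "t5-small" checkpoint; the task prefixes shown follow the convention used in the paper's released code.

```python
# Minimal sketch of the unified text-to-text format: every task is handled by
# feeding text in and generating text out. Assumes the Hugging Face
# `transformers` library and the publicly released "t5-small" checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Tasks are distinguished only by a natural-language prefix on the input.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    "cola sentence: The books was on the table.",  # grammatical acceptability
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Even classification tasks such as CoLA are handled by generating the label as a literal string (e.g. "acceptable"), which is what lets the single text-to-text interface cover summarization, question answering, and classification alike.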



Citations
Journal Article

Infusing Finetuning with Semantic Dependencies

TL;DR: This work applies novel probes to recent language models and finds that, unlike syntax, semantics is not brought to the surface by today's pretrained models. It therefore uses graph convolutional encoders to explicitly incorporate semantic parses into task-specific finetuning, yielding benefits on natural language understanding tasks in the GLUE benchmark.
Proceedings Article

Multi-source Meta Transfer for Low Resource Multiple-Choice Question Answering

TL;DR: This work proposes multi-source meta transfer (MMT) for low-resource multiple-choice question answering (MCQA): it incorporates multiple training sources to learn a generalized feature representation across domains and introduces a meta-transfer step that can be integrated into multi-source meta training.
Journal Article

Lawformer: A pre-trained language model for Chinese legal long documents

TL;DR: This paper proposes Lawformer, a pre-trained language model for understanding long Chinese legal documents, which achieves promising improvements on tasks that take long documents as input, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering.
Proceedings Article

Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning

TL;DR: This paper introduces a new scoring method that casts plausibility ranking as a full-text task and leverages the masked language modeling head tuned during the pre-training phase; it requires less annotated data than the standard classifier approach to reach equivalent performance.
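The scoring idea in this TL;DR can be illustrated with a pseudo-log-likelihood sketch: each candidate answer is written out as full text, every token is masked in turn, and candidates are ranked by the summed log-probability the masked language modeling head assigns to the original tokens. This is a hedged illustration using bert-base-uncased via Hugging Face transformers, not the paper's exact scoring function.

```python
# Hedged sketch of masked-LM plausibility scoring (pseudo-log-likelihood), not
# the paper's exact formulation: mask each token in turn, sum the log-probability
# of the original token at the masked position, and rank candidates by the total.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def plausibility_score(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt").input_ids[0]
    score = 0.0
    with torch.no_grad():
        for i in range(1, ids.size(0) - 1):        # skip [CLS] and [SEP]
            masked = ids.clone()
            masked[i] = tokenizer.mask_token_id    # hide one position
            logits = model(masked.unsqueeze(0)).logits[0, i]
            log_probs = torch.log_softmax(logits, dim=-1)
            score += log_probs[ids[i]].item()      # log P(original token | context)
    return score

# Rank candidate completions of a commonsense prompt by plausibility.
candidates = [
    "He put the ice cream in the freezer.",
    "He put the ice cream in the oven.",
]
print(max(candidates, key=plausibility_score))
```

Ranking by a score computed from the pre-trained head, rather than training a new classification head, is what allows this kind of approach to work with little task-specific annotated data.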
Proceedings Article

An Empirical Study on Neural Keyphrase Generation

TL;DR: This empirical study fills the gap in comprehensive comparisons among different keyphrase generation (KPG) model designs by providing extensive experimental results and thoroughly investigating the factors that most affect the generalization performance of KPG systems.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.