Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors across dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
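As a minimal illustration of the text-to-text format described above, the sketch below feeds task-prefixed input text to one of the released T5 checkpoints and decodes the generated output text. It assumes the Hugging Face transformers library and the public "t5-small" checkpoint; the task prefixes shown follow the convention used in the paper's released code.

```python
# Minimal sketch of the unified text-to-text format: every task is handled by
# feeding text in and generating text out. Assumes the Hugging Face
# `transformers` library and the publicly released "t5-small" checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Tasks are distinguished only by a natural-language prefix on the input.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    "cola sentence: The books was on the table.",  # grammatical acceptability
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Even classification tasks such as CoLA are handled by generating the label as a literal string (e.g. "acceptable"), which is what lets the single text-to-text interface cover summarization, question answering, and classification alike.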



Citations
Journal Article

Infusing Finetuning with Semantic Dependencies

TL;DR: This work applies novel probes to recent language models and finds that, unlike syntax, semantics is not brought to the surface by today's pretrained models. It therefore uses graph convolutional encoders to explicitly incorporate semantic parses into task-specific finetuning, yielding benefits on natural language understanding tasks in the GLUE benchmark.
Proceedings Article

Multi-source Meta Transfer for Low Resource Multiple-Choice Question Answering

TL;DR: This work proposes multi-source meta transfer (MMT) for low-resource multiple-choice question answering (MCQA): it incorporates multiple training sources to learn a generalized feature representation across domains and introduces a meta-transfer step that can be integrated into multi-source meta training.
Journal Article

Lawformer: A pre-trained language model for Chinese legal long documents

TL;DR: This paper proposes Lawformer, a pre-trained language model for understanding long Chinese legal documents, which achieves promising improvements on tasks that take long documents as input, including judgment prediction, similar case retrieval, legal reading comprehension, and legal question answering.
Proceedings Article

Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning

TL;DR: This paper introduces a new scoring method that casts plausibility ranking as a full-text task and leverages the masked language modeling head tuned during the pre-training phase; it requires less annotated data than the standard classifier approach to reach equivalent performance.
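The scoring idea in this TL;DR can be illustrated with a pseudo-log-likelihood sketch: each candidate answer is written out as full text, every token is masked in turn, and candidates are ranked by the summed log-probability the masked language modeling head assigns to the original tokens. This is a hedged illustration using bert-base-uncased via Hugging Face transformers, not the paper's exact scoring function.

```python
# Hedged sketch of masked-LM plausibility scoring (pseudo-log-likelihood), not
# the paper's exact formulation: mask each token in turn, sum the log-probability
# of the original token at the masked position, and rank candidates by the total.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def plausibility_score(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt").input_ids[0]
    score = 0.0
    with torch.no_grad():
        for i in range(1, ids.size(0) - 1):        # skip [CLS] and [SEP]
            masked = ids.clone()
            masked[i] = tokenizer.mask_token_id    # hide one position
            logits = model(masked.unsqueeze(0)).logits[0, i]
            log_probs = torch.log_softmax(logits, dim=-1)
            score += log_probs[ids[i]].item()      # log P(original token | context)
    return score

# Rank candidate completions of a commonsense prompt by plausibility.
candidates = [
    "He put the ice cream in the freezer.",
    "He put the ice cream in the oven.",
]
print(max(candidates, key=plausibility_score))
```

Ranking by a score computed from the pre-trained head, rather than training a new classification head, is what allows this kind of approach to work with little task-specific annotated data.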
Proceedings Article

An Empirical Study on Neural Keyphrase Generation

TL;DR: This empirical study fills the gap in comprehensive comparisons among different keyphrase generation (KPG) model designs by providing extensive experimental results and thoroughly investigating the factors that most affect the generalization performance of KPG systems.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.