Open Access · Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
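As a concrete illustration of the text-to-text format described in the abstract, the sketch below loads a publicly released T5 checkpoint through the Hugging Face transformers library (an interface assumed here for illustration; the paper's own code release is separate). Every task is expressed as an input string with a task prefix, and the answer is read off as a decoded output string.

```python
# Minimal sketch of the text-to-text format: one model, one decoding path,
# with a task prefix selecting the task. Uses the public "t5-small" checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompts = [
    # Translation: the output is decoded as a German sentence.
    "translate English to German: The house is wonderful.",
    # Summarization: the output is decoded as a shorter English string.
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    # Classification (CoLA grammaticality): the label itself is generated
    # as the string "acceptable" or "unacceptable".
    "cola sentence: The course is jumping well.",
]

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because even classification labels are produced as text, the same generate-and-decode loop covers every task in the study; no task-specific output heads are needed.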



Citations
Journal Article (DOI)

AutoBERT-Zero: Evolving BERT Backbone from Scratch

TL;DR: This paper proposes an operation-priority neural architecture search (OP-NAS) algorithm that optimizes both the search process and the evaluation of candidate models, and designs a Bi-branch Weight-Sharing (BIWS) training strategy for fast model evaluation.
Posted Content

Generate & Rank: A Multi-task Framework for Math Word Problems

TL;DR: This paper proposes Generate & Rank, a multi-task framework built on a generative pre-trained language model, in which the model learns from its own mistakes and is trained to distinguish correct from incorrect expressions.
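To make the two-stage idea concrete, here is a hedged sketch of generate-then-rank inference: a sequence-to-sequence generator proposes several candidate expressions via beam search, and a ranker scores each one to pick the best. The scoring function below is a dummy placeholder, not the cited paper's trained ranker, and the "solve:" prefix is purely illustrative.

```python
# Hedged sketch of generate-then-rank inference; the ranker is a placeholder
# and the task prefix is illustrative, not taken from the cited paper.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
generator = T5ForConditionalGeneration.from_pretrained("t5-small")

def rank_score(problem: str, expression: str) -> float:
    # In the cited framework a trained ranker scores (problem, expression)
    # pairs; a trivial heuristic stands in so the sketch runs end to end.
    return -abs(len(expression) - 5)

problem = "solve: John has 3 apples and buys 4 more. How many apples in total?"
inputs = tokenizer(problem, return_tensors="pt")
candidate_ids = generator.generate(
    **inputs, num_beams=8, num_return_sequences=8, max_new_tokens=16
)
candidates = [tokenizer.decode(ids, skip_special_tokens=True) for ids in candidate_ids]
print(max(candidates, key=lambda expr: rank_score(problem, expr)))
```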
Proceedings Article

Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression

TL;DR: The authors propose two new metrics, label loyalty and probability loyalty, to measure how closely a compressed model (the student) mimics the original model (the teacher), and explore the effect of compression on robustness under adversarial attacks.
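As a rough illustration (not the cited paper's exact definitions), label loyalty can be read as the rate at which the student predicts the same label as the teacher, and probability loyalty as a score derived from the distance between their output distributions; the sketch below assumes a Jensen-Shannon-based variant for the latter.

```python
# Hedged sketch of the loyalty idea; the precise formulas in the cited paper
# may differ from these illustrative definitions.
import numpy as np
from scipy.spatial.distance import jensenshannon

def label_loyalty(teacher_probs: np.ndarray, student_probs: np.ndarray) -> float:
    """Fraction of examples where student and teacher agree on the argmax label."""
    return float(np.mean(teacher_probs.argmax(-1) == student_probs.argmax(-1)))

def probability_loyalty(teacher_probs: np.ndarray, student_probs: np.ndarray) -> float:
    """1 minus the mean Jensen-Shannon distance between output distributions."""
    dists = [jensenshannon(t, s, base=2) for t, s in zip(teacher_probs, student_probs)]
    return float(1.0 - np.mean(dists))

# Dummy softmax outputs for 3 examples and 2 classes.
teacher = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
student = np.array([[0.8, 0.2], [0.4, 0.6], [0.45, 0.55]])
print(label_loyalty(teacher, student), probability_loyalty(teacher, student))
```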
Posted Content

NER-BERT: A Pre-trained Model for Low-Resource Entity Tagging.

TL;DR: The authors construct a massive NER corpus of relatively high quality and pre-train a NER-BERT model on the created dataset.
Posted Content

Open-Domain Conversational Search Assistant with Transformers

TL;DR: The authors propose an open-domain abstractive conversational search agent pipeline that addresses two major challenges: conversation context-aware search and abstractive generation of search answers.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.