Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
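The released checkpoints make the text-to-text recipe straightforward to try. Below is a minimal sketch, assuming the community T5 checkpoints distributed through the Hugging Face `transformers` library rather than the paper's original codebase: every task is expressed as a plain-text input with a task prefix, and the model emits plain text as output, so a single architecture and decoding loop covers translation, summarization, and classification alike.

```python
# Minimal sketch of T5's text-to-text framing (assumes the Hugging Face
# `transformers` library and the community "t5-small" checkpoint, which are
# not part of this page's content).
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Each task is just text with a task prefix; the prefixes below follow the
# conventions described in the paper.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has "
    "emerged as a powerful technique in natural language processing.",
    "cola sentence: The course is jumping well.",  # grammaticality (GLUE CoLA)
]

for text in examples:
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=40)
    # The output is also plain text: a translation, a summary, or the
    # string label "acceptable"/"unacceptable" for the CoLA example.
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```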



Citations
Book Chapter

Memorization in Deep Neural Networks: Does the Loss Function Matter?

TL;DR: The authors empirically show that a symmetric loss function, as opposed to either cross-entropy or squared-error loss, significantly improves the network's ability to resist overfitting through memorization.
Proceedings Article

DialFact: A Benchmark for Fact-Checking in Dialogue

TL;DR: In this paper, the authors introduce the task of fact-checking in dialogue, a relatively unexplored area, and construct DialFact, a benchmark dataset of 22,245 annotated conversational claims, each paired with evidence from Wikipedia.
Journal Article

Deep neural networks architecture driven by problem-specific information

TL;DR: The authors propose using problem-specific information to impose constraints on the network architecture, transforming a fully connected DNN (fc-DNN) into a partially connected DNN (pc-DNN) so that the network topology is driven by prior knowledge.
Proceedings Article

A Model of Cross-Lingual Knowledge-Grounded Response Generation for Open-Domain Dialogue Systems

TL;DR: This paper showed that knowledge inherent in cross-lingual language models can help generate responses in open-domain Korean dialogue systems, even when only English knowledge is given to the system, and developed a knowledge-grounded Korean dialogue model based on KE-T5.
Posted Content

CDLM: Cross-Document Language Modeling

TL;DR: This paper proposed a Cross-Document Language Model (CDLM) for multi-document language modeling, which incorporates two key ideas into the masked language modeling self-supervised objective: first, instead of considering documents in isolation, it pretrains over sets of multiple related documents; second, it improves over recent long-range transformers by introducing dynamic global attention for predicting masked tokens.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.