Open Access Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TLDR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
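To make the text-to-text framing concrete, here is a minimal sketch using the publicly released T5 checkpoints through the Hugging Face transformers library (the library choice and the t5-small checkpoint are illustrative assumptions, not the paper's own code release). Every task is cast as generating output text from input text, with a task prefix identifying the task:

```python
# Minimal sketch of the text-to-text format: each task is expressed by
# prepending a task prefix to the input and decoding the answer as plain text.
# Assumes the Hugging Face `transformers` library and the public t5-small
# checkpoint (illustrative choices, not the paper's released training code).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

examples = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
]

for text in examples:
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because both inputs and outputs are plain text, the same model, loss, and decoding procedure serve translation, summarization, classification, and question answering without task-specific heads.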



Citations
Posted Content

Overview and Insights from the SciVer Shared Task on Scientific Claim Verification

TL;DR: An overview of the SciVer shared task, held at the 2nd SDP workshop at NAACL 2021, in which systems were given a scientific claim and asked to identify which articles support or refute the claim and to provide evidentiary sentences justifying those labels.
Posted Content

ePiC: Employing Proverbs in Context as a Benchmark for Abstract Language Understanding.

TL;DR: This paper introduces ePiC, a large-scale crowdsourced dataset of narratives for employing proverbs in context as a benchmark for abstract language understanding; it provides fine-grained annotations of aligned spans between proverbs and narratives and contains minimal lexical overlap between narratives and proverbs.
Posted Content

Automatic Graph Partitioning for Very Large-scale Deep Learning

TL;DR: RaNNC is middleware for automatic hybrid parallelism: it automatically partitions the model into sub-components such that each sub-component fits within device memory, and achieves high training throughput for pipeline parallelism by balancing the computation times of the sub-components.
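RaNNC's actual partitioning algorithm is not described in this TL;DR; the following is a hypothetical, simplified sketch of the underlying idea only — splitting a sequence of layers into pipeline stages so each stage fits a memory budget while per-stage compute stays roughly balanced:

```python
# Hypothetical sketch (NOT RaNNC's algorithm): greedily split profiled layers
# into pipeline stages, capping per-stage memory and aiming for the remaining
# compute to be spread evenly over the remaining stages.
def partition_layers(compute_ms, memory_mb, num_stages, memory_budget_mb):
    """compute_ms[i], memory_mb[i]: profiled cost of layer i."""
    stages, start = [], 0
    remaining = sum(compute_ms)
    for stage in range(num_stages):
        target = remaining / (num_stages - stage)  # compute to place in this stage
        end, stage_time, stage_mem = start, 0.0, 0.0
        while end < len(compute_ms):
            t, m = compute_ms[end], memory_mb[end]
            if end > start and stage_mem + m > memory_budget_mb:
                break  # stage would exceed device memory
            if end > start and stage < num_stages - 1 and stage_time + t > target:
                break  # stage would exceed its fair share of compute
            stage_time += t
            stage_mem += m
            end += 1
        stages.append(list(range(start, end)))
        remaining -= stage_time
        start = end
    return stages

# Example: 8 layers profiled in ms and MB, split across 4 devices of 8 GB each.
print(partition_layers([30, 20, 10, 40, 25, 25, 30, 20],
                       [2000] * 8, num_stages=4, memory_budget_mb=8000))
# -> [[0, 1], [2, 3], [4, 5], [6, 7]] with ~50 ms of compute per stage
```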
Posted Content

Transfer training from smaller language model.

TL;DR: This paper proposes initializing a larger target model from a smaller source model by copying the source model's weight values and padding the remainder with zeros or small initialization values, so that the source and target models produce approximately the same outputs; this holds because of block matrix multiplication and the residual connections in the Transformer architecture.
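A minimal numerical sketch of the copy-and-pad idea (hypothetical, not the paper's released code): embedding a smaller weight matrix in the top-left block of a larger zero-initialized one makes the enlarged layer reproduce the smaller layer's outputs on the original dimensions.

```python
import numpy as np

# Hypothetical sketch: initialize a larger linear layer from a smaller one by
# copying the source weights into the top-left block and padding the rest with
# zeros (or small values). With zero padding, block matrix multiplication makes
# the target reproduce the source outputs on the original dimensions.
def expand_weight(w_small, d_in_large, d_out_large, pad_scale=0.0):
    d_out_small, d_in_small = w_small.shape
    w_large = pad_scale * np.random.default_rng(1).standard_normal((d_out_large, d_in_large))
    w_large[:d_out_small, :d_in_small] = w_small
    return w_large

rng = np.random.default_rng(0)
w_small = rng.standard_normal((4, 3))              # source layer: 3 -> 4
w_large = expand_weight(w_small, d_in_large=6, d_out_large=8)

x_small = rng.standard_normal(3)                   # input to the source model
x_large = np.concatenate([x_small, np.zeros(3)])   # zero-pad the hidden state

# The first 4 output dimensions of the larger layer match the smaller layer.
assert np.allclose(w_large @ x_large,
                   np.concatenate([w_small @ x_small, np.zeros(4)]))
```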
Posted Content

DoSSIER@COLIEE 2021: Leveraging dense retrieval and summarization-based re-ranking for case law retrieval.

TL;DR: In this article, the authors combine lexical and dense retrieval methods at the paragraph level of the cases for first-stage retrieval, and demonstrate that paragraph-level retrieval outperforms document-level retrieval.
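The specific models used by the authors are not given in this TL;DR; as a generic hypothetical sketch of combining lexical and dense scores over paragraphs (the rank_bm25 and sentence-transformers libraries and the interpolation weight are illustrative assumptions):

```python
# Hypothetical sketch of first-stage hybrid retrieval over case paragraphs:
# combine a lexical BM25 score with a dense bi-encoder similarity per paragraph.
# Library and model choices here are assumptions, not the authors' setup.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

paragraphs = [
    "The court held that the contract was void for uncertainty.",
    "Damages were assessed on the basis of reliance loss.",
    "The appeal was dismissed with costs.",
]
query = "contract void for uncertainty"

# Lexical scores (BM25 over whitespace tokens).
bm25 = BM25Okapi([p.lower().split() for p in paragraphs])
lexical = bm25.get_scores(query.lower().split())

# Dense scores (cosine similarity between bi-encoder embeddings).
encoder = SentenceTransformer("all-MiniLM-L6-v2")
para_emb = encoder.encode(paragraphs, convert_to_tensor=True)
query_emb = encoder.encode(query, convert_to_tensor=True)
dense = util.cos_sim(query_emb, para_emb)[0].tolist()

# Linear interpolation of min-max normalized scores.
def normalize(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo + 1e-9) for x in xs]

alpha = 0.5
combined = [alpha * l + (1 - alpha) * d
            for l, d in zip(normalize(lexical), normalize(dense))]
for para, score in sorted(zip(paragraphs, combined), key=lambda t: -t[1]):
    print(f"{score:.3f}  {para}")
```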
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.