Open Access · Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR: This article introduces a unified framework that converts all text-based language problems into a text-to-text format, and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
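The text-to-text framing described in the abstract can be sketched as a simple formatting step: every task becomes a "task prefix + input text" → "target text" pair. The prefixes below follow examples given in the paper; the helper function itself is a hypothetical illustration, not the authors' code.

```python
# Hypothetical sketch of the text-to-text framing: each task is cast
# as (prefix + source text) -> target text, so one seq2seq model with
# one loss can handle translation, summarization, and classification.

def to_text_to_text(task, source, target):
    """Format a labeled example as an (input, output) string pair."""
    prefixes = {
        "translation": "translate English to German: ",
        "summarization": "summarize: ",
        "classification": "cola sentence: ",  # grammatical acceptability
    }
    return prefixes[task] + source, target

inp, out = to_text_to_text("translation", "That is good.", "Das ist gut.")
print(inp)  # translate English to German: That is good.
print(out)  # Das ist gut.
```

Even classification targets are emitted as text (e.g. the literal strings "acceptable" or "not acceptable"), which is what lets a single decoder produce answers for every task.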
Citations
Posted Content
Overview and Insights from the SciVer Shared Task on Scientific Claim Verification
David Wadden, Kyle Lo +1 more
TL;DR: The SciVer shared task, held at the 2nd SDP workshop at NAACL 2021, asked systems to identify which articles support or refute a given scientific claim and to provide the evidentiary sentences justifying those labels.
Posted Content
ePiC: Employing Proverbs in Context as a Benchmark for Abstract Language Understanding
Sayan Ghosh, Shashank Srivastava +1 more
TL;DR: This paper introduces ePiC, a large-scale crowdsourced dataset of narratives for employing proverbs in context, as a benchmark for abstract language understanding. The dataset provides fine-grained annotations of aligned spans between proverbs and narratives, and the narratives have minimal lexical overlap with the proverbs.
Posted Content
Automatic Graph Partitioning for Very Large-scale Deep Learning
TL;DR: RaNNC is a middleware for automatic hybrid parallelism. It automatically partitions a model into a set of sub-components so that each sub-component fits into device memory, and achieves high training throughput under pipeline parallelism by balancing the sub-components' computation times.
Posted Content
Transfer Training from Smaller Language Model
TL;DR: This paper proposes initializing a larger target model from a smaller source model by copying the source model's weight values and padding the remaining entries with zeros or small initialization values, so that the source and target models produce approximately the same outputs. This is valid because of block matrix multiplication and the residual connections in the Transformer architecture.
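The zero-padding idea above can be illustrated with a block-matrix view: copying a small weight matrix into the top-left block of a larger zero-initialized matrix leaves the outputs on the original dimensions unchanged. This is a hypothetical sketch, not the paper's code; the function names are illustrative.

```python
# Hypothetical sketch: enlarge a linear layer's weight matrix by
# zero-padding so the larger model initially reproduces the smaller
# model's outputs on the original dimensions.

def pad_weights(w_small, new_rows, new_cols):
    """Copy a small weight matrix into the top-left block of a larger
    zero-initialized matrix (the block-matrix view of the new layer)."""
    rows, cols = len(w_small), len(w_small[0])
    return [
        [w_small[i][j] if i < rows and j < cols else 0.0
         for j in range(new_cols)]
        for i in range(new_rows)
    ]

def matvec(w, x):
    """Plain matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

w_small = [[1.0, 2.0], [3.0, 4.0]]   # 2x2 source weights
w_big = pad_weights(w_small, 3, 4)   # 3x4 target weights

x = [0.5, -1.0]                      # original input
x_padded = x + [0.0, 0.0]            # input padded with zeros as well

# The first two outputs match the small model; the new row is zero.
print(matvec(w_small, x))       # [-1.5, -2.5]
print(matvec(w_big, x_padded))  # [-1.5, -2.5, 0.0]
```

The zero blocks contribute nothing to the product, which is the block-matrix argument; in a Transformer, residual connections similarly pass the original activations through the newly added (initially zero) components.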
Posted Content
DoSSIER@COLIEE 2021: Leveraging Dense Retrieval and Summarization-Based Re-Ranking for Case Law Retrieval
TL;DR: In this article, the authors combine lexical and dense retrieval methods at the paragraph level of the cases for first-stage retrieval, and demonstrate that paragraph-level retrieval outperforms document-level retrieval.