Open Access · Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR
This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract:
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
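The text-to-text format described above can be illustrated with a small sketch: each task is serialized into a single input string by prepending a task-specific prefix, so one encoder-decoder model with one loss can handle translation, classification, and summarization alike. The prefixes below mirror examples given in the paper; the helper function and task names are a hypothetical illustration, not part of the released codebase.

```python
def to_text_to_text(task: str, **fields: str) -> str:
    """Serialize a task instance into a single text input.

    The model is then trained to emit the target as plain text
    (e.g. a German sentence, the label word "acceptable", or a summary).
    """
    if task == "translate_en_de":
        return f"translate English to German: {fields['sentence']}"
    if task == "cola":  # acceptability judgment; target is "acceptable"/"unacceptable"
        return f"cola sentence: {fields['sentence']}"
    if task == "summarize":
        return f"summarize: {fields['document']}"
    raise ValueError(f"unknown task: {task}")


print(to_text_to_text("translate_en_de", sentence="That is good."))
# translate English to German: That is good.
```

Because every task shares this single string-to-string interface, the same pre-trained checkpoint can be fine-tuned on any of them without task-specific output layers.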
Citations
Book Chapter · DOI
Cross-Domain Transfer of Generative Explanations Using Text-to-Text Models
TL;DR: This article proposed a three-step training procedure to improve explanation quality by up to 7% and avoid sacrificing classification performance on the downstream task, while at the same time reducing the need for human annotations.
Proceedings Article · DOI
Shades of BLEU, Flavours of Success: The Case of MultiWOZ
Tomáš Nekvinda, Ondřej Dušek
TL;DR: In this article, the authors identify inconsistencies in data preprocessing and reporting of three corpus-based metrics used on the MultiWOZ dataset, i.e., the BLEU score and the Inform & Success rates.
Proceedings Article · DOI
TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning
TL;DR: The authors proposed a token-aware contrastive learning (TaCL) model, which encourages BERT to learn an isotropic and discriminative distribution of token representations for NER tasks.
Posted Content
Prompting Contrastive Explanations for Commonsense Reasoning Tasks.
TL;DR: The authors used pre-trained language models to generate contrastive explanations for commonsense reasoning tasks, which are judged by humans to be more relevant for solving the task and facilitate a novel method to evaluate explanation faithfulness.
Posted Content
Language Models are Few-shot Multilingual Learners
TL;DR: The authors showed that given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones, and they are significantly better than random prediction.