Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new ``Colossal Clean Crawled Corpus'', we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
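
As a concrete illustration of the text-to-text framing described in the abstract, the sketch below feeds several different tasks to one of the publicly released T5 checkpoints via the Hugging Face transformers library. The checkpoint name and generation settings are illustrative choices; the task prefixes ("translate English to German:", "summarize:", "cola sentence:") follow the convention reported in the paper, and output quality will vary with checkpoint size.

```python
# Minimal sketch: every task is cast as text in, text out.
# Assumes the Hugging Face `transformers` package and the public "t5-small" checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Different problems differ only in the textual prefix of the input.
inputs = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    "cola sentence: The course is jumping well.",  # grammaticality as text classification
]

for text in inputs:
    ids = tokenizer(text, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```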


Citations
Proceedings ArticleDOI
04 Mar 2020
TL;DR: jiant, as mentioned in this paper, is an open-source toolkit for conducting multitask and transfer learning experiments on English NLU tasks, enabling modular, configuration-driven experimentation with state-of-the-art models and a broad set of tasks.
Abstract: We introduce jiant, an open source toolkit for conducting multitask and transfer learning experiments on English NLU tasks. jiant enables modular and configuration-driven experimentation with state-of-the-art models and a broad set of tasks for probing, transfer learning, and multitask training experiments. jiant implements over 50 NLU tasks, including all GLUE and SuperGLUE benchmark tasks. We demonstrate that jiant reproduces published performance on a variety of tasks and models, e.g., RoBERTa and BERT.

65 citations

Proceedings ArticleDOI
01 Nov 2020
TL;DR: This work introduces a framework for developing NLP systems that solve new tasks after reading their descriptions, synthesizing prior work in this area, and instantiates it with a new English language dataset, ZEST, structured for task-oriented evaluation on unseen tasks.
Abstract: Typically, machine learning systems solve new tasks by training on thousands of examples. In contrast, humans can solve new tasks by reading some instructions, with perhaps an example or two. To take a step toward closing this gap, we introduce a framework for developing NLP systems that solve new tasks after reading their descriptions, synthesizing prior work in this area. We instantiate this framework with a new English language dataset, ZEST, structured for task-oriented evaluation on unseen tasks. Formulating task descriptions as questions, we ensure each is general enough to apply to many possible inputs, thus comprehensively evaluating a model’s ability to solve each task. Moreover, the dataset’s structure tests specific types of systematic generalization. We find that the state-of-the-art T5 model achieves a score of 12% on ZEST, leaving a significant challenge for NLP researchers.

65 citations
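
A rough sketch of the zero-shot setup ZEST evaluates: the unseen task is specified only by a natural-language description phrased as a question, which is concatenated with the input and handed to a text-to-text model. The prompt template, example, and checkpoint below are assumptions made for illustration, not the dataset's official encoding.

```python
# Sketch of ZEST-style evaluation: solve a task from its description alone.
# The "question: ... context: ..." template is an assumed format, not ZEST's official one.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def solve(task_description: str, passage: str) -> str:
    # The unseen task is specified only by its natural-language description.
    prompt = f"question: {task_description} context: {passage}"
    ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids
    out = model.generate(ids, max_new_tokens=32)
    return tokenizer.decode(out[0], skip_special_tokens=True)

# The same description must generalize across many possible inputs.
print(solve("Are dogs allowed at this place?",
            "Pets are permitted in campgrounds but not on hiking trails."))
```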

Proceedings ArticleDOI
01 Nov 2020
TL;DR: This paper proposes a privacy-preserving medical relation extraction model based on federated learning, which enables training a central model with no single piece of private local data being shared or exchanged.
Abstract: Unlike other domains, medical texts are inevitably accompanied by private information, so sharing or copying these texts is strictly restricted. However, training a medical relation extraction model requires collecting these privacy-sensitive texts and storing them on one machine, which comes in conflict with privacy protection. In this paper, we propose a privacy-preserving medical relation extraction model based on federated learning, which enables training a central model with no single piece of private local data being shared or exchanged. Though federated learning has distinct advantages in privacy protection, it suffers from the communication bottleneck, which is mainly caused by the need to upload cumbersome local parameters. To overcome this bottleneck, we leverage a strategy based on knowledge distillation. Such a strategy uses the uploaded predictions of ensemble local models to train the central model without requiring uploading local parameters. Experiments on three publicly available medical relation extraction datasets demonstrate the effectiveness of our method.

65 citations
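
The communication-saving idea described above, uploading predictions rather than model parameters, is essentially ensemble knowledge distillation over a shared public set. The PyTorch sketch below is a generic illustration of that pattern under assumed names and shapes, not the authors' implementation.

```python
# Generic sketch of federated distillation: clients share only predictions on a
# shared public dataset, and the server distills their ensemble into a central model.
# Module and variable names are illustrative assumptions; the loader is assumed
# to iterate in a fixed order and each model to map a batch directly to logits.
import torch
import torch.nn.functional as F

def client_predictions(local_model, public_loader):
    """Each client runs its locally trained model on the shared public data."""
    local_model.eval()
    preds = []
    with torch.no_grad():
        for batch in public_loader:
            preds.append(F.softmax(local_model(batch), dim=-1))
    return torch.cat(preds)

def distill_central_model(central_model, public_loader, all_client_preds, epochs=1):
    """Train the central model to match the averaged client predictions (soft labels)."""
    soft_labels = torch.stack(all_client_preds).mean(dim=0)  # ensemble of clients
    optimizer = torch.optim.Adam(central_model.parameters(), lr=1e-4)
    for _ in range(epochs):
        offset = 0
        for batch in public_loader:
            logits = central_model(batch)
            target = soft_labels[offset:offset + logits.size(0)]
            offset += logits.size(0)
            loss = F.kl_div(F.log_softmax(logits, dim=-1), target, reduction="batchmean")
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return central_model
```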

Posted Content
TL;DR: In this article, the authors use the latest advances in language modeling to build a single pre-trained QA model, UnifiedQA, that performs surprisingly well across 17 QA datasets spanning 4 diverse formats.
Abstract: Question answering (QA) tasks have been posed using a variety of formats, such as extractive span selection, multiple choice, etc. This has led to format-specialized models, and even to an implicit division in the QA community. We argue that such boundaries are artificial and perhaps unnecessary, given the reasoning abilities we seek to teach are not governed by the format. As evidence, we use the latest advances in language modeling to build a single pre-trained QA model, UnifiedQA, that performs surprisingly well across 17 QA datasets spanning 4 diverse formats. UnifiedQA performs on par with 9 different models that were trained on individual datasets themselves. Even when faced with 12 unseen datasets of observed formats, UnifiedQA performs surprisingly well, showing strong generalization from its out-of-format training data. Finally, simply fine-tuning this pre-trained QA model into specialized models results in a new state of the art on 6 datasets, establishing UnifiedQA as a strong starting point for building QA systems.

65 citations
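
The core of the UnifiedQA recipe is linearizing every QA format into a single plain-text input for a text-to-text model. The helper below sketches that conversion for extractive and multiple-choice examples; the separator conventions and lower-casing are assumptions made for illustration rather than UnifiedQA's exact preprocessing.

```python
# Sketch of casting heterogeneous QA formats into one text-to-text input.
# Separator and option-lettering conventions are assumptions for illustration.
def to_unified_input(question, context=None, choices=None):
    parts = [question.strip().lower()]
    if choices:  # multiple choice: enumerate the options inside the input text
        letters = "abcdefgh"
        parts.append(" ".join(f"({letters[i]}) {c}" for i, c in enumerate(choices)))
    if context:  # extractive / abstractive: append the passage
        parts.append(context.strip().lower())
    return " \\n ".join(parts)

# Extractive span selection and multiple choice share one input format:
print(to_unified_input("who wrote hamlet?",
                       context="Hamlet is a tragedy by William Shakespeare."))
print(to_unified_input("which of these is a mammal?",
                       choices=["salmon", "whale", "trout"]))
```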

Posted Content
TL;DR: It is observed that intermediate tasks requiring high-level inference and reasoning abilities tend to work best, and that target task performance is strongly correlated with higher-level abilities such as coreference resolution; however, more granular correlations between probing and target task performance are not observed.
Abstract: While pretrained models such as BERT have shown large gains across natural language understanding tasks, their performance can be improved by further training the model on a data-rich intermediate task, before fine-tuning it on a target task. However, it is still poorly understood when and why intermediate-task training is beneficial for a given target task. To investigate this, we perform a large-scale study on the pretrained RoBERTa model with 110 intermediate-target task combinations. We further evaluate all trained models with 25 probing tasks meant to reveal the specific skills that drive transfer. We observe that intermediate tasks requiring high-level inference and reasoning abilities tend to work best. We also observe that target task performance is strongly correlated with higher-level abilities such as coreference resolution. However, we fail to observe more granular correlations between probing and target task performance, highlighting the need for further work on broad-coverage probing benchmarks. We also observe evidence that the forgetting of knowledge learned during pretraining may limit our analysis, highlighting the need for further work on transfer learning methods in these settings.

64 citations
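
Intermediate-task training as studied above is sequential fine-tuning: the pretrained encoder is first fine-tuned on a data-rich intermediate task and then fine-tuned again on the target task. The sketch below shows that two-stage pipeline with the Hugging Face Trainer API; the tiny in-line datasets and hyperparameters are placeholder assumptions, not the paper's experimental setup.

```python
# Sketch of intermediate-task transfer: RoBERTa -> intermediate task -> target task.
# The in-line datasets and hyperparameters are placeholders, not the paper's setup.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=3)

def make_dataset(texts, labels):
    # Minimal stand-in for a real task dataset (a list of feature dicts works with Trainer).
    enc = tokenizer(texts, truncation=True, padding=True)
    return [{"input_ids": enc["input_ids"][i],
             "attention_mask": enc["attention_mask"][i],
             "labels": labels[i]} for i in range(len(texts))]

def finetune(model, dataset, output_dir):
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=1,
                             per_device_train_batch_size=8, learning_rate=2e-5)
    Trainer(model=model, args=args, train_dataset=dataset).train()
    return model

# Stage 1: a data-rich intermediate task (e.g., NLI-style); Stage 2: the target task.
intermediate = make_dataset(["a man sleeps. someone is resting."], [0])
target = make_dataset(["the movie was great."], [1])
model = finetune(model, intermediate, "out/intermediate")
model = finetune(model, target, "out/target")
```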

Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.