Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
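The text-to-text framing means every task is expressed as feeding the model an input string and training it to produce an output string. A minimal sketch of this setup, assuming the released T5 checkpoints accessed through the Hugging Face transformers library (the paper's own codebase uses a different toolkit); the task prefixes follow the conventions described in the paper:

```python
# Minimal sketch: the same model handles different tasks purely through
# text prefixes, illustrating the unified text-to-text format.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
]

for text in inputs:
    ids = tokenizer(text, return_tensors="pt").input_ids
    out = model.generate(ids, max_length=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```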


Citations
Posted Content
Tianwen Wei, Jianwei Qi, Shenghuan He
TL;DR: In this paper, a BERT-based multi-task (MT) framework suited to iterative and incremental development of tasks was proposed, based on the idea of partial fine-tuning, i.e. fine-tuning only some top layers of BERT while keeping the other layers frozen.
Abstract: In this demonstration, we present an efficient BERT-based multi-task (MT) framework that is particularly suitable for iterative and incremental development of the tasks. The proposed framework is based on the idea of partial fine-tuning, i.e. fine-tuning only some top layers of BERT while keeping the other layers frozen. For each task, we independently train a single-task (ST) model using partial fine-tuning. Then we compress the task-specific layers in each ST model using knowledge distillation. Those compressed ST models are finally merged into one MT model so that the frozen layers of the former are shared across the tasks. We exemplify our approach on eight GLUE tasks, demonstrating that it is able to achieve both strong performance and efficiency. We have implemented our method in the utterance understanding system of XiaoAI, a commercial AI assistant developed by Xiaomi. We estimate that our model reduces the overall serving cost by 86%.
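A minimal sketch of the partial fine-tuning idea (frozen bottom layers, trainable top layers), assuming the Hugging Face BERT implementation rather than the authors' code; the number of trainable top layers is a hypothetical choice:

```python
# Sketch of partial fine-tuning: freeze the embeddings and bottom encoder
# layers of BERT, train only the top layers (plus the pooler).
import torch
from transformers import BertModel

NUM_TRAINABLE_TOP_LAYERS = 3  # hypothetical setting, not from the paper

bert = BertModel.from_pretrained("bert-base-uncased")

# Freeze every parameter first.
for param in bert.parameters():
    param.requires_grad = False

# Unfreeze only the top N encoder layers and the pooler.
for layer in bert.encoder.layer[-NUM_TRAINABLE_TOP_LAYERS:]:
    for param in layer.parameters():
        param.requires_grad = True
for param in bert.pooler.parameters():
    param.requires_grad = True

# Only the trainable parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in bert.parameters() if p.requires_grad), lr=2e-5
)
```

Because the frozen layers never change, they can be shared across all single-task models, which is what allows the compressed task-specific layers to be merged into one multi-task model.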
Posted Content
TL;DR: This article used pre-trained text-to-text Transformers to parse natural language explanations into Python labeling functions, achieving state-of-the-art results on the CoNaLa semantic parsing benchmark, as a step toward models that are taught how to label in natural language rather than being provided specific labeled samples.
Abstract: Annotated data has become the most important bottleneck in training accurate machine learning models, especially for areas that require domain expertise. A recent approach to dealing with this issue proposes using natural language explanations instead of labeling individual data points, thereby increasing human annotators' efficiency as well as decreasing costs substantially. This paper focuses on the task of turning these natural language descriptions into Python labeling functions by following a novel approach to semantic parsing with pre-trained text-to-text Transformers. In a series of experiments our approach achieves a new state of the art on the semantic parsing benchmark CoNaLa, surpassing the previous best approach by 3.7 BLEU points. Furthermore, on a manually constructed dataset of pairs of natural language descriptions and labeling functions we achieve a BLEU of 0.39. Our approach can be regarded as a stepping stone towards models that are taught how to label in natural language, instead of being provided specific labeled samples. Our code, constructed dataset and models are available at this https URL.
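A hedged sketch of the general recipe: a text-to-text model maps a natural language explanation to the source code of a Python labeling function. The checkpoint name, input prefix, and example explanation below are illustrative assumptions, not the authors' released model, and a checkpoint actually fine-tuned for this task would be required for meaningful output:

```python
# Hypothetical sketch: semantic parsing of a natural language explanation
# into a Python labeling function with a text-to-text model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")   # stand-in checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

explanation = (
    "Label the pair as SPOUSE if the word 'wife' or 'husband' appears "
    "between the two person mentions."
)

ids = tokenizer("parse to labeling function: " + explanation,
                return_tensors="pt").input_ids
out = model.generate(ids, max_length=128)
generated_code = tokenizer.decode(out[0], skip_special_tokens=True)

# With a suitably fine-tuned model, generated_code would be a Python
# function that can be compiled and applied to unlabeled data points.
print(generated_code)
```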
Posted Content
TL;DR: In this paper, the authors proposed a novel Plan-then-Generate (PlanGen) framework to improve the controllability of neural data-to-text models, able to control both the intra-sentence and inter-sentence structure of the generated output.
Abstract: Recent developments in neural networks have led to the advance in data-to-text generation. However, the lack of ability of neural models to control the structure of generated output can be limiting in certain real-world applications. In this study, we propose a novel Plan-then-Generate (PlanGen) framework to improve the controllability of neural data-to-text models. Extensive experiments and analyses are conducted on two benchmark datasets, ToTTo and WebNLG. The results show that our model is able to control both the intra-sentence and inter-sentence structure of the generated output. Furthermore, empirical comparisons against previous state-of-the-art methods show that our model improves the generation quality as well as the output diversity as judged by human and automatic evaluations.
Posted Content
TL;DR: This paper employed pre-trained language models such as DialoGPT to construct more challenging negative instances and enhance model robustness, providing garbled context to the pre-trained model to generate responses and filtering out the fake negative ones.
Abstract: A retrieval-based chatbot selects the appropriate response from candidates according to the context, relying heavily on a response selection module. A response selection module is generally a scoring model that evaluates candidates and is usually trained on the annotated positive response and sampled negative responses. Sampling negative responses leads to two risks: a) the sampled negative instances, especially those from random sampling methods, are mostly irrelevant to the dialogue context and too easy to fit at the training stage, yielding a model that is weak in real scenarios; b) the so-called negative instances may actually be positive, which is known as the fake negative problem. To address these issues, we employ pre-trained language models, such as DialoGPT, to construct more challenging negative instances and enhance model robustness. Specifically, we provide garbled context to the pre-trained model to generate responses and filter out the fake negative ones. In this way, our negative instances are fluent, context-related, and more challenging for the model to learn, while not being positive. Extensive experiments show that our method brings significant and stable improvements to dialogue response selection capacity.
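A minimal sketch of the hard-negative idea, assuming DialoGPT via Hugging Face and a simple "garbling" scheme (shuffling the context utterances); the paper's exact garbling procedure and fake-negative filter may differ:

```python
# Sketch: generate a hard negative response by feeding a garbled (here,
# shuffled) dialogue context to DialoGPT. The garbling scheme and the
# fake-negative filtering step are simplified assumptions.
import random
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

context = ["I just moved to Berlin.", "Oh nice, how do you like it?",
           "It's great, but I miss the food back home."]

# Garble the context by shuffling the utterances.
garbled = context[:]
random.shuffle(garbled)
prompt = tokenizer.eos_token.join(garbled) + tokenizer.eos_token

ids = tokenizer(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    out = model.generate(ids, max_length=ids.shape[1] + 40,
                         do_sample=True, top_p=0.9,
                         pad_token_id=tokenizer.eos_token_id)

# The generated text is fluent and topic-related but grounded in the wrong
# context, making it a harder negative than a randomly sampled response.
negative = tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
print(negative)
```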
Posted Content
TL;DR: In this paper, the authors present a dataset of cryptic crossword clues from a major newspaper that can be used as a benchmark and train a sequence-to-sequence model to solve them.
Abstract: Cryptic crosswords, the dominant English-language crossword variety in the United Kingdom, can be solved by expert humans using flexible, creative intelligence and knowledge of language. Cryptic clues read like fluent natural language, but they are adversarially composed of two parts: a definition and a wordplay cipher requiring sub-word or character-level manipulations. As such, they are a promising target for evaluating and advancing NLP systems that seek to process language in more creative, human-like ways. We present a dataset of cryptic crossword clues from a major newspaper that can be used as a benchmark, and we train a sequence-to-sequence model to solve them. We also develop related benchmarks that can guide development of approaches to this challenging task. We show that performance can be substantially improved using a novel curriculum learning approach in which the model is pre-trained on related tasks involving, e.g., unscrambling words, before it is trained to solve cryptics. However, even this curricular approach does not generalize to novel clue types in the way that humans can, and so cryptic crosswords remain a challenge for NLP systems and a potential source of future innovation.
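A rough sketch of the curriculum idea (the model choice, training loop, and toy examples are placeholders, not the authors' setup): first fine-tune a sequence-to-sequence model on an auxiliary task such as unscrambling words, then continue training the same weights on clue-answer pairs.

```python
# Sketch of curriculum learning for cryptic clues: stage 1 trains on a
# related auxiliary task, stage 2 continues on clue -> answer pairs.
# The tiny in-line examples are dummy placeholders for real datasets.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def train_stage(pairs, epochs):
    """pairs: list of (input_text, target_text) for one curriculum stage."""
    model.train()
    for _ in range(epochs):
        for src, tgt in pairs:
            batch = tokenizer(src, return_tensors="pt")
            labels = tokenizer(tgt, return_tensors="pt").input_ids
            loss = model(**batch, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

# Stage 1: auxiliary unscrambling task (dummy examples).
train_stage([("unscramble: tca", "cat"), ("unscramble: sohue", "house")], epochs=1)
# Stage 2: cryptic clues (dummy clue/answer pair, not a real cryptic clue).
train_stage([("clue: example cryptic clue (5)", "crane")], epochs=1)
```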
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.