Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
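The text-to-text format is easiest to see in a short example. The sketch below is not the authors' released code; it is a minimal illustration, assuming the Hugging Face Transformers port of the public t5-small checkpoint, of how a task prefix casts translation and summarization into the same string-in, string-out interface.

    # Minimal sketch of the text-to-text format using the Hugging Face port of T5.
    # The task prefix ("translate English to German:", "summarize:") is what turns
    # every problem into the same string-in, string-out interface.
    from transformers import T5ForConditionalGeneration, T5TokenizerFast

    tokenizer = T5TokenizerFast.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    examples = [
        "translate English to German: The house is wonderful.",
        "summarize: Transfer learning, where a model is first pre-trained on a "
        "data-rich task before being fine-tuned on a downstream task, has emerged "
        "as a powerful technique in natural language processing.",
    ]

    for text in examples:
        inputs = tokenizer(text, return_tensors="pt")
        outputs = model.generate(**inputs, max_new_tokens=40)
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))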


Citations
Proceedings ArticleDOI
04 May 2021
TL;DR: The authors propose K-Adapter, which keeps the original parameters of the pre-trained model fixed and supports continual knowledge infusion; it injects factual knowledge obtained from automatically aligned text-triplets on Wikipedia and Wikidata, and linguistic knowledge from dependency parsing.
Abstract: We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa. Existing methods typically update the original parameters of pre-trained models when injecting knowledge. However, when multiple kinds of knowledge are injected, they may suffer from catastrophic forgetting. To address this, we propose K-Adapter, which keeps the original parameters of the pre-trained model fixed and supports continual knowledge infusion. Taking RoBERTa as the pre-trained model, K-Adapter has a neural adapter for each kind of infused knowledge, like a plug-in connected to RoBERTa. There is no information flow between different adapters, so different adapters can be trained efficiently in a distributed way. We inject two kinds of knowledge: factual knowledge obtained from automatically aligned text-triplets on Wikipedia and Wikidata, and linguistic knowledge obtained from dependency parsing. Results on three knowledge-driven tasks (six datasets in total), including relation classification, entity typing, and question answering, demonstrate that each adapter improves performance and that the combination of both adapters brings further improvements. Probing experiments further indicate that K-Adapter captures richer factual and commonsense knowledge than RoBERTa.
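As a rough illustration of the adapter idea described above, the sketch below freezes a pre-trained RoBERTa encoder and trains only a small bottleneck module on top of its hidden states; the bottleneck size, placement, and example sentence are assumptions for illustration, not the exact K-Adapter architecture.

    # Illustrative sketch of the adapter idea: the pre-trained model stays frozen and
    # only a small bottleneck module is trained, so knowledge can be infused without
    # overwriting the original parameters. Sizes and placement are assumptions, not
    # the exact K-Adapter architecture.
    import torch
    import torch.nn as nn
    from transformers import RobertaModel, RobertaTokenizerFast

    class BottleneckAdapter(nn.Module):
        def __init__(self, hidden_size: int = 768, bottleneck: int = 128):
            super().__init__()
            self.down = nn.Linear(hidden_size, bottleneck)
            self.up = nn.Linear(bottleneck, hidden_size)
            self.act = nn.GELU()

        def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
            # Residual connection keeps the frozen representation available downstream.
            return hidden_states + self.up(self.act(self.down(hidden_states)))

    tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
    encoder = RobertaModel.from_pretrained("roberta-base")
    for p in encoder.parameters():      # freeze the pre-trained model
        p.requires_grad = False

    adapter = BottleneckAdapter()       # only these parameters would be trained
    batch = tokenizer("Paris is the capital of France.", return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    adapted = adapter(hidden)           # knowledge-specific representation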

51 citations

Proceedings ArticleDOI
10 May 2021
TL;DR: The model first encodes a dialogue context and a slot with a pre-trained self-attentive encoder and generates the slot value in an auto-regressive manner; it also incorporates Slot Type Informed Descriptions that capture the shared information of different slots to facilitate cross-domain knowledge transfer.
Abstract: Zero-shot cross-domain dialogue state tracking (DST) enables us to handle unseen domains without the expense of collecting in-domain data. In this paper, we propose a slot-description-enhanced generative approach for zero-shot cross-domain DST. Specifically, our model first encodes a dialogue context and a slot with a pre-trained self-attentive encoder and generates the slot value in an auto-regressive manner. In addition, we incorporate Slot Type Informed Descriptions that capture the shared information of different slots to facilitate cross-domain knowledge transfer. Experimental results on MultiWOZ show that our model significantly improves on existing state-of-the-art results in the zero-shot cross-domain setting.
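A rough sketch of this generative framing: the dialogue context and a typed slot description are concatenated into a single input string, and a sequence-to-sequence model generates the slot value. The template, the [slot] separator, and the t5-small backbone below are illustrative assumptions, not the paper's exact setup.

    # Sketch of generative dialogue state tracking: concatenate the dialogue context
    # with a typed slot description and let a seq2seq model generate the slot value.
    # Template, separator, and backbone are illustrative assumptions.
    from transformers import T5ForConditionalGeneration, T5TokenizerFast

    tokenizer = T5TokenizerFast.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    dialogue = ("user: I need a cheap hotel in the north of town. "
                "system: Sure, for how many nights?")
    slot_description = "hotel-pricerange: price budget of the hotel (type: categorical)"

    inputs = tokenizer(dialogue + " [slot] " + slot_description, return_tensors="pt")
    value_ids = model.generate(**inputs, max_new_tokens=8)
    # After fine-tuning on the source domains, the decoded value should be "cheap".
    print(tokenizer.decode(value_ids[0], skip_special_tokens=True))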

51 citations

Journal ArticleDOI
TL;DR: This survey provides a comprehensive typology of word representation models from the novel perspective that the development from static to dynamic embeddings can effectively address the polysemy problem.
Abstract: Throughout the development of natural language processing (NLP), the representation of words has always been a significant research topic. In this survey, we provide a comprehensive typology of word representation models from the novel perspective that the development from static to dynamic embeddings can effectively address the polysemy problem, which has been a great challenge in this field. The survey then covers the main evaluation metrics and applications of these word embeddings. We further discuss the development of word embeddings from static to dynamic in the cross-lingual scenario. Finally, we point out some open issues and directions for future work.
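The static-versus-dynamic distinction can be made concrete with a small check: a static lookup assigns one vector to a polysemous word regardless of context, while a contextual model produces different vectors for the same word in different sentences. The model choice and sentences below are illustrative assumptions.

    # Contrast static vs. dynamic (contextual) embeddings for the polysemous word "bank".
    # The input-embedding lookup is static (one vector per token), while the encoder's
    # output vectors change with the surrounding context.
    import torch
    from transformers import BertModel, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased").eval()

    sentences = ["He sat on the bank of the river.", "She deposited cash at the bank."]
    bank_id = tokenizer.convert_tokens_to_ids("bank")

    contextual = []
    for s in sentences:
        enc = tokenizer(s, return_tensors="pt")
        position = enc.input_ids[0].tolist().index(bank_id)
        with torch.no_grad():
            contextual.append(model(**enc).last_hidden_state[0, position])

    cos = torch.nn.functional.cosine_similarity
    print("contextual similarity:", cos(contextual[0], contextual[1], dim=0).item())  # below 1.0
    static = model.embeddings.word_embeddings.weight[bank_id]  # identical in both sentences
    print("static vector shape:", tuple(static.shape))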

51 citations

Proceedings ArticleDOI
01 Dec 2020
TL;DR: This work provides a new schema for the generation pipeline by classifying it into five modules and presents an overview of the different techniques used to modulate them.
Abstract: Neural controllable text generation is an important area gaining attention due to its plethora of applications. Although there is a large body of prior work on controllable text generation, there is no unifying theme. In this work, we provide a new schema of the generation pipeline by classifying it into five modules. Controlling attributes during generation requires modifying these modules. We present an overview of the different techniques used to modulate these modules and analyze their advantages and disadvantages. We further pave the way for developing new architectures based on combinations of the modules described in this paper.

50 citations

Proceedings ArticleDOI
01 Jul 2020
TL;DR: CLAMS (Cross-Linguistic Assessment of Models on Syntax), a syntactic evaluation suite for monolingual and multilingual models, is introduced; it uses subject-verb agreement challenge sets for English, French, German, Hebrew, and Russian, generated from grammars developed by the authors.
Abstract: A range of studies have concluded that neural word prediction models can distinguish grammatical from ungrammatical sentences with high accuracy. However, these studies are based primarily on monolingual evidence from English. To investigate how these models' ability to learn syntax varies by language, we introduce CLAMS (Cross-Linguistic Assessment of Models on Syntax), a syntactic evaluation suite for monolingual and multilingual models. CLAMS includes subject-verb agreement challenge sets for English, French, German, Hebrew and Russian, generated from grammars we develop. We use CLAMS to evaluate LSTM language models as well as monolingual and multilingual BERT. Across languages, monolingual LSTMs achieved high accuracy on dependencies without attractors, and generally poor accuracy on agreement across object relative clauses. On other constructions, agreement accuracy was generally higher in languages with richer morphology. Multilingual models generally underperformed monolingual models. Multilingual BERT showed high syntactic accuracy on English, but noticeable deficiencies in other languages.
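The evaluation protocol behind such challenge sets can be sketched with a minimal pair: a model is credited with an item if it assigns higher probability to the grammatical sentence than to its ungrammatical counterpart. GPT-2 is used below purely as an illustrative stand-in for the LSTM and BERT models evaluated in the paper, and the example pair is an assumption.

    # Minimal-pair scoring in the spirit of syntactic challenge sets such as CLAMS:
    # the model "passes" an item if the grammatical sentence gets the higher log-probability.
    # GPT-2 is an illustrative stand-in, not one of the models evaluated in the paper.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def total_log_prob(sentence: str) -> float:
        ids = tokenizer(sentence, return_tensors="pt").input_ids
        with torch.no_grad():
            # labels=ids returns the mean cross-entropy over predicted tokens
            loss = model(ids, labels=ids).loss
        return -loss.item() * (ids.size(1) - 1)  # rescale to an approximate total log-probability

    # Agreement across an object relative clause (an illustrative English item).
    grammatical = "The author that the guards like laughs."
    ungrammatical = "The author that the guards like laugh."
    print("correct" if total_log_prob(grammatical) > total_log_prob(ungrammatical) else "incorrect")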

50 citations

Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not explicitly discuss the limitations of transfer learning with a unified text-to-text Transformer.