Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduces a unified framework that converts all text-based language problems into a text-to-text format and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
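The text-to-text format is easiest to see in a short example. The sketch below is not the authors' released code; it is a minimal illustration, assuming the Hugging Face Transformers port of the public t5-small checkpoint, of how a task prefix casts translation and summarization into the same string-in, string-out interface.

    # Minimal sketch of the text-to-text format using the Hugging Face port of T5.
    # The task prefix ("translate English to German:", "summarize:") is what turns
    # every problem into the same string-in, string-out interface.
    from transformers import T5ForConditionalGeneration, T5TokenizerFast

    tokenizer = T5TokenizerFast.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    examples = [
        "translate English to German: The house is wonderful.",
        "summarize: Transfer learning, where a model is first pre-trained on a "
        "data-rich task before being fine-tuned on a downstream task, has emerged "
        "as a powerful technique in natural language processing.",
    ]

    for text in examples:
        inputs = tokenizer(text, return_tensors="pt")
        outputs = model.generate(**inputs, max_new_tokens=40)
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))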


Citations
Proceedings ArticleDOI
04 May 2021
TL;DR: The authors propose K-Adapter, which keeps the original parameters of the pre-trained model fixed and supports continual knowledge infusion; it injects factual knowledge obtained from automatically aligned text-triplets on Wikipedia and Wikidata, and linguistic knowledge from dependency parsing.
Abstract: We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa. Existing methods typically update the original parameters of pre-trained models when injecting knowledge. However, when multiple kinds of knowledge are injected, they may suffer from catastrophic forgetting. To address this, we propose K-Adapter, which keeps the original parameters of the pre-trained model fixed and supports continual knowledge infusion. Taking RoBERTa as the pre-trained model, K-Adapter has a neural adapter for each kind of infused knowledge, like a plug-in connected to RoBERTa. There is no information flow between different adapters, so different adapters can be trained efficiently in a distributed way. We inject two kinds of knowledge: factual knowledge obtained from automatically aligned text-triplets on Wikipedia and Wikidata, and linguistic knowledge obtained from dependency parsing. Results on three knowledge-driven tasks (six datasets in total), including relation classification, entity typing, and question answering, demonstrate that each adapter improves performance and that the combination of both adapters brings further improvements. Probing experiments further indicate that K-Adapter captures richer factual and commonsense knowledge than RoBERTa.
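As a rough illustration of the adapter idea described above, the sketch below freezes a pre-trained RoBERTa encoder and trains only a small bottleneck module on top of its hidden states; the bottleneck size, placement, and example sentence are assumptions for illustration, not the exact K-Adapter architecture.

    # Illustrative sketch of the adapter idea: the pre-trained model stays frozen and
    # only a small bottleneck module is trained, so knowledge can be infused without
    # overwriting the original parameters. Sizes and placement are assumptions, not
    # the exact K-Adapter architecture.
    import torch
    import torch.nn as nn
    from transformers import RobertaModel, RobertaTokenizerFast

    class BottleneckAdapter(nn.Module):
        def __init__(self, hidden_size: int = 768, bottleneck: int = 128):
            super().__init__()
            self.down = nn.Linear(hidden_size, bottleneck)
            self.up = nn.Linear(bottleneck, hidden_size)
            self.act = nn.GELU()

        def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
            # Residual connection keeps the frozen representation available downstream.
            return hidden_states + self.up(self.act(self.down(hidden_states)))

    tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
    encoder = RobertaModel.from_pretrained("roberta-base")
    for p in encoder.parameters():      # freeze the pre-trained model
        p.requires_grad = False

    adapter = BottleneckAdapter()       # only these parameters would be trained
    batch = tokenizer("Paris is the capital of France.", return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    adapted = adapter(hidden)           # knowledge-specific representation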

51 citations

Proceedings ArticleDOI
10 May 2021
TL;DR: The model first encodes a dialogue context and a slot with a pre-trained self-attentive encoder and generates the slot value in an auto-regressive manner; it also incorporates Slot Type Informed Descriptions that capture the shared information of different slots to facilitate cross-domain knowledge transfer.
Abstract: Zero-shot cross-domain dialogue state tracking (DST) enables us to handle unseen domains without the expense of collecting in-domain data. In this paper, we propose a slot-description-enhanced generative approach for zero-shot cross-domain DST. Specifically, our model first encodes a dialogue context and a slot with a pre-trained self-attentive encoder and generates the slot value in an auto-regressive manner. In addition, we incorporate Slot Type Informed Descriptions that capture the shared information of different slots to facilitate cross-domain knowledge transfer. Experimental results on MultiWOZ show that our model significantly improves on existing state-of-the-art results in the zero-shot cross-domain setting.
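A rough sketch of this generative framing: the dialogue context and a typed slot description are concatenated into a single input string, and a sequence-to-sequence model generates the slot value. The template, the [slot] separator, and the t5-small backbone below are illustrative assumptions, not the paper's exact setup.

    # Sketch of generative dialogue state tracking: concatenate the dialogue context
    # with a typed slot description and let a seq2seq model generate the slot value.
    # Template, separator, and backbone are illustrative assumptions.
    from transformers import T5ForConditionalGeneration, T5TokenizerFast

    tokenizer = T5TokenizerFast.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    dialogue = ("user: I need a cheap hotel in the north of town. "
                "system: Sure, for how many nights?")
    slot_description = "hotel-pricerange: price budget of the hotel (type: categorical)"

    inputs = tokenizer(dialogue + " [slot] " + slot_description, return_tensors="pt")
    value_ids = model.generate(**inputs, max_new_tokens=8)
    # After fine-tuning on the source domains, the decoded value should be "cheap".
    print(tokenizer.decode(value_ids[0], skip_special_tokens=True))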

51 citations

Journal ArticleDOI
TL;DR: This survey provides a comprehensive typology of word representation models from the novel perspective that the development from static to dynamic embeddings can effectively address the polysemy problem.
Abstract: Throughout the development of natural language processing (NLP), the representation of words has always been a significant research topic. In this survey, we provide a comprehensive typology of word representation models from the novel perspective that the development from static to dynamic embeddings can effectively address the polysemy problem, which has been a great challenge in this field. The survey then covers the main evaluation metrics and applications of these word embeddings. We further discuss the development of word embeddings from static to dynamic in the cross-lingual scenario. Finally, we point out some open issues and directions for future work.
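The static-versus-dynamic distinction can be made concrete with a small check: a static lookup assigns one vector to a polysemous word regardless of context, while a contextual model produces different vectors for the same word in different sentences. The model choice and sentences below are illustrative assumptions.

    # Contrast static vs. dynamic (contextual) embeddings for the polysemous word "bank".
    # The input-embedding lookup is static (one vector per token), while the encoder's
    # output vectors change with the surrounding context.
    import torch
    from transformers import BertModel, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased").eval()

    sentences = ["He sat on the bank of the river.", "She deposited cash at the bank."]
    bank_id = tokenizer.convert_tokens_to_ids("bank")

    contextual = []
    for s in sentences:
        enc = tokenizer(s, return_tensors="pt")
        position = enc.input_ids[0].tolist().index(bank_id)
        with torch.no_grad():
            contextual.append(model(**enc).last_hidden_state[0, position])

    cos = torch.nn.functional.cosine_similarity
    print("contextual similarity:", cos(contextual[0], contextual[1], dim=0).item())  # below 1.0
    static = model.embeddings.word_embeddings.weight[bank_id]  # identical in both sentences
    print("static vector shape:", tuple(static.shape))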

51 citations

Proceedings ArticleDOI
01 Dec 2020
TL;DR: This work provides a new schema for the generation pipeline by classifying it into five modules and presents an overview of the different techniques used to modulate them.
Abstract: Neural controllable text generation is an important area gaining attention due to its plethora of applications. Although there is a large body of prior work on controllable text generation, there is no unifying theme. In this work, we provide a new schema of the generation pipeline by classifying it into five modules. Controlling attributes during generation requires modifying these modules. We present an overview of the different techniques used to modulate these modules and analyze their advantages and disadvantages. We further pave the way for developing new architectures based on combinations of the modules described in this paper.

50 citations

Proceedings ArticleDOI
01 Jul 2020
TL;DR: CLAMS (Cross-Linguistic Assessment of Models on Syntax), a syntactic evaluation suite for monolingual and multilingual models, is introduced; it uses subject-verb agreement challenge sets for English, French, German, Hebrew, and Russian, generated from grammars developed by the authors.
Abstract: A range of studies have concluded that neural word prediction models can distinguish grammatical from ungrammatical sentences with high accuracy. However, these studies are based primarily on monolingual evidence from English. To investigate how these models' ability to learn syntax varies by language, we introduce CLAMS (Cross-Linguistic Assessment of Models on Syntax), a syntactic evaluation suite for monolingual and multilingual models. CLAMS includes subject-verb agreement challenge sets for English, French, German, Hebrew and Russian, generated from grammars we develop. We use CLAMS to evaluate LSTM language models as well as monolingual and multilingual BERT. Across languages, monolingual LSTMs achieved high accuracy on dependencies without attractors, and generally poor accuracy on agreement across object relative clauses. On other constructions, agreement accuracy was generally higher in languages with richer morphology. Multilingual models generally underperformed monolingual models. Multilingual BERT showed high syntactic accuracy on English, but noticeable deficiencies in other languages.
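The evaluation protocol behind such challenge sets can be sketched with a minimal pair: a model is credited with an item if it assigns higher probability to the grammatical sentence than to its ungrammatical counterpart. GPT-2 is used below purely as an illustrative stand-in for the LSTM and BERT models evaluated in the paper, and the example pair is an assumption.

    # Minimal-pair scoring in the spirit of syntactic challenge sets such as CLAMS:
    # the model "passes" an item if the grammatical sentence gets the higher log-probability.
    # GPT-2 is an illustrative stand-in, not one of the models evaluated in the paper.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def total_log_prob(sentence: str) -> float:
        ids = tokenizer(sentence, return_tensors="pt").input_ids
        with torch.no_grad():
            # labels=ids returns the mean cross-entropy over predicted tokens
            loss = model(ids, labels=ids).loss
        return -loss.item() * (ids.size(1) - 1)  # rescale to an approximate total log-probability

    # Agreement across an object relative clause (an illustrative English item).
    grammatical = "The author that the guards like laughs."
    ungrammatical = "The author that the guards like laugh."
    print("correct" if total_log_prob(grammatical) > total_log_prob(ungrammatical) else "incorrect")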

50 citations

Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not explicitly discuss the limitations of transfer learning with a unified text-to-text Transformer.