Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
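As a concrete illustration of the text-to-text framework described in the abstract, the sketch below casts a few different tasks as plain string-to-string problems using the publicly released T5 checkpoints through the Hugging Face transformers library. The task prefixes follow the conventions used in the paper, but the snippet itself is only an illustrative usage example, not the authors' original code.

```python
# Illustrative sketch: casting different NLP tasks into T5's text-to-text format.
# Assumes the `transformers` library and the public "t5-small" checkpoint.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is expressed as "task prefix + input text" -> "output text".
examples = [
    "translate English to German: The house is wonderful.",         # translation
    "summarize: state authorities dispatched emergency crews ...",  # summarization
    "cola sentence: The course is jumping well.",                   # acceptability (GLUE CoLA)
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```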


Citations
Posted Content
TL;DR: In this paper, an unsupervised semantic parsing method called Synchronous Semantic Decoding (SSD) is proposed, which can simultaneously resolve the semantic gap and the structure gap by jointly leveraging paraphrasing and grammar constrained decoding.
Abstract: Semantic parsing is challenging due to the structure gap and the semantic gap between utterances and logical forms. In this paper, we propose an unsupervised semantic parsing method - Synchronous Semantic Decoding (SSD), which can simultaneously resolve the semantic gap and the structure gap by jointly leveraging paraphrasing and grammar-constrained decoding. Specifically, we reformulate semantic parsing as a constrained paraphrasing problem: given an utterance, our model synchronously generates its canonical utterance and meaning representation. During synchronous decoding, the utterance paraphrasing is constrained by the structure of the logical form, so the canonical utterance can be paraphrased in a controlled manner; the semantic decoding is guided by the semantics of the canonical utterance, so its logical form can be generated without supervision. Experimental results show that SSD is a promising approach and can achieve competitive unsupervised semantic parsing performance on multiple datasets.
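The core mechanism in SSD, generation restricted by a grammar, can be pictured with a small self-contained sketch. The function below performs greedy decoding in which each step is limited to the tokens a grammar allows; all names and the toy grammar are hypothetical illustrations of the general idea, not the authors' implementation.

```python
# Minimal sketch of grammar-constrained decoding (the general idea behind SSD's
# constrained generation). All names here are hypothetical, not the authors' code.
from typing import Callable, Dict, List

def constrained_decode(
    score_next: Callable[[List[str]], Dict[str, float]],  # model: prefix -> token scores
    allowed_next: Callable[[List[str]], List[str]],        # grammar: prefix -> legal tokens
    max_len: int = 20,
) -> List[str]:
    """Greedy decoding where each step is restricted to grammar-legal tokens."""
    output: List[str] = []
    for _ in range(max_len):
        legal = allowed_next(output)
        if not legal:  # the grammar says the sequence is complete
            break
        scores = score_next(output)
        # Pick the highest-scoring token among those the grammar permits.
        output.append(max(legal, key=lambda tok: scores.get(tok, float("-inf"))))
    return output

# Toy usage: a grammar that only accepts sequences of the form "answer ( <city> )".
grammar = {0: ["answer"], 1: ["("], 2: ["paris", "london"], 3: [")"], 4: []}
result = constrained_decode(
    score_next=lambda prefix: {"paris": 0.9, "london": 0.4},  # dummy model scores
    allowed_next=lambda prefix: grammar[len(prefix)],
)
print(result)  # ['answer', '(', 'paris', ')']
```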
Book ChapterDOI
18 Nov 2020
TL;DR: In this paper, Li et al. propose using a global-perspective (GP) question to replace the original question in QA-style ABSA, which explicitly tells the model about the existence of other relevant aspects through additional instructions.
Abstract: Aspect-based sentiment analysis (ABSA) aims to identify the opinion polarity towards a specific aspect. Traditional approaches formulate ABSA as a sentence classification task. However, it is observed that the single sentence classification paradigm cannot take full advantage of pre-trained language models. Previous work suggests it is better to cast ABSA as a question answering (QA) task for each aspect, which can be solved in the sentence-pair classification paradigm. Though QA-style ABSA achieves state-of-the-art (SOTA) results, it naturally separates the prediction process of multiple aspects belonging to the same sentence. It thus is unable to take full advantage of the correlation between different aspects. In this paper, we propose to use the global-perspective (GP) question to replace the original question in QA-style ABSA, which explicitly tells the model the existence of other relevant aspects using additional instructions. In this way, the model can distinguish relevant phrases for each aspect better and utilize the underlying relationship between different aspects. The experimental results on three benchmark ABSA datasets demonstrate the effectiveness of our method.
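To make the GP-question idea concrete, the sketch below builds a sentence-pair input in which the question about one aspect explicitly names the other aspects in the sentence, and feeds it to a standard sequence-pair classifier. The exact question wording and the bert-base-uncased backbone are assumptions for illustration (the classifier would still need fine-tuning on ABSA data); this is not the paper's template or code.

```python
# Illustrative sketch of QA-style ABSA with a global-perspective (GP) question.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

sentence = "The food was great but the service was slow."
aspects = ["food", "service"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3  # positive / negative / neutral
)

for target in aspects:
    others = [a for a in aspects if a != target]
    # GP question: ask about one aspect while explicitly naming the others.
    gp_question = (
        f"What is the sentiment towards {target}, given that the sentence "
        f"also mentions {', '.join(others) if others else 'no other aspect'}?"
    )
    inputs = tokenizer(gp_question, sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    print(target, logits.softmax(dim=-1))
```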
Journal ArticleDOI
TL;DR: In this paper, a data augmentation technique using parsing trees is proposed that annotates targets by inserting a new delimiter token between them according to their parsing trees; only the training stage requires prior knowledge about the targets' semantic or syntactic compositionality.
Abstract: Humans can understand a novel sentence by parsing it into known components like phrases and clauses. To achieve human-level artificial intelligence, compositional generalization tasks are suggested and used to assess machine learning models. Among those tasks, the SCAN tasks are challenging for standard deep learning models, such as RNN sequence-to-sequence models and Transformers, that show great success across many natural language processing tasks. Even though a long line of deep learning research has developed memory-augmented neural networks aimed at the SCAN tasks, their generality remains questionable for more complex and realistic applications where the standard seq2seq models dominate. Hence, one needs a method that helps the standard models discover compositional rules. To this end, we propose a data augmentation technique using parsing trees. Our technique annotates targets by inserting a new delimiter token between them according to their parsing trees. At the training stage, the technique needs prior knowledge about the targets' semantic or syntactic compositionality; at the test stage, it uses no such knowledge. Experiments show that our technique enables the standard models to achieve compositional generalization on the SCAN tasks. Furthermore, we validate our technique on a synthetic task and confirm the standard models' strong performance gains without using prior knowledge about semantic compositionality. As one way to infuse parsing-tree information into sequences, our technique can be used for tasks with structured targets, like program code generation tasks.
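A minimal sketch of the delimiter-insertion idea: given a parse of the target sequence, flatten it while placing a special delimiter between the top-level constituents, so the grouping stays recoverable from the flat sequence. The nested-list parse format and the "<sep>" token are assumptions for illustration, not the paper's exact scheme.

```python
# Sketch: annotate a target sequence with delimiters derived from its parse tree.
from typing import List, Union

Tree = Union[str, List["Tree"]]

def flatten(tree: Tree) -> List[str]:
    """Collect the leaf tokens of a (sub)tree in order."""
    if isinstance(tree, str):
        return [tree]
    out: List[str] = []
    for child in tree:
        out.extend(flatten(child))
    return out

def annotate(parse: List[Tree], delimiter: str = "<sep>") -> List[str]:
    """Flatten the root's children, inserting a delimiter between siblings."""
    tokens: List[str] = []
    for i, constituent in enumerate(parse):
        if i > 0:
            tokens.append(delimiter)
        tokens.extend(flatten(constituent))
    return tokens

# Example: a SCAN-style target whose parse groups the first two actions together.
target_parse = [["JUMP", "JUMP"], "WALK"]
print(annotate(target_parse))  # ['JUMP', 'JUMP', '<sep>', 'WALK']
```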
Posted Content
TL;DR: In this article, a unified framework for parameter-efficient transfer learning methods is presented, which enables the transfer of design elements across different approaches, and as a result enables the instantiate new parameterefficient fine-tuning methods that tune less parameters than previous methods.
Abstract: Fine-tuning large pre-trained language models on downstream tasks has become the de-facto learning paradigm in NLP. However, conventional approaches fine-tune all the parameters of the pre-trained model, which becomes prohibitive as the model size and the number of tasks grow. Recent work has proposed a variety of parameter-efficient transfer learning methods that only fine-tune a small number of (extra) parameters to attain strong performance. While effective, the critical ingredients for success and the connections among the various methods are poorly understood. In this paper, we break down the design of state-of-the-art parameter-efficient transfer learning methods and present a unified framework that establishes connections between them. Specifically, we re-frame them as modifications to specific hidden states in pre-trained models, and define a set of design dimensions along which different methods vary, such as the function to compute the modification and the position to apply the modification. Through comprehensive empirical studies across machine translation, text summarization, language understanding, and text classification benchmarks, we utilize the unified view to identify important design choices in previous methods. Furthermore, our unified framework enables the transfer of design elements across different approaches, and as a result we are able to instantiate new parameter-efficient fine-tuning methods that tune fewer parameters than previous methods while being more effective, achieving comparable results to fine-tuning all parameters on all four tasks.
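The unified view in this paper treats each parameter-efficient method as a small trainable modification added to a chosen hidden state of the otherwise frozen model. The sketch below shows one instantiation of that view, an adapter-style bottleneck that computes delta_h and applies h + scale * delta_h; module and parameter names are illustrative, not the authors' released code.

```python
# Sketch of the "modification to hidden states" view of parameter-efficient tuning.
import torch
import torch.nn as nn

class HiddenStateModification(nn.Module):
    """Computes delta_h = W_up(act(W_down(h))) and returns h + scale * delta_h."""

    def __init__(self, d_model: int, bottleneck: int, scale: float = 1.0):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)  # the only new (trainable) parameters
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.ReLU()
        self.scale = scale

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        delta_h = self.up(self.act(self.down(h)))
        return h + self.scale * delta_h  # modify the hidden state, don't replace it

# Usage: wrap the output of a frozen transformer block.
h = torch.randn(2, 16, 768)  # (batch, sequence length, d_model)
mod = HiddenStateModification(d_model=768, bottleneck=32)
print(mod(h).shape)  # torch.Size([2, 16, 768])
```

Within the framework, methods differ mainly in where the modification is applied (e.g. attention versus feed-forward sub-layers) and in how delta_h is computed.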
Posted Content
TL;DR: This article investigated the ability of the text-to-text transfer learning model (T5) to learn numeracy, and found that T5 models perform reasonably well in the interpolation setting but struggle considerably in the extrapolation setting across all four numeracy tasks.
Abstract: The transformer-based pre-trained language models have been tremendously successful in most conventional NLP tasks. But they often struggle in tasks where numerical understanding is required. Possible reasons include the tokenizers and pre-training objectives, which are not specifically designed to learn and preserve numeracy. Here we investigate the ability of the text-to-text transfer learning model (T5), which has outperformed its predecessors in conventional NLP tasks, to learn numeracy. We consider four numeracy tasks: numeration, magnitude order prediction, finding minimum and maximum in a series, and sorting. We find that, although T5 models perform reasonably well in the interpolation setting, they struggle considerably in the extrapolation setting across all four tasks.
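The numeracy probes described above are simple text-to-text pairs, and the sketch below generates examples for two of them (finding the maximum and sorting); the prompt wording and the interpolation/extrapolation number ranges are assumptions for illustration, not the paper's exact templates.

```python
# Sketch: generating text-to-text numeracy probes (max-finding and sorting).
import random

def max_example(low: int = 0, high: int = 99, n: int = 5) -> dict:
    nums = random.sample(range(low, high + 1), n)
    return {
        "input": "find the maximum: " + " ".join(str(x) for x in nums),
        "target": str(max(nums)),
    }

def sorting_example(low: int = 0, high: int = 99, n: int = 5) -> dict:
    nums = random.sample(range(low, high + 1), n)
    return {
        "input": "sort ascending: " + " ".join(str(x) for x in nums),
        "target": " ".join(str(x) for x in sorted(nums)),
    }

# Interpolation vs. extrapolation is controlled by the number range, e.g.
# train on 0-99 and evaluate on 100-999 to test extrapolation.
print(max_example())
print(sorting_example(low=100, high=999))
```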
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.