Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
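As a concrete illustration of the text-to-text framework described in the abstract, the sketch below casts a few different tasks as plain string-to-string problems using the publicly released T5 checkpoints through the Hugging Face transformers library. The task prefixes follow the conventions used in the paper, but the snippet itself is only an illustrative usage example, not the authors' original code.

```python
# Illustrative sketch: casting different NLP tasks into T5's text-to-text format.
# Assumes the `transformers` library and the public "t5-small" checkpoint.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is expressed as "task prefix + input text" -> "output text".
examples = [
    "translate English to German: The house is wonderful.",         # translation
    "summarize: state authorities dispatched emergency crews ...",  # summarization
    "cola sentence: The course is jumping well.",                   # acceptability (GLUE CoLA)
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```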


Citations
Posted Content
TL;DR: In this paper, an unsupervised semantic parsing method called Synchronous Semantic Decoding (SSD) is proposed, which can simultaneously resolve the semantic gap and the structure gap by jointly leveraging paraphrasing and grammar constrained decoding.
Abstract: Semantic parsing is challenging due to the structure gap and the semantic gap between utterances and logical forms. In this paper, we propose an unsupervised semantic parsing method - Synchronous Semantic Decoding (SSD), which can simultaneously resolve the semantic gap and the structure gap by jointly leveraging paraphrasing and grammar-constrained decoding. Specifically, we reformulate semantic parsing as a constrained paraphrasing problem: given an utterance, our model synchronously generates its canonical utterance and meaning representation. During synchronous decoding, the utterance paraphrasing is constrained by the structure of the logical form, so the canonical utterance can be paraphrased in a controlled manner; the semantic decoding is guided by the semantics of the canonical utterance, so its logical form can be generated without supervision. Experimental results show that SSD is a promising approach and can achieve competitive unsupervised semantic parsing performance on multiple datasets.
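The core mechanism in SSD, generation restricted by a grammar, can be pictured with a small self-contained sketch. The function below performs greedy decoding in which each step is limited to the tokens a grammar allows; all names and the toy grammar are hypothetical illustrations of the general idea, not the authors' implementation.

```python
# Minimal sketch of grammar-constrained decoding (the general idea behind SSD's
# constrained generation). All names here are hypothetical, not the authors' code.
from typing import Callable, Dict, List

def constrained_decode(
    score_next: Callable[[List[str]], Dict[str, float]],  # model: prefix -> token scores
    allowed_next: Callable[[List[str]], List[str]],        # grammar: prefix -> legal tokens
    max_len: int = 20,
) -> List[str]:
    """Greedy decoding where each step is restricted to grammar-legal tokens."""
    output: List[str] = []
    for _ in range(max_len):
        legal = allowed_next(output)
        if not legal:  # the grammar says the sequence is complete
            break
        scores = score_next(output)
        # Pick the highest-scoring token among those the grammar permits.
        output.append(max(legal, key=lambda tok: scores.get(tok, float("-inf"))))
    return output

# Toy usage: a grammar that only accepts sequences of the form "answer ( <city> )".
grammar = {0: ["answer"], 1: ["("], 2: ["paris", "london"], 3: [")"], 4: []}
result = constrained_decode(
    score_next=lambda prefix: {"paris": 0.9, "london": 0.4},  # dummy model scores
    allowed_next=lambda prefix: grammar[len(prefix)],
)
print(result)  # ['answer', '(', 'paris', ')']
```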
Book ChapterDOI
18 Nov 2020
TL;DR: In this paper, Li et al. propose using a global-perspective (GP) question to replace the original question in QA-style ABSA, which explicitly tells the model about the existence of other relevant aspects through additional instructions.
Abstract: Aspect-based sentiment analysis (ABSA) aims to identify the opinion polarity towards a specific aspect. Traditional approaches formulate ABSA as a sentence classification task. However, it is observed that the single sentence classification paradigm cannot take full advantage of pre-trained language models. Previous work suggests it is better to cast ABSA as a question answering (QA) task for each aspect, which can be solved in the sentence-pair classification paradigm. Though QA-style ABSA achieves state-of-the-art (SOTA) results, it naturally separates the prediction process of multiple aspects belonging to the same sentence. It thus is unable to take full advantage of the correlation between different aspects. In this paper, we propose to use the global-perspective (GP) question to replace the original question in QA-style ABSA, which explicitly tells the model the existence of other relevant aspects using additional instructions. In this way, the model can distinguish relevant phrases for each aspect better and utilize the underlying relationship between different aspects. The experimental results on three benchmark ABSA datasets demonstrate the effectiveness of our method.
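To make the GP-question idea concrete, the sketch below builds a sentence-pair input in which the question about one aspect explicitly names the other aspects in the sentence, and feeds it to a standard sequence-pair classifier. The exact question wording and the bert-base-uncased backbone are assumptions for illustration (the classifier would still need fine-tuning on ABSA data); this is not the paper's template or code.

```python
# Illustrative sketch of QA-style ABSA with a global-perspective (GP) question.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

sentence = "The food was great but the service was slow."
aspects = ["food", "service"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3  # positive / negative / neutral
)

for target in aspects:
    others = [a for a in aspects if a != target]
    # GP question: ask about one aspect while explicitly naming the others.
    gp_question = (
        f"What is the sentiment towards {target}, given that the sentence "
        f"also mentions {', '.join(others) if others else 'no other aspect'}?"
    )
    inputs = tokenizer(gp_question, sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    print(target, logits.softmax(dim=-1))
```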
Journal ArticleDOI
TL;DR: In this paper, a data augmentation technique using parsing trees is proposed that annotates targets by inserting a new delimiter token between them according to their parsing trees; only the training stage requires prior knowledge about the targets' semantic or syntactic compositionality.
Abstract: Humans can understand a novel sentence by parsing it into known components like phrases and clauses. To achieve human-level artificial intelligence, compositional generalization tasks are suggested and used to assess machine learning models. Among those tasks, the SCAN tasks are challenging for standard deep learning models, such as RNN sequence-to-sequence models and Transformers, that show great success across many natural language processing tasks. Even though a long line of deep learning research has developed memory-augmented neural networks aimed at the SCAN tasks, their generality remains questionable for more complex and realistic applications where the standard seq2seq models dominate. Hence, one needs a method that helps the standard models discover compositional rules. To this end, we propose a data augmentation technique using parsing trees. Our technique annotates targets by inserting a new delimiter token between them according to their parsing trees. At the training stage, the technique needs prior knowledge about the targets' semantic or syntactic compositionality; at the test stage, it uses no such knowledge. Experiments show that our technique enables the standard models to achieve compositional generalization on the SCAN tasks. Furthermore, we validate our technique on a synthetic task and confirm the standard models' strong performance gains without using prior knowledge about semantic compositionality. As one way to infuse parsing-tree information into sequences, our technique can be used for tasks with structured targets, like program code generation tasks.
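A minimal sketch of the delimiter-insertion idea: given a parse of the target sequence, flatten it while placing a special delimiter between the top-level constituents, so the grouping stays recoverable from the flat sequence. The nested-list parse format and the "<sep>" token are assumptions for illustration, not the paper's exact scheme.

```python
# Sketch: annotate a target sequence with delimiters derived from its parse tree.
from typing import List, Union

Tree = Union[str, List["Tree"]]

def flatten(tree: Tree) -> List[str]:
    """Collect the leaf tokens of a (sub)tree in order."""
    if isinstance(tree, str):
        return [tree]
    out: List[str] = []
    for child in tree:
        out.extend(flatten(child))
    return out

def annotate(parse: List[Tree], delimiter: str = "<sep>") -> List[str]:
    """Flatten the root's children, inserting a delimiter between siblings."""
    tokens: List[str] = []
    for i, constituent in enumerate(parse):
        if i > 0:
            tokens.append(delimiter)
        tokens.extend(flatten(constituent))
    return tokens

# Example: a SCAN-style target whose parse groups the first two actions together.
target_parse = [["JUMP", "JUMP"], "WALK"]
print(annotate(target_parse))  # ['JUMP', 'JUMP', '<sep>', 'WALK']
```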
Posted Content
TL;DR: In this article, a unified framework for parameter-efficient transfer learning methods is presented, which enables the transfer of design elements across different approaches, and as a result enables the instantiate new parameterefficient fine-tuning methods that tune less parameters than previous methods.
Abstract: Fine-tuning large pre-trained language models on downstream tasks has become the de-facto learning paradigm in NLP. However, conventional approaches fine-tune all the parameters of the pre-trained model, which becomes prohibitive as the model size and the number of tasks grow. Recent work has proposed a variety of parameter-efficient transfer learning methods that only fine-tune a small number of (extra) parameters to attain strong performance. While effective, the critical ingredients for success and the connections among the various methods are poorly understood. In this paper, we break down the design of state-of-the-art parameter-efficient transfer learning methods and present a unified framework that establishes connections between them. Specifically, we re-frame them as modifications to specific hidden states in pre-trained models, and define a set of design dimensions along which different methods vary, such as the function to compute the modification and the position to apply the modification. Through comprehensive empirical studies across machine translation, text summarization, language understanding, and text classification benchmarks, we utilize the unified view to identify important design choices in previous methods. Furthermore, our unified framework enables the transfer of design elements across different approaches, and as a result we are able to instantiate new parameter-efficient fine-tuning methods that tune fewer parameters than previous methods while being more effective, achieving comparable results to fine-tuning all parameters on all four tasks.
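The unified view in this paper treats each parameter-efficient method as a small trainable modification added to a chosen hidden state of the otherwise frozen model. The sketch below shows one instantiation of that view, an adapter-style bottleneck that computes delta_h and applies h + scale * delta_h; module and parameter names are illustrative, not the authors' released code.

```python
# Sketch of the "modification to hidden states" view of parameter-efficient tuning.
import torch
import torch.nn as nn

class HiddenStateModification(nn.Module):
    """Computes delta_h = W_up(act(W_down(h))) and returns h + scale * delta_h."""

    def __init__(self, d_model: int, bottleneck: int, scale: float = 1.0):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)  # the only new (trainable) parameters
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.ReLU()
        self.scale = scale

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        delta_h = self.up(self.act(self.down(h)))
        return h + self.scale * delta_h  # modify the hidden state, don't replace it

# Usage: wrap the output of a frozen transformer block.
h = torch.randn(2, 16, 768)  # (batch, sequence length, d_model)
mod = HiddenStateModification(d_model=768, bottleneck=32)
print(mod(h).shape)  # torch.Size([2, 16, 768])
```

Within the framework, methods differ mainly in where the modification is applied (e.g. attention versus feed-forward sub-layers) and in how delta_h is computed.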
Posted Content
TL;DR: This article investigated the ability of the text-to-text transfer learning model (T5) to learn numeracy, and found that T5 models perform reasonably well in the interpolation setting but struggle considerably in the extrapolation setting across all four numeracy tasks.
Abstract: The transformer-based pre-trained language models have been tremendously successful in most conventional NLP tasks. But they often struggle in tasks where numerical understanding is required. Possible reasons include the tokenizers and pre-training objectives, which are not specifically designed to learn and preserve numeracy. Here we investigate the ability of the text-to-text transfer learning model (T5), which has outperformed its predecessors in conventional NLP tasks, to learn numeracy. We consider four numeracy tasks: numeration, magnitude order prediction, finding minimum and maximum in a series, and sorting. We find that, although T5 models perform reasonably well in the interpolation setting, they struggle considerably in the extrapolation setting across all four tasks.
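The numeracy probes described above are simple text-to-text pairs, and the sketch below generates examples for two of them (finding the maximum and sorting); the prompt wording and the interpolation/extrapolation number ranges are assumptions for illustration, not the paper's exact templates.

```python
# Sketch: generating text-to-text numeracy probes (max-finding and sorting).
import random

def max_example(low: int = 0, high: int = 99, n: int = 5) -> dict:
    nums = random.sample(range(low, high + 1), n)
    return {
        "input": "find the maximum: " + " ".join(str(x) for x in nums),
        "target": str(max(nums)),
    }

def sorting_example(low: int = 0, high: int = 99, n: int = 5) -> dict:
    nums = random.sample(range(low, high + 1), n)
    return {
        "input": "sort ascending: " + " ".join(str(x) for x in nums),
        "target": " ".join(str(x) for x in sorted(nums)),
    }

# Interpolation vs. extrapolation is controlled by the number range, e.g.
# train on 0-99 and evaluate on 100-999 to test extrapolation.
print(max_example())
print(sorting_example(low=100, high=999))
```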
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.