Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
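As a concrete illustration of the text-to-text framing, the sketch below casts three different tasks as string-to-string prediction. It assumes the HuggingFace transformers library and the publicly released t5-small checkpoint; the task prefixes follow the convention described in the paper.

```python
# Minimal sketch of the unified text-to-text framing, assuming the HuggingFace
# `transformers` library and the publicly released `t5-small` checkpoint.
# Every task becomes "input string -> target string", selected by a task prefix.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

examples = {
    "translation": "translate English to German: The house is wonderful.",
    "summarization": "summarize: Transfer learning, where a model is first "
                     "pre-trained on a data-rich task before being fine-tuned "
                     "on a downstream task, has emerged as a powerful technique.",
    "acceptability": "cola sentence: The books is on the table.",
}

for task, text in examples.items():
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_length=40)
    print(f"{task}: {tokenizer.decode(output_ids[0], skip_special_tokens=True)}")
```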


Citations
Posted Content
TL;DR: This paper proposes an analysis framework that links pretraining and downstream tasks through an underlying latent variable generative model of text, and analyzes head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting.
Abstract: Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language. We show that 1) under certain non-degeneracy conditions on the HMM, simple classification heads can solve the downstream task, 2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and 3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM because task-relevant information is easier to recover from the long-term memory. Experiments on synthetically generated data from HMMs back our theoretical findings.
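To make the two adaptation strategies concrete, here is a toy PyTorch sketch (illustrative module names and dimensions, not the authors' code): head tuning updates only a classifier over frozen features, while prompt tuning updates only a small set of continuous prompt vectors prepended to the input.

```python
# Toy contrast of head tuning vs. prompt tuning on a frozen pretrained encoder.
# The encoder below is a stand-in; names and dimensions are illustrative.
import torch
import torch.nn as nn

hidden, num_classes, prompt_len = 768, 2, 10
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True),
    num_layers=2,
)
for p in encoder.parameters():        # the pretrained body stays frozen
    p.requires_grad = False

# Head tuning: only this classifier receives gradients.
head = nn.Linear(hidden, num_classes)

def head_tuning_logits(x):            # x: (batch, seq_len, hidden) embeddings
    return head(encoder(x).mean(dim=1))

# Prompt tuning: only these continuous prompt vectors receive gradients;
# the readout over the encoder output is kept fixed.
prompt = nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)
frozen_readout = nn.Linear(hidden, num_classes)
for p in frozen_readout.parameters():
    p.requires_grad = False

def prompt_tuning_logits(x):
    p = prompt.unsqueeze(0).expand(x.size(0), -1, -1)
    feats = encoder(torch.cat([p, x], dim=1))
    return frozen_readout(feats[:, :prompt_len].mean(dim=1))
```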

7 citations

Proceedings ArticleDOI
01 Aug 2021
TL;DR: In this paper, a weakly-supervised approach is proposed that trains VQA models without any human-annotated Q-A pairs, using only images and their associated captions, from which synthetic Q-A pairs are generated procedurally.
Abstract: Methodologies for training visual question answering (VQA) models assume the availability of datasets with human-annotated Image-Question-Answer (I-Q-A) triplets. This has led to heavy reliance on datasets and a lack of generalization to new types of questions and scenes. Linguistic priors along with biases and errors due to annotator subjectivity have been shown to percolate into VQA models trained on such samples. We study whether models can be trained without any human-annotated Q-A pairs, but only with images and their associated textual descriptions or captions. We present a method to train models with synthetic Q-A pairs generated procedurally from captions. Additionally, we demonstrate the efficacy of spatial-pyramid image patches as a simple but effective alternative to dense and costly object bounding box annotations used in existing VQA models. Our experiments on three VQA benchmarks demonstrate the efficacy of this weakly-supervised approach, especially on the VQA-CP challenge, which tests performance under changing linguistic priors.
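The procedural generation step can be pictured with a toy example. The two template rules below are invented for illustration; the paper's actual rules are richer and operate over parsed captions.

```python
# Toy illustration of turning a caption into synthetic Q-A pairs with simple
# template rules. These regex rules are invented for illustration only.
import re

def caption_to_qa(caption: str):
    caption = caption.lower()
    qa_pairs = []
    # "<number> <noun>s ..." -> a counting question
    m = re.match(r"(two|three|four) (\w+?)s\b", caption)
    if m:
        qa_pairs.append((f"how many {m.group(2)}s are there?", m.group(1)))
    # "<plural noun> <verb>ing ..." -> an activity question
    m = re.search(r"\b(\w+s) (\w+ing)\b", caption)
    if m:
        qa_pairs.append((f"what are the {m.group(1)} doing?", m.group(2)))
    return qa_pairs

print(caption_to_qa("Two dogs playing with a frisbee in the park"))
# [('how many dogs are there?', 'two'), ('what are the dogs doing?', 'playing')]
```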

7 citations

Proceedings ArticleDOI
01 Dec 2020
TL;DR: An automatic question generation approach is presented to partially automate QUD annotation by generating all potentially relevant questions for a given sentence; transformation rules are used to generate a large set of sentence-question-answer triples, and a neural question generation model is trained on them to obtain both systematic question type coverage and robustness.
Abstract: Questions under Discussion (QUD; Roberts, 2012) are emerging as a conceptually fruitful approach to spelling out the connection between the information structure of a sentence and the nature of the discourse in which the sentence can function. To make this approach useful for analyzing authentic data, Riester, Brunetti & De Kuthy (2018) presented a discourse annotation framework based on explicit pragmatic principles for determining a QUD for every assertion in a text. De Kuthy et al. (2018) demonstrate that this supports more reliable discourse structure annotation, and Ziai and Meurers (2018) show that based on explicit questions, automatic focus annotation becomes feasible. But both approaches are based on manually specified questions. In this paper, we present an automatic question generation approach to partially automate QUD annotation by generating all potentially relevant questions for a given sentence. While transformation rules can concisely capture the typical question formation process, a rule-based approach is not sufficiently robust for authentic data. We therefore employ the transformation rules to generate a large set of sentence-question-answer triples and train a neural question generation model on them to obtain both systematic question type coverage and robustness.
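To give a flavor of the transformation-rule step, the toy function below turns a pre-segmented declarative sentence into question-answer pairs by replacing one constituent with a wh-word; the flat subject/verb/rest split and the two rules are heavily simplified relative to the parsed-sentence rules used in the paper.

```python
# Toy sketch of rule-based question generation from a declarative sentence.
# The subject/verb/rest split and the two rules are simplified for illustration.
def generate_questions(subject: str, verb: str, rest: str):
    triples = []
    # Rule 1: ask about the subject -> "Who/What <verb> <rest>?"
    triples.append((f"Who {verb} {rest}?", subject))
    # Rule 2: ask about the remainder -> "What does <subject> <base verb>?"
    base = verb[:-1] if verb.endswith("s") else verb   # naive base form
    triples.append((f"What does {subject.lower()} {base}?", rest))
    return triples

for question, answer in generate_questions("The committee", "approves", "the new budget"):
    print(question, "->", answer)
# Who approves the new budget? -> The committee
# What does the committee approve? -> the new budget
```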

7 citations

Posted Content
TL;DR: This paper introduced CLAMS (Cross-Linguistic Assessment of Models on Syntax), a syntactic evaluation suite for monolingual and multilingual models, which includes subject-verb agreement challenge sets for English, French, German, Hebrew and Russian, generated from grammars they developed.
Abstract: A range of studies have concluded that neural word prediction models can distinguish grammatical from ungrammatical sentences with high accuracy. However, these studies are based primarily on monolingual evidence from English. To investigate how these models' ability to learn syntax varies by language, we introduce CLAMS (Cross-Linguistic Assessment of Models on Syntax), a syntactic evaluation suite for monolingual and multilingual models. CLAMS includes subject-verb agreement challenge sets for English, French, German, Hebrew and Russian, generated from grammars we develop. We use CLAMS to evaluate LSTM language models as well as monolingual and multilingual BERT. Across languages, monolingual LSTMs achieved high accuracy on dependencies without attractors, and generally poor accuracy on agreement across object relative clauses. On other constructions, agreement accuracy was generally higher in languages with richer morphology. Multilingual models generally underperformed monolingual models. Multilingual BERT showed high syntactic accuracy on English, but noticeable deficiencies in other languages.
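The evaluation recipe can be sketched for the masked-LM case: compare the model's probability for the grammatical and ungrammatical verb forms in the same masked slot. The sketch assumes the HuggingFace transformers library and the bert-base-cased checkpoint; it is illustrative and not the CLAMS code.

```python
# Sketch of scoring a subject-verb agreement minimal pair with a masked LM,
# in the spirit of CLAMS-style evaluation (illustrative, not the CLAMS code).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")

def verb_logprob(template: str, verb: str) -> float:
    """Log-probability of `verb` (assumed single-token) in the [MASK] slot."""
    inputs = tokenizer(template, return_tensors="pt")
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    verb_id = tokenizer.convert_tokens_to_ids(verb)
    return torch.log_softmax(logits, dim=-1)[verb_id].item()

# Agreement across an object relative clause (an "attractor" construction).
sentence = "The author that the guards like [MASK] many books."
good = verb_logprob(sentence, "writes")
bad = verb_logprob(sentence, "write")
print("agreement correct" if good > bad else "agreement error preferred")
```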

7 citations

Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.