Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Journal Article

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

01 Jan 2020, Journal of Machine Learning Research, Vol. 21, Iss. 140, pp. 1-67

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
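
To make the text-to-text idea concrete, here is a minimal Python sketch of how different tasks could be cast as pairs of input text and target text. The function names and the exact prefix strings are illustrative assumptions; the abstract does not specify them.

```python
# Minimal sketch: every task becomes (input text, target text).
# Function names and prefix strings are illustrative assumptions.

def classification_example(sentence, label, task_prefix="sentiment"):
    """Classification: the target is the label written out as text."""
    return (f"{task_prefix}: {sentence}", label)


def summarization_example(document, summary):
    """Summarization: the target is the summary text."""
    return (f"summarize: {document}", summary)


def translation_example(text, translation, src="English", tgt="German"):
    """Translation: the prefix names the language pair."""
    return (f"translate {src} to {tgt}: {text}", translation)


if __name__ == "__main__":
    examples = [
        classification_example("A charming film.", "positive"),
        summarization_example("The storm closed roads across the county.", "Storm closes roads."),
        translation_example("That is good.", "Das ist gut."),
    ]
    for source, target in examples:
        print(f"{source!r} -> {target!r}")
```

Because every example has the same shape, a single sequence-to-sequence model and training loss can, in principle, be shared across all of the tasks.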

Citations

Posted Content

Prompt-Learning for Fine-Grained Entity Typing

Ning Ding, Yulin Chen, Xu Han, Guangwei Xu, Pengjun Xie, Hai-Tao Zheng, Zhiyuan Liu, Juanzi Li¹, Hong-Gee Kim

Tsinghua University¹

24 Aug 2021, arXiv: Computation and Language

TL;DR: This paper investigates the application of prompt-learning to fine-grained entity typing in fully supervised, few-shot, and zero-shot scenarios, and shows that prompt-learning methods significantly outperform fine-tuning baselines, especially when training data is insufficient.

Abstract: As an effective approach to tune pre-trained language models (PLMs) for specific tasks, prompt-learning has recently attracted much attention from researchers. By using cloze-style language prompts to stimulate the versatile knowledge of PLMs, prompt-learning can achieve promising results on a series of NLP tasks, such as natural language inference, sentiment classification, and knowledge probing. In this work, we investigate the application of prompt-learning on fine-grained entity typing in fully supervised, few-shot and zero-shot scenarios. We first develop a simple and effective prompt-learning pipeline by constructing entity-oriented verbalizers and templates and conducting masked language modeling. Further, to tackle the zero-shot regime, we propose a self-supervised strategy that carries out distribution-level optimization in prompt-learning to automatically summarize the information of entity types. Extensive experiments on three fine-grained entity typing benchmarks (with up to 86 classes) under fully supervised, few-shot and zero-shot settings show that prompt-learning methods significantly outperform fine-tuning baselines, especially when the training data is insufficient.
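
As a rough illustration of the template-and-verbalizer pipeline described above, the Python sketch below builds a cloze-style prompt and maps word probabilities at the mask position to entity types. The template wording, the label words, and the mean aggregation are all assumptions made for illustration, and a hand-written probability dictionary stands in for a real masked language model.

```python
# Sketch of cloze-style entity typing with a template and a verbalizer.
# TEMPLATE, VERBALIZER, and mean aggregation are hypothetical choices,
# not the paper's exact design.

TEMPLATE = "{sentence} In this sentence, {entity} is a [MASK]."

# Verbalizer: each entity type is represented by a few label words.
VERBALIZER = {
    "person": ["person", "man", "woman"],
    "location": ["city", "country", "place"],
}


def build_prompt(sentence, entity):
    """Wrap the input in a cloze-style template with one masked slot."""
    return TEMPLATE.format(sentence=sentence, entity=entity)


def score_types(mask_word_probs, verbalizer=VERBALIZER):
    """Average the [MASK] probabilities of each type's label words."""
    return {
        entity_type: sum(mask_word_probs.get(w, 0.0) for w in words) / len(words)
        for entity_type, words in verbalizer.items()
    }


def predict_type(mask_word_probs, verbalizer=VERBALIZER):
    """Return the entity type with the highest aggregated score."""
    scores = score_types(mask_word_probs, verbalizer)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(build_prompt("London hosted the games.", "London"))
    # Stand-in for the masked language model's output distribution.
    fake_probs = {"city": 0.6, "person": 0.1}
    print(predict_type(fake_probs))
```

In a real system the probability dictionary would come from a masked language model evaluated at the `[MASK]` position of the prompt.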

9 citations

Posted Content

NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints

Ximing Lu¹, Peter West², Rowan Zellers², Ronan Le Bras¹, Chandra Bhagavatula¹, Yejin Choi²

Allen Institute for Artificial Intelligence¹, University of Washington²

24 Oct 2020, arXiv: Computation and Language

TL;DR: NeuroLogic Decoding is a simple yet effective algorithm that enables neural language models to generate fluent text while satisfying complex lexical constraints; unsupervised models with NeuroLogic Decoding often outperform supervised models with conventional decoding, even when the latter are based on considerably larger networks.

Abstract: Conditional text generation often requires lexical constraints, i.e., which words should or shouldn't be included in the output text. While the dominant recipe for conditional text generation has been large-scale pretrained language models that are finetuned on the task-specific training data, such models do not learn to follow the underlying constraints reliably, even when supervised with large amounts of task-specific examples. We propose NeuroLogic Decoding, a simple yet effective algorithm that enables neural language models -- supervised or not -- to generate fluent text while satisfying complex lexical constraints. Our approach is powerful yet efficient. It handles any set of lexical constraints that is expressible under predicate logic, while its asymptotic runtime is equivalent to conventional beam search. Empirical results on four benchmarks show that NeuroLogic Decoding outperforms previous approaches, including algorithms that handle a subset of our constraints. Moreover, we find that unsupervised models with NeuroLogic Decoding often outperform supervised models with conventional decoding, even when the latter is based on considerably larger networks. Our results suggest the limit of large-scale neural networks for fine-grained controllable generation and the promise of inference-time algorithms.
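
The kind of lexical constraint discussed above can be sketched in Python as a logical formula over word inclusion. The sketch assumes the constraints are written in conjunctive normal form (an AND of OR-clauses), with each literal stating that a word must or must not appear. It only checks whether a finished text satisfies the formula; it is not the decoding algorithm itself, and all names are hypothetical.

```python
# Sketch: checking lexical constraints written in conjunctive normal form.
# A formula is a list of clauses; a clause is a list of (word, must_appear)
# literals. This representation is an assumption made for illustration.

def literal_holds(tokens, word, must_appear):
    """A positive literal needs the word present; a negative one needs it absent."""
    present = word in tokens
    return present if must_appear else not present


def clause_holds(tokens, clause):
    """A clause (OR) holds if at least one of its literals holds."""
    return any(literal_holds(tokens, word, must_appear) for word, must_appear in clause)


def satisfies(text, cnf):
    """The formula (AND of clauses) holds if every clause holds."""
    tokens = set(text.lower().split())
    return all(clause_holds(tokens, clause) for clause in cnf)


if __name__ == "__main__":
    # (dog OR dogs) AND (NOT cat)
    constraints = [[("dog", True), ("dogs", True)], [("cat", False)]]
    for candidate in ["the dog runs", "the dog chases the cat", "a bird sings"]:
        print(candidate, "->", satisfies(candidate, constraints))
```

A constrained decoder would need to track this kind of clause satisfaction incrementally for each partial hypothesis rather than only at the end.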

9 citations

Posted Content

Neural Databases

James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, Alon Halevy

14 Oct 2020, arXiv: Computation and Language

TL;DR: This paper describes NeuralDB, a database system with no pre-defined schema, in which updates and queries are given in natural language, and describes an algorithm that learns how to create the appropriate sets of facts to be fed into each of the Neural SPJ operators.

Abstract: In recent years, neural networks have shown impressive performance gains on long-standing AI problems, and in particular, answering queries from natural language text. These advances raise the question of whether they can be extended to a point where we can relax the fundamental assumption of database management, namely, that our data is represented as fields of a pre-defined schema. This paper presents a first step in answering that question. We describe NeuralDB, a database system with no pre-defined schema, in which updates and queries are given in natural language. We develop query processing techniques that build on the primitives offered by state-of-the-art natural language processing methods. We begin by demonstrating that at the core, recent NLP transformers, powered by pre-trained language models, can answer select-project-join queries if they are given the exact set of relevant facts. However, they cannot scale to non-trivial databases and cannot perform aggregation queries. Based on these findings, we describe a NeuralDB architecture that runs multiple Neural SPJ operators in parallel, each with a set of database sentences that can produce one of the answers to the query. The results of these operators are fed to an aggregation operator if needed. We describe an algorithm that learns how to create the appropriate sets of facts to be fed into each of the Neural SPJ operators. Importantly, this algorithm can be trained by the Neural SPJ operator itself. We experimentally validate the accuracy of NeuralDB and its components, showing that we can answer queries over thousands of sentences with very high accuracy.
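
The architecture described above (several SPJ operators run in parallel over small sets of facts, followed by an optional aggregation step) can be outlined in Python roughly as follows. This is a structural sketch only: `run_query` and `toy_spj` are hypothetical names, and a trivial string-matching rule stands in for the neural operator.

```python
# Structural sketch of the parallel-operators-then-aggregate flow.
# toy_spj is a rule-based stand-in for a neural SPJ operator, and all
# names here are hypothetical.
from concurrent.futures import ThreadPoolExecutor


def toy_spj(query, facts):
    """Stand-in operator: return the subject of a fact naming the queried company."""
    company = query.rstrip("?").split()[-1]
    for fact in facts:
        if company in fact:
            return fact.split()[0]
    return None  # this support set yields no answer


def run_query(query, support_sets, spj_operator, aggregate=None):
    """Run one operator per support set in parallel, then aggregate if asked."""
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda facts: spj_operator(query, facts), support_sets))
    answers = [a for a in answers if a is not None]
    return aggregate(answers) if aggregate is not None else answers


if __name__ == "__main__":
    support_sets = [
        ["Alice works at Acme."],
        ["Bob works at Initech."],
        ["Carol works at Acme."],
    ]
    print(run_query("Who works at Acme?", support_sets, toy_spj))
    print(run_query("Who works at Acme?", support_sets, toy_spj, aggregate=len))
```

The sketch takes the support sets as given; in the system described in the abstract, producing those sets is itself a learned step.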

9 citations

Proceedings Article

PARADE: A New Dataset for Paraphrase Identification Requiring Computer Science Domain Knowledge

Yun He¹, Zhuoer Wang¹, Yin Zhang¹, Ruihong Huang¹, James Caverlee¹

Texas A&M University¹

01 Nov 2020

TL;DR: This paper presents PARADE, a new benchmark dataset for paraphrase identification that requires specialized domain knowledge; it contains paraphrases that overlap very little at the lexical and syntactic level but are semantically equivalent based on computer science domain knowledge.

Abstract: We present a new benchmark dataset called PARADE for paraphrase identification that requires specialized domain knowledge. PARADE contains paraphrases that overlap very little at the lexical and syntactic level but are semantically equivalent based on computer science domain knowledge, as well as non-paraphrases that overlap greatly at the lexical and syntactic level but are not semantically equivalent based on this domain knowledge. Experiments show that both state-of-the-art neural models and non-expert human annotators have poor performance on PARADE. For example, BERT after fine-tuning achieves an F1 score of 0.709, which is much lower than its performance on other paraphrase identification datasets. PARADE can serve as a resource for researchers interested in testing models that incorporate domain knowledge. We make our data and code freely available.

9 citations

Proceedings Article

JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs

Pei Ke¹, Haozhe Ji¹, Yu Ran, Xin Cui, Liwei Wang², Linfeng Song³, Xiaoyan Zhu¹, Minlie Huang¹

Tsinghua University¹, Peking University², Tencent³

01 Aug 2021

TL;DR: This paper proposes JointGT, a graph-text joint representation learning model for knowledge-graph-to-text (KG-to-text) generation, which uses a structure-aware semantic aggregation module to preserve the graph structure.

Abstract: Existing pre-trained models for knowledge-graph-to-text (KG-to-text) generation simply fine-tune text-to-text pre-trained models such as BART or T5 on KG-to-text datasets, which largely ignore the graph structure during encoding and lack elaborate pre-training tasks to explicitly model graph-text alignments. To tackle these problems, we propose a graph-text joint representation learning model called JointGT. During encoding, we devise a structure-aware semantic aggregation module which is plugged into each Transformer layer to preserve the graph structure. Furthermore, we propose three new pre-training tasks to explicitly enhance the graph-text alignment including respective text / graph reconstruction, and graph-text alignment in the embedding space via Optimal Transport. Experiments show that JointGT obtains new state-of-the-art performance on various KG-to-text datasets.

9 citations