Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
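
The released models make the text-to-text formulation easy to exercise directly. Below is a minimal sketch, assuming the Hugging Face transformers library and the public t5-small checkpoint with its standard task prefixes; it is illustrative only and not the authors' original training or evaluation code.

# Minimal sketch: casting different tasks as text-to-text with a public T5
# checkpoint. Assumes the Hugging Face transformers library is installed;
# illustrative only, not the paper's original codebase.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def text_to_text(prompt: str, max_new_tokens: int = 50) -> str:
    # Every task uses the same interface: a task-prefixed string in, a string out.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Translation and summarization differ only in the task prefix.
print(text_to_text("translate English to German: The house is wonderful."))
print(text_to_text("summarize: Transfer learning, where a model is first "
                   "pre-trained on a data-rich task before being fine-tuned on "
                   "a downstream task, has emerged as a powerful technique in "
                   "natural language processing."))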


Citations
Posted Content
TL;DR: This article proposed OLM, an explanation method for natural language processing classifiers that combines occlusion and language modeling, techniques central to explainability and NLP, respectively.
Abstract: Deep neural networks are powerful statistical learners. However, their predictions do not come with an explanation of their process. To analyze these models, explanation methods are being developed. We present a novel explanation method, called OLM, for natural language processing classifiers. This method combines occlusion and language modeling, which are techniques central to explainability and NLP, respectively. OLM gives explanations that are theoretically sound and easy to understand. We make several contributions to the theory of explanation methods. Axioms for explanation methods are an interesting theoretical concept for exploring their foundations and deriving new methods. We introduce a new axiom, give its intuition, and show that it contradicts another existing axiom. Additionally, we point out theoretical difficulties of existing gradient-based and some occlusion-based explanation methods in natural language processing. We provide an extensive argument for why evaluating explanation methods is difficult. We compare OLM to other explanation methods and underline its uniqueness experimentally. Finally, we investigate corner cases of OLM and discuss its validity and possible improvements.
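
As a rough illustration of how occlusion and language modeling can be combined for a text classifier (a sketch under assumed components, using a BERT fill-mask pipeline and an off-the-shelf sentiment classifier, not the authors' actual OLM implementation): a word's relevance is estimated by replacing it with samples drawn from a masked language model and averaging the resulting change in the classifier's prediction.

# Rough sketch of occlusion + language modeling for explaining a classifier.
# Assumed components: a BERT fill-mask pipeline as the language model and a
# default sentiment-analysis pipeline; the real OLM method differs in detail.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
classifier = pipeline("sentiment-analysis")

def positive_score(text: str) -> float:
    out = classifier(text)[0]
    return out["score"] if out["label"] == "POSITIVE" else 1.0 - out["score"]

def olm_style_relevance(words, index, n_samples=5):
    # Average prediction change when words[index] is replaced by LM samples.
    original = positive_score(" ".join(words))
    masked = words.copy()
    masked[index] = fill_mask.tokenizer.mask_token
    candidates = fill_mask(" ".join(masked), top_k=n_samples)
    diffs = []
    for cand in candidates:
        resampled = words.copy()
        resampled[index] = cand["token_str"]
        diffs.append(original - positive_score(" ".join(resampled)))
    return sum(diffs) / len(diffs)

words = "the movie was absolutely wonderful".split()
for i, w in enumerate(words):
    print(w, round(olm_style_relevance(words, i), 3))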

1 citation

01 Dec 2020
TL;DR: This paper investigated the influence of context markers on the attention mechanism in word sense disambiguation and found that the alignment effect of the attention mechanism is magnified in short-text translation tasks with ambiguous nouns.
Abstract: In recent years, attention mechanisms have been widely used in neural machine translation tasks based on the encoder-decoder architecture. This paper focuses on the performance of the encoder-decoder attention mechanism in word sense disambiguation (WSD) tasks across different text lengths, aiming to determine the influence of context markers on the attention mechanism in WSD. We hypothesize that attention mechanisms perform similarly when translating texts of different lengths. Our conclusion is that the alignment effect of the attention mechanism is magnified in short-text translation tasks with ambiguous nouns, while its effect falls far short of expectations in long-text tasks, which suggests that the attention mechanism is not the main mechanism by which the NMT model integrates context information for WSD. This may mean that the attention mechanism pays more attention to ambiguous nouns than to context markers. The experimental results show that as text length increases, the performance of an NMT model using the attention mechanism gradually declines.
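
For concreteness, the kind of inspection this analysis implies can be sketched with a public NMT checkpoint (assumed here: a MarianMT English-German model and a hand-written reference translation; this is not the cited paper's experimental setup): run the model with teacher forcing and read out the encoder-decoder cross-attention over a sentence containing an ambiguous noun.

# Sketch: reading encoder-decoder cross-attention over an ambiguous noun.
# Assumes a public MarianMT English-German checkpoint as a stand-in NMT model
# and an assumed reference translation; not the cited paper's actual setup.
import torch
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-en-de"
tokenizer = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name)

src = "He sat on the bank of the river."   # "bank" is the ambiguous noun
ref = "Er saß am Ufer des Flusses."        # assumed reference translation

inputs = tokenizer(src, return_tensors="pt")
labels = tokenizer(text_target=ref, return_tensors="pt").input_ids

with torch.no_grad():
    out = model(**inputs, labels=labels, output_attentions=True)

# cross_attentions: one tensor per decoder layer, shape (batch, heads, tgt, src).
# Average the last layer's heads and report, for each target token, the source
# token it attends to most strongly.
attn = out.cross_attentions[-1].mean(dim=1)[0]
src_tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
tgt_tokens = tokenizer.convert_ids_to_tokens(labels[0])
for tgt_tok, row in zip(tgt_tokens, attn):
    j = int(row.argmax())
    print(f"{tgt_tok:>12} -> {src_tokens[j]} ({row[j].item():.2f})")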

1 citation

Proceedings ArticleDOI
01 Nov 2020
TL;DR: This work investigates the capability of a state-of-the-art transformer LM to generate explicit inference hops, i.e., to infer a new statement necessary to answer a question given some premise input statements.
Abstract: Large pretrained language models (LM) have been used successfully for multi-hop question answering. However, most of these directions are not interpretable, as they do not make the inference hops necessary to explain a candidate answer explicitly. In this work, we investigate the capability of a state-of-the-art transformer LM to generate explicit inference hops, i.e., to infer a new statement necessary to answer a question given some premise input statements. Our analysis shows that such LMs can generate new statements for some simple inference types, but performance remains poor for complex, real-world inference types such as those that require monotonicity, composition, and commonsense knowledge.

1 citation

Posted Content
TL;DR: The authors introduce TRIP, a commonsense reasoning dataset with dense annotations that enable multi-tiered evaluation of machines' reasoning process, showing that while large LMs can achieve high end-task performance, they struggle to support their predictions with valid supporting evidence.
Abstract: Large-scale, pre-trained language models (LMs) have achieved human-level performance on a breadth of language understanding tasks. However, evaluations only based on end task performance shed little light on machines' true ability in language understanding and reasoning. In this paper, we highlight the importance of evaluating the underlying reasoning process in addition to end performance. Toward this goal, we introduce Tiered Reasoning for Intuitive Physics (TRIP), a novel commonsense reasoning dataset with dense annotations that enable multi-tiered evaluation of machines' reasoning process. Our empirical results show that while large LMs can achieve high end performance, they struggle to support their predictions with valid supporting evidence. The TRIP dataset and our baseline results will motivate verifiable evaluation of commonsense reasoning and facilitate future research toward developing better language understanding and reasoning models.

1 citation

Proceedings ArticleDOI
01 Aug 2021
TL;DR: The authors introduce a new task, document sketching, which involves generating entire draft documents for the writer to review and revise, and investigate the application of weakly supervised methods including use of a transformer-based mixture of experts, together with reinforcement learning.
Abstract: The advent of large pre-trained language models has made it possible to make high-quality predictions on how to add or change a sentence in a document. However, the high branching factor inherent to text generation impedes the ability of even the strongest language models to offer useful editing suggestions at a more global or document level. We introduce a new task, document sketching, which involves generating entire draft documents for the writer to review and revise. These drafts are built from sets of documents that overlap in form - sharing large segments of potentially reusable text - while diverging in content. To support this task, we introduce a Wikipedia-based dataset of analogous documents and investigate the application of weakly supervised methods, including use of a transformer-based mixture of experts, together with reinforcement learning. We report experiments using automated and human evaluation methods and discuss relative merits of these models.

1 citation

Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.