Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Journal Article•

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

01 Jan 2020-Journal of Machine Learning Research-Vol. 21, Iss: 140, pp 1-67

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
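
The text-to-text idea in the abstract can be illustrated with a minimal sketch: every task, whatever its original label type, is rewritten as an input string and a target string. The function name `to_text_to_text` and the task prefixes below are assumptions chosen for illustration; they are not taken from the released code.

```python
from typing import Tuple, Union


def to_text_to_text(prefix: str, text: str, target: Union[str, float, int]) -> Tuple[str, str]:
    """Cast one labeled example into an (input text, target text) pair.

    The prefix tells a single model which task to perform; the target is
    always rendered as a string, even for class labels or numeric scores.
    """
    return f"{prefix}: {text}", str(target)


# Hypothetical examples from three different task types, all in one format.
examples = [
    to_text_to_text("summarize", "The storm closed roads across the region on Monday.", "Storm closes roads."),
    to_text_to_text("sentiment", "I loved this film", "positive"),
    to_text_to_text("similarity", "A man is singing. | A person sings.", 3.8),
]

for source, target in examples:
    print(f"{source!r} -> {target!r}")
```

Because inputs and outputs are both plain strings, the same model, loss, and decoding procedure can in principle be reused across all of these tasks.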

Citations

Journal Article•DOI•

Braid: Weaving Symbolic and Neural Knowledge into Coherent Logical Explanations

[...]

28 Jun 2022-Proceedings of the ... AAAI Conference on Artificial Intelligence

TL;DR: Braid is a logical reasoner that supports probabilistic rules and uses custom unification functions and dynamic rule generation to overcome the brittle matching and knowledge-gap problems prevalent in traditional reasoners.

Abstract: Traditional symbolic reasoning engines, while attractive for their precision and explicability, have a few major drawbacks: the use of brittle inference procedures that rely on exact matching (unification) of logical terms, an inability to deal with uncertainty, and the need for a precompiled rule-base of knowledge (the “knowledge acquisition” problem). To address these issues, we devise a novel logical reasoner called Braid, that supports probabilistic rules, and uses the notion of custom unification functions and dynamic rule generation to overcome the brittle matching and knowledge-gap problem prevalent in traditional reasoners. In this paper, we describe the reasoning algorithms used in Braid, and their implementation in a distributed task-based framework that builds proof/explanation graphs for an input query. We use a simple QA example from a children’s story to motivate Braid’s design and explain how the various components work together to produce a coherent logical explanation. Finally, we evaluate Braid on the ROC Story Cloze test and achieve close to state-of-the-art results while providing frame-based explanations.

2 citations

Posted Content•

proScript: Partially Ordered Scripts Generation via Pre-trained Language Models.

[...]

Keisuke Sakaguchi, Chandra Bhagavatula, Ronan Le Bras, Niket Tandon, Peter Clark, Yejin Choi¹

Allen Institute for Artificial Intelligence¹

16 Apr 2021-arXiv: Computation and Language

TL;DR: The authors used pre-trained neural language models (LMs) to generate high-quality scripts, at varying levels of granularity, for a wide range of everyday scenarios (e.g., bake a cake).

Abstract: Scripts - standardized event sequences describing typical everyday activities - have been shown to help understand narratives by providing expectations, resolving ambiguity, and filling in unstated information. However, to date they have proved hard to author or extract from text. In this work, we demonstrate for the first time that pre-trained neural language models (LMs) can be fine-tuned to generate high-quality scripts, at varying levels of granularity, for a wide range of everyday scenarios (e.g., bake a cake). To do this, we collected a large (6.4k) crowdsourced dataset of partially ordered scripts (named proScript), which is substantially larger than prior datasets, and developed models that generate scripts by combining language generation and structure prediction. We define two complementary tasks: (i) edge prediction: given a scenario and unordered events, organize the events into a valid (possibly partial-order) script, and (ii) script generation: given only a scenario, generate events and organize them into a (possibly partial-order) script. Our experiments show that our models perform well (e.g., F1=75.7 in task (i)), illustrating a new approach to overcoming previous barriers to script collection. We also show that there is still significant room for improvement toward human-level performance. Together, our tasks, dataset, and models offer a new research direction for learning script knowledge.
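
A partially ordered script of the kind described above can be pictured as a directed acyclic graph whose nodes are events and whose edges mean "happens before". The sketch below is only an illustration of that data structure, not the authors' model: the event strings, the edge list, and the helper names `linearize` and `is_valid_script` are all assumptions. It checks that a set of predicted edges forms a valid partial order and produces one consistent ordering.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import List, Sequence, Tuple

Edge = Tuple[str, str]  # (earlier event, later event)


def linearize(events: Sequence[str], edges: Sequence[Edge]) -> List[str]:
    """Return one ordering of the events that respects every edge.

    Raises ValueError if an edge names an unknown event or if the edges
    contain a cycle (graphlib.CycleError is a subclass of ValueError).
    """
    known = set(events)
    sorter = TopologicalSorter({event: set() for event in events})
    for before, after in edges:
        if before not in known or after not in known:
            raise ValueError(f"unknown event in edge: {(before, after)}")
        sorter.add(after, before)  # 'before' is a predecessor of 'after'
    return list(sorter.static_order())


def is_valid_script(events: Sequence[str], edges: Sequence[Edge]) -> bool:
    """True when the edges define a valid partial order over the events."""
    try:
        linearize(events, edges)
        return True
    except ValueError:
        return False


# Hypothetical script for the scenario "bake a cake".
events = ["gather ingredients", "preheat oven", "mix batter", "bake"]
edges = [
    ("gather ingredients", "mix batter"),
    ("preheat oven", "bake"),
    ("mix batter", "bake"),
]
print(linearize(events, edges))
```

Note that "preheat oven" and "mix batter" are unordered with respect to each other here, which is what makes the script partially rather than totally ordered.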

2 citations

Proceedings Article•DOI•

Can I be of further assistance? Using unstructured knowledge access to improve task-oriented conversational modeling

[...]

Di Jin, Seokhwan Kim, Dilek Hakkani-Tur

01 Aug 2021

TL;DR: In this article, a pipelined approach is proposed for knowledge-seeking turn detection, knowledge selection, and response generation in sequence, which achieves state-of-the-art performance on the DSTC9 Track 1 benchmark dataset.

Abstract: Most prior work on task-oriented dialogue systems is restricted to limited coverage of domain APIs. However, users oftentimes have requests that are out of the scope of these APIs. This work focuses on responding to these beyond-API-coverage user turns by incorporating external, unstructured knowledge sources. Our approach works in a pipelined manner with knowledge-seeking turn detection, knowledge selection, and response generation in sequence. We introduce novel data augmentation methods for the first two steps and demonstrate that the use of information extracted from dialogue context improves the knowledge selection and end-to-end performances. Through experiments, we achieve state-of-the-art performance for both automatic and human evaluation metrics on the DSTC9 Track 1 benchmark dataset, validating the effectiveness of our contributions.

2 citations

Proceedings Article•

Aspect-Controllable Opinion Summarization.

[...]

Reinald Kim Amplayo¹, Stefanos Angelidis¹, Mirella Lapata¹

University of Edinburgh¹

01 Nov 2021

TL;DR: This paper proposes an approach that generates customized summaries based on aspect queries (e.g., describing the location and room of a hotel). Using a review corpus, the authors create a synthetic training dataset of (review, summary) pairs enriched with aspect controllers, which are induced by a multi-instance learning model that predicts the aspects of a document at different levels of granularity.

Abstract: Recent work on opinion summarization produces general summaries based on a set of input reviews and the popularity of opinions expressed in them. In this paper, we propose an approach that allows the generation of customized summaries based on aspect queries (e.g., describing the location and room of a hotel). Using a review corpus, we create a synthetic training dataset of (review, summary) pairs enriched with aspect controllers which are induced by a multi-instance learning model that predicts the aspects of a document at different levels of granularity. We fine-tune a pretrained model using our synthetic dataset and generate aspect-specific summaries by modifying the aspect controllers. Experiments on two benchmarks show that our model outperforms the previous state of the art and generates personalized summaries by controlling the number of aspects discussed in them.

2 citations