Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
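The text-to-text framing means every task is cast as feeding the model an input string and training it to produce a target string, with a short prefix selecting the task. As a minimal illustration (a sketch using the publicly released T5 checkpoints through the Hugging Face transformers library, which is separate from the paper's own released codebase):

```python
# Minimal sketch: casting different tasks into T5's text-to-text format.
# Assumes the Hugging Face `transformers` library and the public "t5-small" checkpoint.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is expressed as plain text; the prefix tells the model what to do.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    "cola sentence: The book was read by me quickly the.",  # acceptability judgment
]

for text in examples:
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=60)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```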


Citations
Proceedings Article
29 Oct 2021
TL;DR: This article proposed a transition-aware, pre-trained sequence-to-sequence Transformer parser for AMR parsing, based on a simplified transition set designed to better exploit pre-trained language models for structured fine-tuning.
Abstract: Predicting linearized Abstract Meaning Representation (AMR) graphs using pre-trained sequence-to-sequence Transformer models has recently led to large improvements on AMR parsing benchmarks. These parsers are simple and avoid explicit modeling of structure but lack desirable properties such as graph well-formedness guarantees or built-in graph-sentence alignments. In this work we explore the integration of general pre-trained sequence-to-sequence language models and a structure-aware transition-based approach. We depart from a pointer-based transition system and propose a simplified transition set, designed to better exploit pre-trained language models for structured fine-tuning. We also explore modeling the parser state within the pre-trained encoder-decoder architecture and different vocabulary strategies for the same purpose. We provide a detailed comparison with recent progress in AMR parsing and show that the proposed parser retains the desirable properties of previous transition-based approaches, while being simpler and reaching the new parsing state of the art for AMR 2.0, without the need for graph re-categorization.
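The transition-based idea is that the parser does not emit the graph directly; it predicts a sequence of actions that consume the sentence and incrementally build nodes and edges. The snippet below is a toy, hypothetical illustration of that general mechanism; the action set is invented for the example and is not the simplified transition set proposed in the paper.

```python
# Toy illustration of a transition-based parser: actions consume tokens from a
# buffer and incrementally build a graph. The action set here is hypothetical
# and far simpler than the one in the paper.
from dataclasses import dataclass, field

@dataclass
class ParserState:
    buffer: list                                  # remaining input tokens
    nodes: list = field(default_factory=list)     # predicted graph nodes
    edges: list = field(default_factory=list)     # (head, label, dependent) triples

def apply(state: ParserState, action: str, arg: str = None) -> None:
    if action == "SHIFT":        # discard the next token without creating a node
        state.buffer.pop(0)
    elif action == "PRED":       # map the next token to a graph node named `arg`
        state.buffer.pop(0)
        state.nodes.append(arg)
    elif action == "ARC_R":      # arc from the newest node to the previous one
        state.edges.append((state.nodes[-1], arg, state.nodes[-2]))
    elif action == "ARC_L":      # arc from the previous node to the newest one
        state.edges.append((state.nodes[-2], arg, state.nodes[-1]))

# Hard-coded oracle action sequence for "The boy wants to sleep".
state = ParserState(buffer="The boy wants to sleep".split())
for act, arg in [("SHIFT", None), ("PRED", "boy"), ("PRED", "want-01"),
                 ("ARC_R", ":ARG0"), ("SHIFT", None), ("PRED", "sleep-01"),
                 ("ARC_L", ":ARG1")]:
    apply(state, act, arg)

print(state.nodes)  # ['boy', 'want-01', 'sleep-01']
print(state.edges)  # [('want-01', ':ARG0', 'boy'), ('want-01', ':ARG1', 'sleep-01')]
```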

2 citations

Proceedings ArticleDOI
01 Nov 2020
TL;DR: By decoupling corpus-specific configurations from the frontend implementation, this paper is able to demonstrate the generality of Cydex on two very different corpora: the ACL Anthology and a collection of hydrology abstracts.
Abstract: Cydex is a platform that provides neural search infrastructure for domain-specific scholarly literature. The platform represents an abstraction of Covidex, our recently developed full-stack open-source search engine for the COVID-19 Open Research Dataset (CORD-19) from AI2. While Covidex takes advantage of the latest best practices for keyword search using the popular Lucene search library as well as state-of-the-art neural ranking models using T5, parts of the system were hard coded to only work with CORD-19. This paper describes our efforts to generalize Covidex into Cydex, which can be applied to scholarly literature in different domains. By decoupling corpus-specific configurations from the frontend implementation, we are able to demonstrate the generality of Cydex on two very different corpora: the ACL Anthology and a collection of hydrology abstracts. Our platform is entirely open source and available at cydex.ai.
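The key design point is that everything corpus-specific lives in configuration rather than in the frontend code, so adding a new corpus means adding a new config entry. A hypothetical sketch of what such per-corpus configuration might look like (the field names are invented for illustration and are not Cydex's actual schema):

```python
# Hypothetical sketch of decoupling corpus-specific settings from the frontend.
# Field names are invented for illustration; this is not Cydex's actual schema.
from dataclasses import dataclass

@dataclass
class CorpusConfig:
    name: str              # display name shown in the UI
    index_path: str        # Lucene index used for keyword (BM25) retrieval
    reranker_model: str    # neural re-ranking model identifier (example value below)
    fields: tuple          # document fields the frontend should render

CORPORA = {
    "acl": CorpusConfig(
        name="ACL Anthology",
        index_path="indexes/acl-anthology",
        reranker_model="castorini/monot5-base-msmarco",
        fields=("title", "authors", "abstract", "url"),
    ),
    "hydrology": CorpusConfig(
        name="Hydrology Abstracts",
        index_path="indexes/hydrology",
        reranker_model="castorini/monot5-base-msmarco",
        fields=("title", "abstract"),
    ),
}

def get_config(corpus_id: str) -> CorpusConfig:
    """The generic search stack only reads corpus settings through this lookup."""
    return CORPORA[corpus_id]
```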

2 citations

Proceedings ArticleDOI
01 Jan 2022
TL;DR: This paper, by Zaiqiao Meng, Fangyu Liu, Ehsan Shareghi, Yixuan Su, Charlotte Collins, and Nigel Collier, was presented at the 60th Annual Meeting of the Association for Computational Linguistics.
Abstract: Zaiqiao Meng, Fangyu Liu, Ehsan Shareghi, Yixuan Su, Charlotte Collins, Nigel Collier. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

2 citations

Proceedings ArticleDOI
01 Jun 2021
TL;DR: This paper proposes a session-based automatic summarization model (SBAS) that uses a session and ensemble mechanism to generate long summaries, and achieves the best performance on the LongSumm task.
Abstract: Most summarization tasks focus on generating relatively short summaries. Such a length constraint may not be appropriate when summarizing scientific work. The LongSumm task requires participants to generate long summaries for scientific documents. This task can typically be approached with language models, but an important problem is that models like BERT are limited by memory and cannot handle inputs as long as a full document; generating a long output is also difficult. In this paper, we propose a session-based automatic summarization model (SBAS), which uses a session and ensemble mechanism to generate long summaries. Our model achieves the best performance on the LongSumm task.
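While the exact session and ensemble mechanism is specific to SBAS, the underlying workaround for the input-length limit can be sketched as splitting the document into chunks ("sessions") that fit the model, summarizing each, and concatenating the pieces into one long summary. A minimal illustration with a generic off-the-shelf summarizer (an assumption-laden simplification, not the SBAS model itself):

```python
# Minimal sketch of chunked ("session"-style) long summarization.
# This is a simplification for illustration, not the SBAS model from the paper.
# Assumes the Hugging Face `transformers` library and a public summarization checkpoint.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

def summarize_long(document: str, words_per_session: int = 400) -> str:
    words = document.split()
    # Split the document into sessions short enough for the model's input limit.
    sessions = [" ".join(words[i:i + words_per_session])
                for i in range(0, len(words), words_per_session)]
    # Summarize each session independently, then concatenate into one long summary.
    parts = [summarizer(s, max_length=120, min_length=30)[0]["summary_text"]
             for s in sessions]
    return " ".join(parts)
```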

2 citations

Proceedings ArticleDOI
01 Aug 2021
TL;DR: This article proposed a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a question-answering model's unethical behavior by communicating context-specific principles of ethics and equity to it.
Abstract: Is it possible to use natural language to intervene in a model's behavior and alter its prediction in a desired way? We investigate the effectiveness of natural language interventions for reading-comprehension systems, studying this in the context of social stereotypes. Specifically, we propose a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a question-answering (QA) model's unethical behavior by communicating context-specific principles of ethics and equity to it. To this end, we build upon recent methods for quantifying a system's social stereotypes, augmenting them with different kinds of ethical interventions and the desired model behavior under such interventions. Our zero-shot evaluation finds that even today's powerful neural language models are extremely poor ethical-advice takers, that is, they respond surprisingly little to ethical interventions even though these interventions are stated as simple sentences. Few-shot learning improves model behavior but remains far from the desired outcome, especially when evaluated for various types of generalization. Our new task thus poses a novel language understanding challenge for the community.
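Concretely, an ethical intervention is just a natural-language sentence added to the reading-comprehension context, and the test is whether the model's answer changes accordingly. The snippet below sketches that setup with an off-the-shelf extractive QA model; the passage, question, and intervention are illustrative and are not items from the LEI benchmark.

```python
# Sketch of a natural-language ethical intervention for a QA model.
# The passage, question, and intervention text are illustrative only;
# they are not drawn from the LEI benchmark itself.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = ("Alex and Sam both applied for the engineering job. "
           "Alex is a man and Sam is a woman.")
question = "Who is good at math?"
intervention = ("Do not assume a person's abilities based on their gender; "
                "treat all candidates equally.")

# Without the intervention, a biased model may pick an answer based on a stereotype.
print(qa(question=question, context=context))

# The intervention is communicated purely in natural language, appended to the context.
print(qa(question=question, context=context + " " + intervention))
```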

2 citations

Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.