Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
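The text-to-text framing means every task is cast as feeding the model an input string and training it to produce a target string, with a short prefix selecting the task. As a minimal illustration (a sketch using the publicly released T5 checkpoints through the Hugging Face transformers library, which is separate from the paper's own released codebase):

```python
# Minimal sketch: casting different tasks into T5's text-to-text format.
# Assumes the Hugging Face `transformers` library and the public "t5-small" checkpoint.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is expressed as plain text; the prefix tells the model what to do.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    "cola sentence: The book was read by me quickly the.",  # acceptability judgment
]

for text in examples:
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=60)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```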


Citations
Proceedings Article
29 Oct 2021
TL;DR: This article proposed a transition-aware, pre-trained sequence-to-sequence Transformer parser for AMR parsing, based on a simplified transition set designed to better exploit pre-trained language models for structured fine-tuning.
Abstract: Predicting linearized Abstract Meaning Representation (AMR) graphs using pre-trained sequence-to-sequence Transformer models has recently led to large improvements on AMR parsing benchmarks. These parsers are simple and avoid explicit modeling of structure but lack desirable properties such as graph well-formedness guarantees or built-in graph-sentence alignments. In this work we explore the integration of general pre-trained sequence-to-sequence language models and a structure-aware transition-based approach. We depart from a pointer-based transition system and propose a simplified transition set, designed to better exploit pre-trained language models for structured fine-tuning. We also explore modeling the parser state within the pre-trained encoder-decoder architecture and different vocabulary strategies for the same purpose. We provide a detailed comparison with recent progress in AMR parsing and show that the proposed parser retains the desirable properties of previous transition-based approaches, while being simpler and reaching the new parsing state of the art for AMR 2.0, without the need for graph re-categorization.
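The transition-based idea is that the parser does not emit the graph directly; it predicts a sequence of actions that consume the sentence and incrementally build nodes and edges. The snippet below is a toy, hypothetical illustration of that general mechanism; the action set is invented for the example and is not the simplified transition set proposed in the paper.

```python
# Toy illustration of a transition-based parser: actions consume tokens from a
# buffer and incrementally build a graph. The action set here is hypothetical
# and far simpler than the one in the paper.
from dataclasses import dataclass, field

@dataclass
class ParserState:
    buffer: list                                  # remaining input tokens
    nodes: list = field(default_factory=list)     # predicted graph nodes
    edges: list = field(default_factory=list)     # (head, label, dependent) triples

def apply(state: ParserState, action: str, arg: str = None) -> None:
    if action == "SHIFT":        # discard the next token without creating a node
        state.buffer.pop(0)
    elif action == "PRED":       # map the next token to a graph node named `arg`
        state.buffer.pop(0)
        state.nodes.append(arg)
    elif action == "ARC_R":      # arc from the newest node to the previous one
        state.edges.append((state.nodes[-1], arg, state.nodes[-2]))
    elif action == "ARC_L":      # arc from the previous node to the newest one
        state.edges.append((state.nodes[-2], arg, state.nodes[-1]))

# Hard-coded oracle action sequence for "The boy wants to sleep".
state = ParserState(buffer="The boy wants to sleep".split())
for act, arg in [("SHIFT", None), ("PRED", "boy"), ("PRED", "want-01"),
                 ("ARC_R", ":ARG0"), ("SHIFT", None), ("PRED", "sleep-01"),
                 ("ARC_L", ":ARG1")]:
    apply(state, act, arg)

print(state.nodes)  # ['boy', 'want-01', 'sleep-01']
print(state.edges)  # [('want-01', ':ARG0', 'boy'), ('want-01', ':ARG1', 'sleep-01')]
```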

2 citations

Proceedings ArticleDOI
01 Nov 2020
TL;DR: By decoupling corpus-specific configurations from the frontend implementation, this paper is able to demonstrate the generality of Cydex on two very different corpora: the ACL Anthology and a collection of hydrology abstracts.
Abstract: Cydex is a platform that provides neural search infrastructure for domain-specific scholarly literature. The platform represents an abstraction of Covidex, our recently developed full-stack open-source search engine for the COVID-19 Open Research Dataset (CORD-19) from AI2. While Covidex takes advantage of the latest best practices for keyword search using the popular Lucene search library as well as state-of-the-art neural ranking models using T5, parts of the system were hard coded to only work with CORD-19. This paper describes our efforts to generalize Covidex into Cydex, which can be applied to scholarly literature in different domains. By decoupling corpus-specific configurations from the frontend implementation, we are able to demonstrate the generality of Cydex on two very different corpora: the ACL Anthology and a collection of hydrology abstracts. Our platform is entirely open source and available at cydex.ai.
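The key design point is that everything corpus-specific lives in configuration rather than in the frontend code, so adding a new corpus means adding a new config entry. A hypothetical sketch of what such per-corpus configuration might look like (the field names are invented for illustration and are not Cydex's actual schema):

```python
# Hypothetical sketch of decoupling corpus-specific settings from the frontend.
# Field names are invented for illustration; this is not Cydex's actual schema.
from dataclasses import dataclass

@dataclass
class CorpusConfig:
    name: str              # display name shown in the UI
    index_path: str        # Lucene index used for keyword (BM25) retrieval
    reranker_model: str    # neural re-ranking model identifier (example value below)
    fields: tuple          # document fields the frontend should render

CORPORA = {
    "acl": CorpusConfig(
        name="ACL Anthology",
        index_path="indexes/acl-anthology",
        reranker_model="castorini/monot5-base-msmarco",
        fields=("title", "authors", "abstract", "url"),
    ),
    "hydrology": CorpusConfig(
        name="Hydrology Abstracts",
        index_path="indexes/hydrology",
        reranker_model="castorini/monot5-base-msmarco",
        fields=("title", "abstract"),
    ),
}

def get_config(corpus_id: str) -> CorpusConfig:
    """The generic search stack only reads corpus settings through this lookup."""
    return CORPORA[corpus_id]
```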

2 citations

Proceedings ArticleDOI
01 Jan 2022
TL;DR: This paper, by Zaiqiao Meng, Fangyu Liu, Ehsan Shareghi, Yixuan Su, Charlotte Collins, and Nigel Collier, was presented at the 60th Annual Meeting of the Association for Computational Linguistics.
Abstract: Zaiqiao Meng, Fangyu Liu, Ehsan Shareghi, Yixuan Su, Charlotte Collins, Nigel Collier. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

2 citations

Proceedings ArticleDOI
01 Jun 2021
TL;DR: This paper proposes a session-based automatic summarization model (SBAS) that uses a session and ensemble mechanism to generate long summaries, and achieves the best performance on the LongSumm task.
Abstract: Most summarization tasks focus on generating relatively short summaries. Such a length constraint may not be appropriate when summarizing scientific work. The LongSumm task requires participants to generate long summaries for scientific documents. This task can typically be approached with language models, but an important problem is that models like BERT are limited by memory and cannot handle inputs as long as a full document; generating a long output is also difficult. In this paper, we propose a session-based automatic summarization model (SBAS), which uses a session and ensemble mechanism to generate long summaries. Our model achieves the best performance on the LongSumm task.
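While the exact session and ensemble mechanism is specific to SBAS, the underlying workaround for the input-length limit can be sketched as splitting the document into chunks ("sessions") that fit the model, summarizing each, and concatenating the pieces into one long summary. A minimal illustration with a generic off-the-shelf summarizer (an assumption-laden simplification, not the SBAS model itself):

```python
# Minimal sketch of chunked ("session"-style) long summarization.
# This is a simplification for illustration, not the SBAS model from the paper.
# Assumes the Hugging Face `transformers` library and a public summarization checkpoint.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

def summarize_long(document: str, words_per_session: int = 400) -> str:
    words = document.split()
    # Split the document into sessions short enough for the model's input limit.
    sessions = [" ".join(words[i:i + words_per_session])
                for i in range(0, len(words), words_per_session)]
    # Summarize each session independently, then concatenate into one long summary.
    parts = [summarizer(s, max_length=120, min_length=30)[0]["summary_text"]
             for s in sessions]
    return " ".join(parts)
```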

2 citations

Proceedings ArticleDOI
01 Aug 2021
TL;DR: This article proposed a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a question-answering model's unethical behavior by communicating context-specific principles of ethics and equity to it.
Abstract: Is it possible to use natural language to intervene in a model's behavior and alter its prediction in a desired way? We investigate the effectiveness of natural language interventions for reading-comprehension systems, studying this in the context of social stereotypes. Specifically, we propose a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a question-answering (QA) model's unethical behavior by communicating context-specific principles of ethics and equity to it. To this end, we build upon recent methods for quantifying a system's social stereotypes, augmenting them with different kinds of ethical interventions and the desired model behavior under such interventions. Our zero-shot evaluation finds that even today's powerful neural language models are extremely poor ethical-advice takers, that is, they respond surprisingly little to ethical interventions even though these interventions are stated as simple sentences. Few-shot learning improves model behavior but remains far from the desired outcome, especially when evaluated for various types of generalization. Our new task thus poses a novel language understanding challenge for the community.
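Concretely, an ethical intervention is just a natural-language sentence added to the reading-comprehension context, and the test is whether the model's answer changes accordingly. The snippet below sketches that setup with an off-the-shelf extractive QA model; the passage, question, and intervention are illustrative and are not items from the LEI benchmark.

```python
# Sketch of a natural-language ethical intervention for a QA model.
# The passage, question, and intervention text are illustrative only;
# they are not drawn from the LEI benchmark itself.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = ("Alex and Sam both applied for the engineering job. "
           "Alex is a man and Sam is a woman.")
question = "Who is good at math?"
intervention = ("Do not assume a person's abilities based on their gender; "
                "treat all candidates equally.")

# Without the intervention, a biased model may pick an answer based on a stereotype.
print(qa(question=question, context=context))

# The intervention is communicated purely in natural language, appended to the context.
print(qa(question=question, context=context + " " + intervention))
```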

2 citations

Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.