Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
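For readers new to the framing, the sketch below shows what "all text-based language problems in a text-to-text format" looks like in practice: each task becomes a string with a task prefix, and the answer comes back as generated text. It is a minimal illustration assuming the Hugging Face transformers library and the released t5-small checkpoint, not code from the paper itself.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load a released pre-trained checkpoint (t5-small for brevity).
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every problem is "text in, text out": the task prefix tells the model
# which problem it is solving.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has "
    "emerged as a powerful technique in natural language processing.",
    "cola sentence: The books was on the table.",  # acceptability judgment
]

for text in examples:
    ids = tokenizer(text, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Classification tasks such as CoLA are handled the same way as generation tasks: the model literally generates the string "acceptable" or "unacceptable" rather than producing a class logit.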


Citations
Book Chapter
TL;DR: In the SDPRA-2021 shared task, the FideLIPI team described a system comprising four independent sub-systems, each capable of classifying abstracts of scientific literature into one of seven given classes.
Abstract: Reviewers often fail to appreciate a researcher's novel ideas and provide generic feedback, so proper assignment of reviewers based on their areas of expertise is necessary. Moreover, reading each and every paper end-to-end in order to assign it to a reviewer is a tedious task. In this paper, we describe the system our team FideLIPI submitted to the shared task of SDPRA-2021 [14]. It comprises four independent sub-systems capable of classifying abstracts of scientific literature into one of seven given classes. The first is a RoBERTa [10] based model built over these abstracts. Adding topic-model / Latent Dirichlet Allocation (LDA) [2] based features to the first model yields the second sub-system. The third is a sentence-level RoBERTa [10] model. The fourth is a Logistic Regression model built using Term Frequency Inverse Document Frequency (TF-IDF) features. We ensemble the predictions of these four sub-systems using majority voting to develop the final system, which gives an F1 score of 0.93 on the test and validation sets. This outperforms the existing state-of-the-art (SOTA) model SciBERT [1] in terms of F1 score on the validation set. Our codebase is available at this https URL
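As a rough illustration of the fourth sub-system and the majority-voting ensemble, here is a hypothetical scikit-learn sketch; the toy data and the stubbed RoBERTa predictions are placeholders, not the team's released code (their repository is linked above).

```python
import numpy as np
from scipy import stats
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the shared-task data (the real task has seven
# classes of scientific abstracts).
train_abstracts = ["graph neural networks for molecule property prediction",
                   "a corpus study of discourse relations in dialogue",
                   "secure multiparty computation for private inference"]
train_labels = [0, 1, 2]
test_abstracts = ["pre-trained language models for dependency parsing"]

# Sub-system 4: TF-IDF features + Logistic Regression.
tfidf_lr = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                         LogisticRegression(max_iter=1000))
tfidf_lr.fit(train_abstracts, train_labels)

# Final system: majority vote over the four sub-systems' predicted labels.
def majority_vote(*prediction_arrays):
    stacked = np.stack([np.asarray(p) for p in prediction_arrays])
    return stats.mode(stacked, axis=0, keepdims=False).mode

# The three RoBERTa-based sub-systems are stubbed with fixed outputs here.
final_preds = majority_vote([1], [1], [2], tfidf_lr.predict(test_abstracts))
print(final_preds)
```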

3 citations

Proceedings Article
01 Dec 2020
TL;DR: This work explores developing a task-agnostic model for problem difficulty and applying it to the Stanford Natural Language Inference dataset, using the human responses that come with the dev set of SNLI to create the curriculum.
Abstract: Curriculum learning, a training strategy where training data are ordered based on their difficulty, has been shown to improve performance and reduce training time on various NLP tasks. While much work over the years has developed novel approaches for generating curricula, these strategies are typically only suited for the task they were designed for. This work explores developing a task-agnostic model for problem difficulty and applying it to the Stanford Natural Language Inference (SNLI) dataset. Using the human responses that come with the dev set of SNLI, we train both regression and classification models to predict how many annotators will answer a question correctly and then project the difficulty estimates onto the full SNLI train set to create the curriculum. We argue that our curriculum is effectively capturing difficulty for this task through various analyses of both the model and the predicted difficulty scores.
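A minimal sketch of the curriculum construction described above: fit a model on dev-set annotator agreement, predict difficulty for the train set, and order training examples easy-to-hard. The random features and the gradient-boosted regressor are stand-ins; the paper's actual featurization and model choices are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Stand-ins: dev items with the fraction of annotators answering
# correctly (SNLI dev items carry five human labels each), plus a
# train set whose difficulty is unknown.
dev_feats = rng.normal(size=(200, 8))
dev_annotator_acc = rng.uniform(0.4, 1.0, size=200)
train_feats = rng.normal(size=(5000, 8))

# Regression variant: predict how many annotators get an item right.
difficulty_model = GradientBoostingRegressor().fit(dev_feats, dev_annotator_acc)

# Higher predicted agreement = easier item; sort descending so training
# starts with the easiest examples (the curriculum).
predicted_easiness = difficulty_model.predict(train_feats)
curriculum_order = np.argsort(-predicted_easiness)
train_feats_curriculum = train_feats[curriculum_order]
```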

3 citations

Posted Content
TL;DR: This article proposed the task of Narrative Reordering (NAREOR), which involves rewriting a given story in a different narrative order while preserving its plot, semantic, and temporal aspects, and presented a dataset with over 1000 human rewritings of stories within ROCStories in non-linear orders.
Abstract: We propose the task of Narrative Reordering (NAREOR), which involves rewriting a given story in a different narrative order while preserving its plot, semantic, and temporal aspects. We present a dataset, NAREORC, with over 1000 human rewritings of stories within ROCStories in non-linear orders, and conduct a detailed analysis of it. Further, we propose novel initial task-specific training methods and evaluation metrics. We perform experiments on NAREORC using GPT-2 and Transformer models and conduct an extensive human evaluation. We demonstrate that NAREOR is a challenging task with potential for further exploration.
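To make the task concrete, the snippet below fabricates a NAREOR-style instance: a five-sentence ROCStories-like story plus a target narrative order, serialized as a prompt a model such as GPT-2 could be fine-tuned on. The story, the order encoding, and the prompt format are all invented for illustration and are not the paper's actual data format.

```python
# A hypothetical NAREOR-style instance.
story = [
    "Tom planted a tomato seed in spring.",
    "He watered it every morning.",
    "A vine slowly climbed the fence.",
    "By August it was covered in fruit.",
    "Tom made sauce for the whole street.",
]
target_order = [3, 0, 1, 2, 4]  # open with the payoff, then flash back

source = " ".join(f"<s{i}> {s}" for i, s in enumerate(story))
prompt = f"reorder to {target_order}: {source}"

# The model must do more than permute sentences: tense and referring
# expressions have to change so the new order still reads coherently,
# e.g. "By August, the vine Tom had planted was covered in fruit."
print(prompt)
```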

3 citations

01 Jan 2021
TL;DR: In this paper, the authors describe an artificially intelligent coworker named Charlie, who first participated in a panel discussion and then went on to speak during multiple podcast interviews, contribute to a rap battle, catalyze a brainstorming workshop, and even write collaboratively.
Abstract: This chapter covers work and corresponding insights gained while building an artificially intelligent coworker named Charlie. Over the past year, Charlie first participated in a panel discussion and then went on to speak during multiple podcast interviews, contribute to a rap battle, catalyze a brainstorming workshop, and even write collaboratively (see the author list above). To explore the concepts and overcome the challenges of engineering human–AI teams, Charlie was built on cutting-edge language models, a strong sense of embodiment, deep-learning speech synthesis, and powerful visuals. However, the real differentiator in our approach is recognizing artificial intelligence (AI). The act of “recognizing” Charlie can be seen when we give her a voice and expect her to be heard, in a way that shows we acknowledge and appreciate her contributions, and when our repeated interactions create a comfortable awareness between her and her teammates. In this chapter, we present our approach to recognizing AI, discuss our goals, and describe how we developed Charlie’s capabilities. We also present initial results from an innovative brainstorming workshop in which Charlie participated with four humans, which showed that she could not only take part in a brainstorming exercise but also contribute to and influence the discussion across a space of ideas. Furthermore, Charlie helped us formulate ideas for, and even wrote sections of, this chapter.

3 citations

Proceedings Article
01 Aug 2021
TL;DR: This article proposed a Knowledge-Aware Contrastive Explanation generation framework (KACE) to generate contrastive explanations with counterfactual examples in NLI, reaching an accuracy of 91.9% on SNLI, improvements of 5.7% over ETPA and 0.6% over NILE.
Abstract: In order to better understand the reasons behind model behaviors (i.e., predictions), most recent works have exploited generative models to provide complementary explanations. However, existing approaches in NLP mainly focus on “WHY A” rather than the contrastive “WHY A NOT B”, which has been shown in other research fields to better distinguish confusing candidates and improve data efficiency. In this paper, we focus on generating contrastive explanations with counterfactual examples in NLI and propose a novel Knowledge-Aware Contrastive Explanation generation framework (KACE). Specifically, we first identify rationales (i.e., key phrases) in the input sentences and use them as key perturbations for generating counterfactual examples. After obtaining qualified counterfactual examples, we take them along with the original examples and external knowledge as input, and employ a knowledge-aware generative pre-trained language model to generate contrastive explanations. Experimental results show that contrastive explanations are beneficial in these scenarios because they clarify the difference between the predicted answer and other plausible wrong ones. Moreover, an NLI model trained with contrastive explanations achieves an accuracy of 91.9% on SNLI, gaining improvements of 5.7% over ETPA (“Explain-Then-Predict-Attention”) and 0.6% over NILE (“WHY A”).
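The pipeline reads as three stages: rationale extraction, counterfactual generation by perturbing those rationales, and knowledge-conditioned explanation generation. The skeleton below sketches that control flow with stubbed components; every function here is a placeholder for illustration, not the authors' released API.

```python
def extract_rationales(premise: str, hypothesis: str) -> list[str]:
    """Stage 1 (stub): identify key phrases in the input pair."""
    return ["sleeping"]

def make_counterfactuals(hypothesis: str, rationales: list[str]) -> list[str]:
    """Stage 2 (stub): perturb rationales to build counterfactual examples."""
    return [hypothesis.replace(r, "awake") for r in rationales]

def contrastive_explanation(original: str, counterfactual: str,
                            knowledge: str, lm) -> str:
    """Stage 3 (stub): condition a generative LM on the original example,
    a counterfactual, and external knowledge to explain WHY A NOT B."""
    return lm(f"why '{original}' rather than '{counterfactual}'? "
              f"knowledge: {knowledge}")

premise, hypothesis = "A man is sleeping on a couch.", "The man is sleeping."
for cf in make_counterfactuals(hypothesis, extract_rationales(premise, hypothesis)):
    print(contrastive_explanation(
        hypothesis, cf,
        knowledge="being asleep entails not being awake",
        lm=lambda prompt: f"[generated: {prompt}]"))
```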

3 citations

Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.