Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
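For readers new to the framing, the sketch below shows what "all text-based language problems in a text-to-text format" looks like in practice: each task becomes a string with a task prefix, and the answer comes back as generated text. It is a minimal illustration assuming the Hugging Face transformers library and the released t5-small checkpoint, not code from the paper itself.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load a released pre-trained checkpoint (t5-small for brevity).
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every problem is "text in, text out": the task prefix tells the model
# which problem it is solving.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has "
    "emerged as a powerful technique in natural language processing.",
    "cola sentence: The books was on the table.",  # acceptability judgment
]

for text in examples:
    ids = tokenizer(text, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Classification tasks such as CoLA are handled the same way as generation tasks: the model literally generates the string "acceptable" or "unacceptable" rather than producing a class logit.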


Citations
Book Chapter
TL;DR: In the SDPRA-2021 shared task, the FideLIPI team described a system comprising four independent sub-systems, each capable of classifying abstracts of scientific literature into one of seven given classes.
Abstract: Reviewers often fail to appreciate a researcher's novel ideas and provide generic feedback, so proper assignment of reviewers based on their areas of expertise is necessary. Moreover, reading each and every paper end-to-end in order to assign it to a reviewer is a tedious task. In this paper, we describe the system our team FideLIPI submitted to the shared task of SDPRA-2021 [14]. It comprises four independent sub-systems capable of classifying abstracts of scientific literature into one of seven given classes. The first is a RoBERTa [10] based model built over these abstracts. Adding topic-model / Latent Dirichlet Allocation (LDA) [2] based features to the first model yields the second sub-system. The third is a sentence-level RoBERTa [10] model. The fourth is a Logistic Regression model built using Term Frequency Inverse Document Frequency (TF-IDF) features. We ensemble the predictions of these four sub-systems using majority voting to develop the final system, which gives an F1 score of 0.93 on the test and validation sets. This outperforms the existing state-of-the-art (SOTA) model SciBERT [1] in terms of F1 score on the validation set. Our codebase is available at this https URL
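As a rough illustration of the fourth sub-system and the majority-voting ensemble, here is a hypothetical scikit-learn sketch; the toy data and the stubbed RoBERTa predictions are placeholders, not the team's released code (their repository is linked above).

```python
import numpy as np
from scipy import stats
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the shared-task data (the real task has seven
# classes of scientific abstracts).
train_abstracts = ["graph neural networks for molecule property prediction",
                   "a corpus study of discourse relations in dialogue",
                   "secure multiparty computation for private inference"]
train_labels = [0, 1, 2]
test_abstracts = ["pre-trained language models for dependency parsing"]

# Sub-system 4: TF-IDF features + Logistic Regression.
tfidf_lr = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                         LogisticRegression(max_iter=1000))
tfidf_lr.fit(train_abstracts, train_labels)

# Final system: majority vote over the four sub-systems' predicted labels.
def majority_vote(*prediction_arrays):
    stacked = np.stack([np.asarray(p) for p in prediction_arrays])
    return stats.mode(stacked, axis=0, keepdims=False).mode

# The three RoBERTa-based sub-systems are stubbed with fixed outputs here.
final_preds = majority_vote([1], [1], [2], tfidf_lr.predict(test_abstracts))
print(final_preds)
```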

3 citations

Proceedings Article
01 Dec 2020
TL;DR: This work explores developing a task-agnostic model for problem difficulty and applying it to the Stanford Natural Language Inference dataset, using the human responses that come with the dev set of SNLI to create the curriculum.
Abstract: Curriculum learning, a training strategy where training data are ordered based on their difficulty, has been shown to improve performance and reduce training time on various NLP tasks. While much work over the years has developed novel approaches for generating curricula, these strategies are typically only suited for the task they were designed for. This work explores developing a task-agnostic model for problem difficulty and applying it to the Stanford Natural Language Inference (SNLI) dataset. Using the human responses that come with the dev set of SNLI, we train both regression and classification models to predict how many annotators will answer a question correctly and then project the difficulty estimates onto the full SNLI train set to create the curriculum. We argue that our curriculum is effectively capturing difficulty for this task through various analyses of both the model and the predicted difficulty scores.
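A minimal sketch of the curriculum construction described above: fit a model on dev-set annotator agreement, predict difficulty for the train set, and order training examples easy-to-hard. The random features and the gradient-boosted regressor are stand-ins; the paper's actual featurization and model choices are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Stand-ins: dev items with the fraction of annotators answering
# correctly (SNLI dev items carry five human labels each), plus a
# train set whose difficulty is unknown.
dev_feats = rng.normal(size=(200, 8))
dev_annotator_acc = rng.uniform(0.4, 1.0, size=200)
train_feats = rng.normal(size=(5000, 8))

# Regression variant: predict how many annotators get an item right.
difficulty_model = GradientBoostingRegressor().fit(dev_feats, dev_annotator_acc)

# Higher predicted agreement = easier item; sort descending so training
# starts with the easiest examples (the curriculum).
predicted_easiness = difficulty_model.predict(train_feats)
curriculum_order = np.argsort(-predicted_easiness)
train_feats_curriculum = train_feats[curriculum_order]
```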

3 citations

Posted Content
TL;DR: This article proposed the task of Narrative Reordering (NAREOR), which involves rewriting a given story in a different narrative order while preserving its plot, semantic, and temporal aspects, and presented a dataset with over 1000 human rewritings of stories within ROCStories in non-linear orders.
Abstract: We propose the task of Narrative Reordering (NAREOR), which involves rewriting a given story in a different narrative order while preserving its plot, semantic, and temporal aspects. We present a dataset, NAREORC, with over 1000 human rewritings of stories within ROCStories in non-linear orders, and conduct a detailed analysis of it. Further, we propose novel initial task-specific training methods and evaluation metrics. We perform experiments on NAREORC using GPT-2 and Transformer models and conduct an extensive human evaluation. We demonstrate that NAREOR is a challenging task with potential for further exploration.
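To make the task concrete, the snippet below fabricates a NAREOR-style instance: a five-sentence ROCStories-like story plus a target narrative order, serialized as a prompt a model such as GPT-2 could be fine-tuned on. The story, the order encoding, and the prompt format are all invented for illustration and are not the paper's actual data format.

```python
# A hypothetical NAREOR-style instance.
story = [
    "Tom planted a tomato seed in spring.",
    "He watered it every morning.",
    "A vine slowly climbed the fence.",
    "By August it was covered in fruit.",
    "Tom made sauce for the whole street.",
]
target_order = [3, 0, 1, 2, 4]  # open with the payoff, then flash back

source = " ".join(f"<s{i}> {s}" for i, s in enumerate(story))
prompt = f"reorder to {target_order}: {source}"

# The model must do more than permute sentences: tense and referring
# expressions have to change so the new order still reads coherently,
# e.g. "By August, the vine Tom had planted was covered in fruit."
print(prompt)
```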

3 citations

01 Jan 2021
TL;DR: In this paper, the authors describe an artificially intelligent coworker named Charlie, who first participated in a panel discussion and then went on to speak during multiple podcast interviews, contribute to a rap battle, catalyze a brainstorming workshop, and even write collaboratively.
Abstract: This chapter covers work and corresponding insights gained while building an artificially intelligent coworker named Charlie. Over the past year, Charlie first participated in a panel discussion and then went on to speak during multiple podcast interviews, contribute to a rap battle, catalyze a brainstorming workshop, and even write collaboratively (see the author list above). To explore the concepts and overcome the challenges of engineering human–AI teams, Charlie was built on cutting-edge language models, a strong sense of embodiment, deep-learning speech synthesis, and powerful visuals. However, the real differentiator in our approach is recognizing artificial intelligence (AI). The act of “recognizing” Charlie can be seen when we give her a voice and expect her to be heard, in a way that shows we acknowledge and appreciate her contributions, and when our repeated interactions create a comfortable awareness between her and her teammates. In this chapter, we present our approach to recognizing AI, discuss our goals, and describe how we developed Charlie’s capabilities. We also present initial results from an innovative brainstorming workshop in which Charlie participated with four humans, which showed that she could not only take part in a brainstorming exercise but also contribute to and influence the discussion across a space of ideas. Furthermore, Charlie helped us formulate ideas for, and even wrote sections of, this chapter.

3 citations

Proceedings Article
01 Aug 2021
TL;DR: This article proposed a Knowledge-Aware Contrastive Explanation generation framework (KACE) to generate contrastive explanations with counterfactual examples in NLI, reaching an accuracy of 91.9% on SNLI, improvements of 5.7% over ETPA and 0.6% over NILE.
Abstract: In order to better understand the reasons behind model behaviors (i.e., predictions), most recent works have exploited generative models to provide complementary explanations. However, existing approaches in NLP mainly focus on “WHY A” rather than the contrastive “WHY A NOT B”, which has been shown in other research fields to better distinguish confusing candidates and improve data efficiency. In this paper, we focus on generating contrastive explanations with counterfactual examples in NLI and propose a novel Knowledge-Aware Contrastive Explanation generation framework (KACE). Specifically, we first identify rationales (i.e., key phrases) in the input sentences and use them as key perturbations for generating counterfactual examples. After obtaining qualified counterfactual examples, we take them along with the original examples and external knowledge as input, and employ a knowledge-aware generative pre-trained language model to generate contrastive explanations. Experimental results show that contrastive explanations are beneficial in these scenarios because they clarify the difference between the predicted answer and other plausible wrong ones. Moreover, an NLI model trained with contrastive explanations achieves an accuracy of 91.9% on SNLI, gaining improvements of 5.7% over ETPA (“Explain-Then-Predict-Attention”) and 0.6% over NILE (“WHY A”).
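The pipeline reads as three stages: rationale extraction, counterfactual generation by perturbing those rationales, and knowledge-conditioned explanation generation. The skeleton below sketches that control flow with stubbed components; every function here is a placeholder for illustration, not the authors' released API.

```python
def extract_rationales(premise: str, hypothesis: str) -> list[str]:
    """Stage 1 (stub): identify key phrases in the input pair."""
    return ["sleeping"]

def make_counterfactuals(hypothesis: str, rationales: list[str]) -> list[str]:
    """Stage 2 (stub): perturb rationales to build counterfactual examples."""
    return [hypothesis.replace(r, "awake") for r in rationales]

def contrastive_explanation(original: str, counterfactual: str,
                            knowledge: str, lm) -> str:
    """Stage 3 (stub): condition a generative LM on the original example,
    a counterfactual, and external knowledge to explain WHY A NOT B."""
    return lm(f"why '{original}' rather than '{counterfactual}'? "
              f"knowledge: {knowledge}")

premise, hypothesis = "A man is sleeping on a couch.", "The man is sleeping."
for cf in make_counterfactuals(hypothesis, extract_rationales(premise, hypothesis)):
    print(contrastive_explanation(
        hypothesis, cf,
        knowledge="being asleep entails not being awake",
        lm=lambda prompt: f"[generated: {prompt}]"))
```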

3 citations

Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.