Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
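To make the text-to-text framing concrete, here is a minimal sketch (not taken from the paper itself) using the publicly released T5 checkpoints through the HuggingFace Transformers library: every task is expressed by prepending a task prefix to the input string and reading the model's output string.

# Minimal sketch of the text-to-text framing, assuming the HuggingFace
# Transformers library and the publicly released "t5-small" checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is cast as string -> string by prepending a task prefix.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: Transfer learning, where a model is first pre-trained on a "
    "data-rich task before being fine-tuned on a downstream task, has emerged "
    "as a powerful technique in natural language processing.",
    "cola sentence: The course is jumping well.",  # acceptability judgment
]

for text in examples:
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Translation, summarization, and classification (the CoLA acceptability example) all go through the same string-in, string-out interface; only the prefix and the expected output text change.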


Citations
Proceedings ArticleDOI
14 Dec 2020
TL;DR: This article reviewed some of the unique challenges inherent in the Arabic language that contribute to the accuracy lag of Arabic sentiment analysis tools behind those for Latin-based languages, beyond the scale-based economic drivers propelling enhanced accuracy in other languages. It then identified the most promising new tools and provided a framework for evaluating differences in sensitivity and polarity.
Abstract: While the accuracy of Arabic text sentiment analysis tools continues to improve, the accuracy continues to lag behind similar tools for Latin-based languages. In this work we review some of the unique challenges inherent in the Arabic language that contribute to this accuracy lag beyond the scale-based economic drivers propelling enhanced accuracy in other languages. We then identify some of the most promising new tools and provide a framework for evaluating the differences in sensitivity and polarity. While there is not yet a universal standard scale for sentiment polarity, we attempt to provide a normalized basis upon which to compare the degree to which various tools tend to classify text segments toward either end (or the middle) of the polarity spectrum.
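The comparison described above presupposes mapping each tool's native polarity score onto a common scale. The snippet below is a hypothetical illustration of such a rescaling, not the paper's method; the tool ranges and example scores are assumptions.

# Hypothetical illustration of putting heterogeneous polarity scores on a
# common [-1, 1] scale; the tool ranges and scores are assumed, not taken
# from the paper.
def normalize_polarity(score, low, high):
    """Linearly map a score from [low, high] to [-1, 1]."""
    return 2.0 * (score - low) / (high - low) - 1.0

# Example: one tool scores in [0, 5], another already scores in [-1, 1].
print(normalize_polarity(4.2, 0.0, 5.0))   #  0.68 (strongly positive)
print(normalize_polarity(-0.3, -1.0, 1.0)) # -0.30 (mildly negative)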

2 citations

Proceedings ArticleDOI
01 Jan 2022
TL;DR: This paper proposed FlipDA, a novel data augmentation method that jointly uses a generative model and a classifier to generate label-flipped data, finding that generating label-flipped data is more crucial to performance than generating label-preserved data.
Abstract: Most previous methods for text data augmentation are limited to simple tasks and weak baselines. We explore data augmentation on hard tasks (i.e., few-shot natural language understanding) and strong baselines (i.e., pretrained models with over one billion parameters). Under this setting, we reproduced a large number of previous augmentation methods and found that these methods bring marginal gains at best and sometimes degrade performance substantially. To address this challenge, we propose a novel data augmentation method FlipDA that jointly uses a generative model and a classifier to generate label-flipped data. Central to the idea of FlipDA is the discovery that generating label-flipped data is more crucial to the performance than generating label-preserved data. Experiments show that FlipDA achieves a good tradeoff between effectiveness and robustness—it substantially improves many tasks while not negatively affecting the others.
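A rough sketch of the selection logic implied by the abstract: a generative model proposes perturbed candidates, and the classifier keeps only those it assigns the flipped label. The helpers perturb_with_generator and classify are hypothetical placeholders, not the paper's exact procedure.

# Rough sketch of the FlipDA idea as described in the abstract: generate
# perturbed candidates with a generative model, then keep only those that a
# classifier assigns the *flipped* label. perturb_with_generator() and
# classify() are hypothetical placeholders, not the paper's exact procedure.
def flipda_augment(example_text, original_label, target_label,
                   perturb_with_generator, classify, num_candidates=8):
    augmented = []
    for _ in range(num_candidates):
        candidate = perturb_with_generator(example_text, target_label)
        predicted = classify(candidate)
        if predicted == target_label and predicted != original_label:
            augmented.append((candidate, target_label))
    return augmented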

2 citations

Posted Content
TL;DR: The authors explain background on formal languages as it relates to this recent work, focusing on concepts connecting to modern deep learning-based NLP while by necessity ignoring large parts of the rich history of the field.
Abstract: NLP is deeply intertwined with the formal study of language, both conceptually and historically. Arguably, this connection goes all the way back to Chomsky's Syntactic Structures in 1957. It also still holds true today, with a strand of recent work building formal analyses of modern neural network methods in terms of formal languages. In this document, I aim to explain background on formal languages as it relates to this recent work. I will by necessity ignore large parts of the rich history of this field, instead focusing on concepts connecting to modern deep learning-based NLP.

2 citations

Proceedings ArticleDOI
01 Aug 2021
TL;DR: This paper used the RoBERTa language model and a CRF for the toxic span detection task in SemEval-2021 Task 5: Toxic Spans Detection, achieving a rank of 41 with an F1 score of 66.16%.
Abstract: This paper describes our contribution to SemEval-2021 Task 5: Toxic Spans Detection. Our solution is built upon the RoBERTa language model and Conditional Random Fields (CRF). We pre-trained RoBERTa on the Civil Comments dataset, enabling it to create better contextual representations for this task. We also employed the semi-supervised learning technique of self-training, which allowed us to extend our training dataset. In addition, we identified some pre-processing steps that significantly improved our F1 score. Our proposed system achieved a rank of 41 with an F1 score of 66.16%.
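As a hedged illustration of the kind of architecture the abstract describes, the sketch below wires a RoBERTa encoder into a CRF tagging layer, assuming the HuggingFace Transformers and pytorch-crf packages; the two-tag scheme and model size are assumptions rather than the authors' settings.

# Illustrative sketch of a RoBERTa + CRF token tagger for span detection,
# assuming the HuggingFace Transformers and pytorch-crf packages. The two-tag
# scheme (toxic / non-toxic) and sizes are assumptions, not the authors' setup.
import torch.nn as nn
from transformers import RobertaModel
from torchcrf import CRF

class RobertaCrfTagger(nn.Module):
    def __init__(self, num_tags=2, model_name="roberta-base"):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained(model_name)
        self.emissions = nn.Linear(self.encoder.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        scores = self.emissions(hidden)
        mask = attention_mask.bool()
        if tags is not None:
            # Negative log-likelihood of the gold tag sequence under the CRF.
            return -self.crf(scores, tags, mask=mask, reduction="mean")
        # Viterbi decoding returns the best tag sequence per example.
        return self.crf.decode(scores, mask=mask)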

2 citations

Proceedings ArticleDOI
01 Jun 2021
TL;DR: This article proposed a neural response generation model with multi-task learning of generation and emotion classification, which made generated responses more emotionally aware and thus more human-like.
Abstract: For a computer to naturally interact with a human, it needs to be human-like. In this paper, we propose a neural response generation model with multi-task learning of generation and classification, focusing on emotion. Our model, based on BART (Lewis et al., 2020), a pre-trained transformer encoder-decoder model, is trained to generate responses and recognize emotions simultaneously. Furthermore, we weight the losses of the tasks to control the parameter updates. Automatic evaluations and crowdsourced manual evaluations show that the proposed model makes generated responses more emotionally aware.
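A hedged sketch of the weighted multi-task objective described above: a generation loss from BART's language-modeling head plus an emotion-classification loss from a small head over the encoder output, combined with per-task weights. The weights, pooling, and number of emotion classes are illustrative assumptions, not the authors' configuration.

# Illustrative sketch of multi-task training on top of BART: the generation
# loss comes from the seq2seq LM head, the emotion-classification loss from a
# small head over the encoder output. The weights, pooling, and number of
# emotion classes are assumptions, not the authors' exact configuration.
import torch.nn as nn
from transformers import BartForConditionalGeneration

class EmotionAwareResponder(nn.Module):
    def __init__(self, num_emotions=7, w_gen=1.0, w_cls=0.5):
        super().__init__()
        self.bart = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
        self.emotion_head = nn.Linear(self.bart.config.d_model, num_emotions)
        self.cls_loss = nn.CrossEntropyLoss()
        self.w_gen, self.w_cls = w_gen, w_cls

    def forward(self, input_ids, attention_mask, labels, emotion_labels):
        out = self.bart(input_ids, attention_mask=attention_mask, labels=labels)
        # Mean-pool the encoder's last hidden state for emotion classification.
        pooled = out.encoder_last_hidden_state.mean(dim=1)
        cls_loss = self.cls_loss(self.emotion_head(pooled), emotion_labels)
        # Weighted sum of the two task losses controls the parameter updates.
        return self.w_gen * out.loss + self.w_cls * cls_loss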

2 citations

Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.