Journal Article

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
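To make the text-to-text idea concrete, here is a minimal sketch of how different tasks might be cast as plain input/target string pairs. The prefix strings, the `to_text_to_text` helper, and the example sentences are illustrative assumptions, not details stated in the abstract above.

```python
from typing import Dict

# Hypothetical task prefixes: each one tells a single model which task to perform.
TASK_PREFIXES: Dict[str, str] = {
    "translation": "translate English to German: ",
    "summarization": "summarize: ",
    "acceptability": "cola sentence: ",
}


def to_text_to_text(task: str, text: str, target: str) -> Dict[str, str]:
    """Cast one labeled example into an (input text, target text) pair.

    Every task, including classification, uses a string as its target,
    so one sequence-to-sequence model and one loss can cover all of them.
    """
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task}")
    return {"input": TASK_PREFIXES[task] + text, "target": target}


examples = [
    to_text_to_text("translation", "That is good.", "Das ist gut."),
    to_text_to_text("acceptability", "The course is jumping well.", "not acceptable"),
]
for example in examples:
    print(example["input"], "->", example["target"])
```

The point of the sketch is that a class label such as "not acceptable" is emitted as text, just like a translation or a summary would be.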


Citations
Proceedings ArticleDOI
01 Jan 2022
TL;DR: Fabbri et al. presented this paper at the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2022).
Abstract: Alexander Fabbri, Xiaojian Wu, Srini Iyer, Haoran Li, Mona Diab. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022.
Proceedings Article
10 Jan 2022
TL;DR: This paper showed that a character-based model trained on only 99k sentences of NArabizi and fine-tuned on a small treebank of this language leads to performance close to that obtained with the same architecture pre-trained on large multilingual and monolingual models.
Abstract: Recent impressive improvements in NLP, largely based on the success of contextual neural language models, have been mostly demonstrated on at most a couple dozen high-resource languages. Building language models and, more generally, NLP systems for non-standardized and low-resource languages remains a challenging task. In this work, we focus on North-African colloquial dialectal Arabic written using an extension of the Latin script, called NArabizi, found mostly on social media and messaging communication. In this low-resource scenario with data displaying a high level of variability, we compare the downstream performance of a character-based language model on part-of-speech tagging and dependency parsing to that of monolingual and multilingual models. We show that a character-based model trained on only 99k sentences of NArabizi and fine-tuned on a small treebank of this language leads to performance close to that obtained with the same architecture pre-trained on large multilingual and monolingual models. Confirming these results on a much larger data set of noisy French user-generated content, we argue that such character-based language models can be an asset for NLP in low-resource and high language variability settings.
Proceedings ArticleDOI
01 Jan 2022
TL;DR: Mehta et al. presented this paper at the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Volume 1: Long Papers.
Abstract: Sanket Vaibhav Mehta, Jinfeng Rao, Yi Tay, Mihir Kale, Ankur Parikh, Emma Strubell. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
Posted Content
TL;DR: This paper compared embedding methods, distance metrics, and clustering methods, and their interaction, for topic detection on tweets collected with COVID-19-related hashtags.
Abstract: This paper is a comparison study in the context of Topic Detection on COVID-19 data. There are various approaches for Topic Detection, among which the Clustering approach is selected in this paper. Clustering requires distance and calculating distance needs embedding. The aim of this research is to simultaneously study the three factors of embedding methods, distance metrics and clustering methods and their interaction. A dataset including one month of tweets collected with COVID-19-related hashtags is used for this study. Five embedding methods, from earlier to newer ones, are selected: Word2Vec, fastText, GloVe, BERT and T5. Five clustering methods are investigated: k-means, DBSCAN, OPTICS, spectral and Jarvis-Patrick. Euclidean distance and cosine distance, the most important distance metrics in this field, are also examined. First, more than 7,500 tests are performed to tune the parameters. Then, all combinations of embedding methods, distance metrics and clustering methods are evaluated with the silhouette metric, giving 50 cases in total. First, the results of these 50 tests are examined. Then, the rank of each method is taken into account in all the tests of that method. Finally, the major variables of the research (embedding methods, distance metrics and clustering methods) are studied separately. Averaging is performed over the control variables to neutralize their effect. The experimental results show that T5 strongly outperforms other embedding methods in terms of silhouette metric. In terms of distance metrics, cosine distance is weakly better. DBSCAN is also superior to other methods in terms of clustering methods.
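The count of 50 combinations and the "average over the control variables" step in this abstract can be sketched as follows. This is only an illustration under stated assumptions: `experiment_grid` and `average_by_factor` are hypothetical helper names, and the scores must be supplied by the caller, since no results from the study are reproduced here.

```python
from collections import defaultdict
from itertools import product
from statistics import mean
from typing import Dict, List, Tuple

EMBEDDINGS = ["Word2Vec", "fastText", "GloVe", "BERT", "T5"]
DISTANCES = ["euclidean", "cosine"]
CLUSTERERS = ["k-means", "DBSCAN", "OPTICS", "spectral", "Jarvis-Patrick"]

Combo = Tuple[str, str, str]


def experiment_grid() -> List[Combo]:
    """Enumerate every (embedding, distance, clusterer) combination."""
    return list(product(EMBEDDINGS, DISTANCES, CLUSTERERS))


def average_by_factor(scores: Dict[Combo, float], factor_index: int) -> Dict[str, float]:
    """Average caller-supplied scores over the other two factors.

    factor_index selects the factor under study:
    0 = embedding, 1 = distance metric, 2 = clustering method.
    """
    grouped: Dict[str, List[float]] = defaultdict(list)
    for combo, score in scores.items():
        grouped[combo[factor_index]].append(score)
    return {level: mean(values) for level, values in grouped.items()}


grid = experiment_grid()
print(len(grid))  # 5 embeddings x 2 distances x 5 clusterers = 50
```

Given a dictionary of silhouette scores keyed by combination, `average_by_factor(scores, 0)` would return one mean score per embedding method, with the distance metric and clustering method averaged out.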
Proceedings Article
01 Nov 2021
TL;DR: SPRING Online Services is a web interface and set of RESTful APIs for SPRING (Symmetric PaRsIng aNd Generation), a state-of-the-art AMR parsing and generation system.
Abstract: In this paper we present SPRING Online Services, a Web interface and RESTful APIs for our state-of-the-art AMR parsing and generation system, SPRING (Symmetric PaRsIng aNd Generation). The Web interface has been developed to be easily used by the Natural Language Processing community, as well as by the general public. It provides, among other things, a highly interactive visualization platform and a feedback mechanism to obtain user suggestions for further improvements of the system’s output. Moreover, our RESTful APIs enable easy integration of SPRING in downstream applications where AMR structures are needed. Finally, we make SPRING Online Services freely available at http://nlp.uniroma1.it/spring and, in addition, we release extra model checkpoints to be used with the original SPRING Python code.
Trending Questions (1)
What are the limitations of transfer learning with a unified text-to-text transformer?

The paper does not mention the limitations of transfer learning with a unified text-to-text transformer.