Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Home
/
Papers
/
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Journal Article•

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu - Show less +5 more

01 Jan 2020-Journal of Machine Learning Research-Vol. 21, Iss: 140, pp 1-67

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

read less

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new ``Colossal Clean Crawled Corpus'', we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Proceedings Article•DOI•

AnswerSumm: A Manually-Curated Dataset and Pipeline for Answer Summarization

[...]

01 Jan 2022

TL;DR: Fabbri et al. as discussed by the authors presented a paper on the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NCLT 2022.

...read moreread less

Abstract: Alexander Fabbri, Xiaojian Wu, Srini Iyer, Haoran Li, Mona Diab. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022.

...read moreread less

Proceedings Article•

Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?

[...]

10 Jan 2022

TL;DR: This paper showed that a character-based model trained on only 99k sentences of NArabizi and fined-tuned on a small treebank of this language leads to performance close to those obtained with the same architecture pre-trained on large multilingual and monolingual models.

...read moreread less

Abstract: Recent impressive improvements in NLP, largely based on the success of contextual neural language models, have been mostly demonstrated on at most a couple dozen high-resource languages. Building language models and, more generally, NLP systems for non-standardized and low-resource languages remains a challenging task. In this work, we focus on North-African colloquial dialectal Arabic written using an extension of the Latin script, called NArabizi, found mostly on social media and messaging communication. In this low-resource scenario with data displaying a high level of variability, we compare the downstream performance of a character-based language model on part-of-speech tagging and dependency parsing to that of monolingual and multilingual models. We show that a character-based model trained on only 99k sentences of NArabizi and fined-tuned on a small treebank of this language leads to performance close to those obtained with the same architecture pre-trained on large multilingual and monolingual models. Confirming these results a on much larger data set of noisy French user-generated content, we argue that such character-based language models can be an asset for NLP in low-resource and high language variability set-tings.

...read moreread less

Proceedings Article•DOI•

Improving Compositional Generalization with Self-Training for Data-to-Text Generation

[...]

01 Jan 2022

TL;DR: Mehta et al. as discussed by the authors presented a paper at the 60th Annual Meeting of the Association for Computational Linguistics (ACLL) on "Long Papers".

...read moreread less

Abstract: Sanket Vaibhav Mehta, Jinfeng Rao, Yi Tay, Mihir Kale, Ankur Parikh, Emma Strubell. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

...read moreread less

Posted Content•

A Comparative Study on Transfer Learning and Distance Metrics in Semantic Clustering over the COVID-19 Tweets.

[...]

Elnaz Zafarani-Moattar, Mohammad Reza Kangavari¹, Amir Masoud Rahmani•Institutions (1)

Iran University of Science and Technology¹

16 Nov 2021-arXiv: Computation and Language

TL;DR: In this paper, a comparison study of the three factors of embedding methods, distance metrics and clustering methods and their interaction was performed on COVID-19-related hashtags.

...read moreread less

Abstract: This paper is a comparison study in the context of Topic Detection on COVID-19 data. There are various approaches for Topic Detection, among which the Clustering approach is selected in this paper. Clustering requires distance and calculating distance needs embedding. The aim of this research is to simultaneously study the three factors of embedding methods, distance metrics and clustering methods and their interaction. A dataset including one-month tweets collected with COVID-19-related hashtags is used for this study. Five methods, from earlier to new methods, are selected among the embedding methods: Word2Vec, fastText, GloVe, BERT and T5. Five clustering methods are investigated in this paper that are: k-means, DBSCAN, OPTICS, spectral and Jarvis-Patrick. Euclidian distance and Cosine distance as the most important distance metrics in this field are also examined. First, more than 7,500 tests are performed to tune the parameters. Then, all the different combinations of embedding methods with distance metrics and clustering methods are investigated by silhouette metric. The number of these combinations is 50 cases. First, the results of these 50 tests are examined. Then, the rank of each method is taken into account in all the tests of that method. Finally, the major variables of the research (embedding methods, distance metrics and clustering methods) are studied separately. Averaging is performed over the control variables to neutralize their effect. The experimental results show that T5 strongly outperforms other embedding methods in terms of silhouette metric. In terms of distance metrics, cosine distance is weakly better. DBSCAN is also superior to other methods in terms of clustering methods.

...read moreread less

Proceedings Article•

SPRING Goes Online: End-to-End AMR Parsing and Generation

[...]

Rexhina Blloshmi¹, Michele Bevilacqua¹, Edoardo Fabiano, Valentina Caruso, Roberto Navigli¹ - Show less +1 more•Institutions (1)

Sapienza University of Rome¹

01 Nov 2021

TL;DR: SPRING Online Services as mentioned in this paper is a web interface and RESTful APIs for SPRING (Symmetric PaRsIng aNd Generation), a state-of-the-art AMR parsing and generation system.

...read moreread less

Abstract: In this paper we present SPRING Online Services, a Web interface and RESTful APIs for our state-of-the-art AMR parsing and generation system, SPRING (Symmetric PaRsIng aNd Generation). The Web interface has been developed to be easily used by the Natural Language Processing community, as well as by the general public. It provides, among other things, a highly interactive visualization platform and a feedback mechanism to obtain user suggestions for further improvements of the system’s output. Moreover, our RESTful APIs enable easy integration of SPRING in downstream applications where AMR structures are needed. Finally, we make SPRING Online Services freely available at http://nlp.uniroma1.it/spring and, in addition, we release extra model checkpoints to be used with the original SPRING Python code.

...read moreread less