mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

doi:10.18653/V1/2021.NAACL-MAIN.41

Citations

Proceedings Article

Bidirectional Language Models Are Also Few-shot Learners

Ajay Patel, Bryan Li, Mohammad Rasooli, Noah Constant, Colin Raffel, Chris Callison-Burch

29 Sep 2022

TL;DR: SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models, is presented. The results show, for the first time, that prompt-based learning is an emergent property of a broader class of language models rather than a property of unidirectional models only.

Abstract: Large language models such as GPT-3 (Brown et al., 2020) can perform arbitrary tasks without undergoing fine-tuning after being prompted with only a few labeled examples. An arbitrary task can be reformulated as a natural language prompt, and a language model can be asked to generate the completion, indirectly performing the task in a paradigm known as prompt-based learning. To date, emergent prompt-based learning capabilities have mainly been demonstrated for unidirectional language models. However, bidirectional language models pre-trained on denoising objectives such as masked language modeling produce stronger learned representations for transfer learning. This motivates the possibility of prompting bidirectional models, but their pre-training objectives have made them largely incompatible with the existing prompting paradigm. We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models. Utilizing the machine translation task as a case study, we prompt the bidirectional mT5 model (Xue et al., 2021) with SAP and demonstrate its few-shot and zero-shot translations outperform the few-shot translations of unidirectional models like GPT-3 and XGLM (Lin et al., 2021), despite mT5's approximately 50% fewer parameters. We further show SAP is effective on question answering and summarization. For the first time, our results demonstrate prompt-based learning is an emergent property of a broader class of language models, rather than only unidirectional models.

10 citations

Proceedings Article

Polyglot Prompt: Multilingual Multitask Prompt Training

Jinlan Fu, See-Kiong Ng, Pengfei Liu

29 Apr 2022

TL;DR: An interpretable multilingual evaluation methodology is presented and it is shown how the proposed framework, multilingual multitask prompt training, works.

Abstract: This paper aims for a potential architectural improvement for multilingual learning and asks: Can different tasks from different languages be modeled in a monolithic framework, i.e. without any task/language-specific module? The benefit of achieving this could open new doors for future multilingual research, including allowing systems trained on low resources to be further assisted by other languages as well as other tasks. We approach this goal by developing a learning framework named Polyglot Prompting to exploit prompting methods for learning a unified semantic space for different languages and tasks with multilingual prompt engineering. We performed a comprehensive evaluation of 6 tasks, namely topic classification, sentiment classification, named entity recognition, question answering, natural language inference, and summarization, covering 24 datasets and 49 languages. The experimental results demonstrated the efficacy of multilingual multitask prompt-based learning and led to inspiring observations. We also present an interpretable multilingual evaluation methodology and show how the proposed framework, multilingual multitask prompt training, works. We release all datasets prompted in the best setting and code.

10 citations

Journal Article

Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models

Aaron Mueller, Robert Frank, Tal Linzen, Luheng Wang, Sebastian Schuster

17 Mar 2022-Findings

TL;DR: It is demonstrated that seq2seq models are capable of syntactic generalization, though only after exposure to much more language data than human learners receive, while also demonstrating the learnability of hierarchical syntactic information from non-annotated natural language text.

Abstract: Relations between words are governed by hierarchical structure rather than linear ordering. Sequence-to-sequence (seq2seq) models, despite their success in downstream NLP applications, often fail to generalize in a hierarchy-sensitive manner when performing syntactic transformations—for example, transforming declarative sentences into questions. However, syntactic evaluations of seq2seq models have only observed models that were not pre-trained on natural language data before being trained to perform syntactic transformations, in spite of the fact that pre-training has been found to induce hierarchical linguistic generalizations in language models; in other words, the syntactic capabilities of seq2seq models may have been greatly understated. We address this gap using the pre-trained seq2seq models T5 and BART, as well as their multilingual variants mT5 and mBART. We evaluate whether they generalize hierarchically on two transformations in two languages: question formation and passivization in English and German. We find that pre-trained seq2seq models generalize hierarchically when performing syntactic transformations, whereas models trained from scratch on syntactic transformations do not. This result presents evidence for the learnability of hierarchical syntactic information from non-annotated natural language text while also demonstrating that seq2seq models are capable of syntactic generalization, though only after exposure to much more language data than human learners receive.

10 citations

Proceedings Article

MAD-G: Multilingual Adapter Generation for Efficient Cross-Lingual Transfer.

Alan Ansell, Edoardo Maria Ponti¹, Jonas Pfeiffer², Sebastian Ruder³, Goran Glavaš⁴, Ivan Vulić¹, Anna Korhonen⁵

University of Cambridge¹, Technische Universität Darmstadt², Google³, University of Mannheim⁴, Technion – Israel Institute of Technology⁵

01 Nov 2021

TL;DR: The authors propose MAD-G (Multilingual ADapter Generation), which generates language adapters from language representations based on typological features. It is offered as an alternative to training language-specific adapters, an approach that is not viable for the vast majority of languages due to limitations in their corpus size or compute budgets.

Abstract: Adapter modules have emerged as a general parameter-efficient means to specialize a pretrained encoder to new domains. Massively multilingual transformers (MMTs) have particularly benefited from additional training of language-specific adapters. However, this approach is not viable for the vast majority of languages, due to limitations in their corpus size or compute budgets. In this work, we propose MAD-G (Multilingual ADapter Generation), which contextually generates language adapters from language representations based on typological features. In contrast to prior work, our time- and space-efficient MAD-G approach enables (1) sharing of linguistic knowledge across languages and (2) zero-shot inference by generating language adapters for unseen languages. We thoroughly evaluate MAD-G in zero-shot cross-lingual transfer on part-of-speech tagging, dependency parsing, and named entity recognition. While offering (1) improved fine-tuning efficiency (by a factor of around 50 in our experiments), (2) a smaller parameter budget, and (3) increased language coverage, MAD-G remains competitive with more expensive methods for language-specific adapter training across the board. Moreover, it offers substantial benefits for low-resource languages, particularly on the NER task in low-resource African languages. Finally, we demonstrate that MAD-G’s transfer performance can be further improved via: (i) multi-source training, i.e., by generating and combining adapters of multiple languages with available task-specific training data; and (ii) by further fine-tuning generated MAD-G adapters for languages with monolingual data.

10 citations

Proceedings Article

ViT5: Pretrained Text-to-Text Transformer for Vietnamese Language Generation

Long Phan, H. Tran, Hieu Chi Nguyen, Trieu H. Trinh

13 May 2022

TL;DR: The experiments show that ViT5 significantly outperforms existing models and achieves state-of-the-art results on Vietnamese Text Summarization. Further analysis shows the importance of context length during self-supervised pretraining for downstream performance across different settings.

Abstract: We present ViT5, a pretrained Transformer-based encoder-decoder model for the Vietnamese language. With T5-style self-supervised pretraining, ViT5 is trained on a large corpus of high-quality and diverse Vietnamese texts. We benchmark ViT5 on two downstream text generation tasks, Abstractive Text Summarization and Named Entity Recognition. Although Abstractive Text Summarization has been widely studied for the English language thanks to its rich and large source of data, there has been minimal research into the same task in Vietnamese, a much lower resource language. In this work, we perform exhaustive experiments on both Vietnamese Abstractive Summarization and Named Entity Recognition, validating the performance of ViT5 against many other pretrained Transformer-based encoder-decoder models. Our experiments show that ViT5 significantly outperforms existing models and achieves state-of-the-art results on Vietnamese Text Summarization. On the task of Named Entity Recognition, ViT5 is competitive against previous best results from pretrained encoder-based Transformer models. Further analysis shows the importance of context length during the self-supervised pretraining on downstream performance across different settings.

10 citations
