Open Access · Proceedings Article · DOI

mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

TLDR
This paper introduces mT5, a multilingual variant of T5 pre-trained on a new Common Crawl-based dataset covering 101 languages, and shows that it achieves state-of-the-art performance on many multilingual benchmarks.
Abstract
The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent “accidental translation” in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.
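As a concrete illustration of the text-to-text interface described above, the following minimal sketch loads a released mT5 checkpoint through the Hugging Face Transformers library (an assumption about the reader's toolchain, not code from the paper) and fills a span-corruption sentinel, which is the only objective mT5 sees before fine-tuning.

from transformers import AutoTokenizer, MT5ForConditionalGeneration

# Minimal sketch: assumes the Hugging Face Transformers library and the
# publicly released "google/mt5-small" checkpoint; not the authors' own code.
tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# mT5 is pre-trained only on span corruption over mC4, so prompts use
# sentinel tokens; downstream tasks require fine-tuning in the same
# text-to-text format.
inputs = tokenizer("Das Wetter ist heute <extra_id_0>.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The same interface covers any downstream task once the model is fine-tuned: inputs and targets are both plain strings, regardless of language.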

Citations
Journal Article · DOI

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao, +386 more
09 Nov 2022
TL;DR: BLOOM is a 176B-parameter, decoder-only Transformer language model trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total).
Journal Article · DOI

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

TL;DR: GPT-NeoX-20B is introduced, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license.
Journal Article · DOI

A Survey of Large Language Models

TL;DR: This survey reviews large language models (LLMs), which are built by pre-training Transformer models over large-scale corpora and show strong capabilities in solving various NLP tasks.
References

VECO: Variable Encoder-decoder Pre-training for Cross-lingual Understanding and Generation

TL;DR: A variable encoder-decoder (VECO) pre-training approach unifies the two mainstream designs in both model architecture and pre-training tasks, delivering new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
Posted Content

PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data.

TL;DR: This paper pretrains a T5 model on the BrWac corpus, an extensive collection of web pages in Portuguese, and evaluates its performance against other Portuguese pretrained models and multilingual models on sentence similarity and sentence entailment tasks.
Posted Content

Unifying Question Answering and Text Classification via Span Extraction.

TL;DR: A unified, span-extraction approach leads to superior or comparable performance in multi-task learning, low-data and supplementary supervised pretraining experiments on several text classification and question answering benchmarks.
Proceedings Article

Rethinking Embedding Coupling in Pre-trained Language Models

TL;DR: The authors re-evaluate the standard practice of sharing weights between the input and output embeddings in state-of-the-art pre-trained language models and show that decoupled embeddings provide increased modeling flexibility, allowing significantly more efficient parameter allocation in the input embeddings of multilingual models.
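To make the notion of embedding coupling concrete, here is a rough PyTorch sketch (an illustration under assumed sizes, not code from the cited paper): the tied variant shares one weight matrix between the input embedding and the output projection, while the decoupled variant allocates them separately, which is what gives the extra flexibility for large multilingual vocabularies.

import torch.nn as nn

vocab_size, d_model = 250_000, 512  # hypothetical sizes for a large multilingual vocabulary

class TiedLMHead(nn.Module):
    # Input and output embeddings share a single weight matrix.
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size, bias=False)
        self.out.weight = self.embed.weight  # weight tying

class DecoupledLMHead(nn.Module):
    # Separate parameters for the input embedding and the output projection.
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size, bias=False)  # independent weights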
Posted Content

Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning

TL;DR: The method achieves better performance than other fine-tuning baselines on zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks, and preserves the original cross-lingual ability of the pre-trained model when it is fine-tuned on downstream cross-lingual tasks.
Trending Questions (2)
isiNdebele text generation under NLP using the mT5 tool

The paper does not specifically mention isiNdebele text generation using the mT5 tool. It introduces mT5, a multilingual variant of T5, and demonstrates its performance on multilingual benchmarks.

A Massively Multilingual Pre-trained Text-to-Text Transformer?

The paper introduces mT5, a multilingual variant of T5, which is a massively multilingual pre-trained text-to-text transformer.