mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel
pp. 483–498
TLDR
This paper introduces mT5, a multilingual variant of T5 pre-trained on a new Common Crawl-based dataset covering 101 languages, which achieves state-of-the-art performance on many multilingual benchmarks.

Abstract
The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent “accidental translation” in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.
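Since the abstract notes that the code and checkpoints are public, a minimal usage sketch may be helpful. It assumes the Hugging Face `transformers` port of the released checkpoints (e.g. `google/mt5-small`) rather than the authors' original training stack; the raw pre-trained model has seen no supervised data, so this only exercises the text-to-text interface with a span-corruption-style input.

```python
# A minimal sketch, assuming the Hugging Face `transformers` port of the
# released mT5 checkpoints (the authors' own training code differs).
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# Pre-training masks spans with sentinel tokens such as <extra_id_0>;
# the raw checkpoint is meant to be fine-tuned before real use.
inputs = tokenizer(
    "UN Offizier sagt, dass weiter <extra_id_0> werden muss.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```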
Citations
Journal Article
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, et al.
TL;DR: PaLM, a 540-billion-parameter, densely activated Transformer language model, achieves breakthrough performance, outperforming the state of the art on a suite of multi-step reasoning tasks and exceeding average human performance on the recently released BIG-bench benchmark.
Journal Article
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, et al. (BigScience Workshop; several hundred authors)
TL;DR: BLOOM is a 176B-parameter decoder-only Transformer language model trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total).
Journal Article
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
TL;DR: GPT-NeoX-20B is introduced, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license.
Journal Article
A Survey of Large Language Models
Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, et al.
TL;DR: This survey reviews large language models (LLMs), Transformer models pre-trained over large-scale corpora that show strong capabilities in solving various NLP tasks.
Proceedings Article
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, et al.
TL;DR: PaLI achieves state-of-the-art results on multiple vision and language tasks, while retaining a simple, modular, and scalable design.
References
AlBERTo: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets
TL;DR: AlBERTo, a BERT language understanding model for Italian, is trained with a focus on the language used on social networks, specifically Twitter, obtaining state-of-the-art results in subjectivity, polarity, and irony detection on Italian tweets.
Proceedings Article
Document Ranking with a Pretrained Sequence-to-Sequence Model
TL;DR: Surprisingly, it is found that the choice of target tokens impacts effectiveness, even for words that are closely related semantically, which sheds some light on why the sequence-to-sequence formulation for document ranking is effective.
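For context, here is a hedged sketch of how such target-token relevance scoring can be implemented. The prompt template, the `t5-base` checkpoint, and the choice of "true"/"false" target tokens are illustrative assumptions, not necessarily the paper's exact configuration.

```python
# A sketch of sequence-to-sequence relevance scoring via target-token
# logits. Prompt template and token choices are illustrative assumptions.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def relevance_score(query: str, document: str) -> float:
    text = f"Query: {query} Document: {document} Relevant:"
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    # Run a single decoder step from the decoder start token.
    start = torch.tensor([[model.config.decoder_start_token_id]])
    logits = model(**enc, decoder_input_ids=start).logits[0, 0]
    # Assumes "▁true" / "▁false" are single SentencePiece tokens in the vocab.
    true_id = tokenizer.convert_tokens_to_ids("▁true")
    false_id = tokenizer.convert_tokens_to_ids("▁false")
    probs = torch.softmax(logits[[true_id, false_id]], dim=0)
    return probs[0].item()  # probability mass on "true" = ranking score
```

Scoring query-document pairs this way lets a single text generation model act as a pointwise re-ranker; swapping the target words for semantically close alternatives is exactly the kind of change the summary reports as surprisingly impactful.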
Posted Content
GLU Variants Improve Transformer
TL;DR: Gated Linear Units (GLU) consist of the component-wise product of two linear projections, one of which is first passed through a sigmoid function; variants that swap in other nonlinearities are explored, and some of them yield quality improvements over the typically used ReLU or GELU activations.
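A minimal PyTorch sketch of the mechanism described above; the class name and dimensions are illustrative:

```python
# GLU-style feed-forward block: the component-wise product of two linear
# projections, one passed through a gating nonlinearity (sigmoid in the
# original GLU; the paper's variants swap in GELU, Swish, and others).
import torch
import torch.nn as nn

class GLUFeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int, activation=torch.sigmoid):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)   # gated branch
        self.value = nn.Linear(d_model, d_ff, bias=False)  # linear branch
        self.out = nn.Linear(d_ff, d_model, bias=False)
        self.activation = activation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GLU(x) = (activation(x W) * x V) W_out
        return self.out(self.activation(self.gate(x)) * self.value(x))

# The GEGLU variant simply swaps the gate's nonlinearity:
# ffn = GLUFeedForward(d_model=512, d_ff=1024, activation=nn.functional.gelu)
```

This reference is relevant here because mT5 follows T5.1.1 in adopting the GeGLU variant in its feed-forward layers.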
Posted Content
WT5?! Training Text-to-Text Models to Explain their Predictions
TL;DR: This paper uses the text-to-text framework proposed by Raffel et al. (2019) to train language models to output a natural text explanation alongside their prediction, and shows that this approach not only obtains state-of-the-art results on explainability benchmarks, but also permits learning from a limited set of labeled explanations and transferring rationalization abilities across datasets.
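The training format can be illustrated with a pair of hypothetical examples; the strings below follow the paper's general scheme (prepending "explain" to the task prefix) but are invented for illustration:

```python
# Hypothetical WT5-style training pairs (strings invented for illustration).
# With the "explain" prefix, the target appends a natural-language rationale:
explained = {
    "input": "explain sentiment: This movie was fantastic from start to finish.",
    "target": "positive explanation: the review calls the movie fantastic.",
}
# Without it, the same model is trained to emit the label alone:
plain = {
    "input": "sentiment: This movie was fantastic from start to finish.",
    "target": "positive",
}
```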
Proceedings Article
FlauBERT: Unsupervised Language Model Pre-training for French
Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab
TL;DR: The authors propose FlauBERT, a model learned on a very large and heterogeneous French corpus, and apply it to various NLP tasks (text classification, paraphrasing, natural language inference, parsing, word sense disambiguation), showing that it outperforms other pre-training approaches most of the time.