Open Access Proceedings Article DOI

mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

TLDR
This paper proposes mT5, a multilingual variant of T5 pre-trained on a new Common Crawl-based dataset covering 101 languages, and demonstrates that it achieves state-of-the-art performance on many multilingual benchmarks.
Abstract
The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent “accidental translation” in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.
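Since all checkpoints from the paper are publicly released, they can be loaded with standard tooling. Below is a minimal usage sketch, assuming the Hugging Face Transformers library and the "google/mt5-small" checkpoint (the tooling and checkpoint name are assumptions, not part of the paper). Because mT5 is released after unsupervised span-corruption pre-training only, the raw model fills in sentinel spans rather than following instructions, and downstream tasks require fine-tuning.

```python
# Minimal sketch: load a public mT5 checkpoint and run span-filling inference.
# Assumptions (not from the paper): Hugging Face Transformers, PyTorch, and the
# "google/mt5-small" checkpoint name.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# German input ("The model was pre-trained on <extra_id_0> languages.");
# <extra_id_0> is the sentinel token used by the span-corruption objective.
text = "Das Modell wurde auf <extra_id_0> Sprachen vortrainiert."
inputs = tokenizer(text, return_tensors="pt")

# The pre-trained-only checkpoint predicts content for the masked span.
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```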

Citations
Journal Article DOI

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao, +386 more
09 Nov 2022
TL;DR: BLOOM is a 176-billion-parameter, decoder-only Transformer language model trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total).
Journal Article DOI

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

TL;DR: GPT-NeoX-20B is introduced, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license.
Journal Article DOI

A Survey of Large Language Models

TL;DR: This survey reviews large language models (LLMs), which are built by pre-training Transformer models over large-scale corpora and show strong capabilities in solving various NLP tasks.
References
Proceedings Article

Pre-training via Paraphrasing

TL;DR: This paper proposes MARGE, a sequence-to-sequence model pre-trained with an unsupervised multilingual multi-document paraphrasing objective, which achieves strong zero-shot performance on several tasks.
Posted Content

InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

TL;DR: The authors formulate cross-lingual language model pre-training as maximizing mutual information between multilingual, multi-granularity texts and propose a contrastive learning approach to improve the cross-lingual transferability of pre-trained models.
Posted Content

Playing with Words at the National Library of Sweden : Making a Swedish BERT

TL;DR: This paper introduces KB-BERT, a Swedish BERT developed by KBLab for data-driven research at the National Library of Sweden, and demonstrates that it outperforms existing models on a range of NLP tasks, from named entity recognition (NER) to part-of-speech (POS) tagging.
Posted Content

FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding

TL;DR: This paper proposes FILTER, an enhanced fusion method that takes cross-lingual data as input for XLM fine-tuning, together with an additional KL-divergence self-teaching loss based on auto-generated soft pseudo-labels for translated text in the target language.
Posted Content

RobBERT: a Dutch RoBERTa-based Language Model

TL;DR: This paper uses RoBERTa, a robustly optimized BERT approach, to train a Dutch language model called RobBERT, and evaluates its performance on various tasks as well as the importance of fine-tuning dataset size.
Trending Questions
isiNdebele text generation in NLP using the mT5 tool

The paper does not specifically address isiNdebele text generation with mT5. It introduces mT5, a multilingual variant of T5, and demonstrates its performance on multilingual benchmarks; a fine-tuning sketch is given below.
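The paper itself gives no recipe for any specific language, but the usual route to text generation with mT5 is supervised fine-tuning on target-language data. The loop below is a hypothetical sketch under the same assumptions as above (Hugging Face Transformers, PyTorch, the "google/mt5-small" checkpoint), with placeholder strings standing in for a real corpus; none of these names or hyperparameters come from the paper.

```python
# Hypothetical fine-tuning sketch for text generation with mT5.
# Assumptions (not from the paper): Hugging Face Transformers, PyTorch,
# the "google/mt5-small" checkpoint, and toy placeholder data.
import torch
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Placeholder (input, target) pairs standing in for a real text-generation
# corpus in the target language.
pairs = [("generate a sentence about: rain", "placeholder target sentence")]

model.train()
for src, tgt in pairs:
    batch = tokenizer(src, return_tensors="pt")
    labels = tokenizer(tgt, return_tensors="pt").input_ids
    loss = model(**batch, labels=labels).loss  # teacher-forced seq2seq loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```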

A Massively Multilingual Pre-trained Text-to-Text Transformer?

The paper introduces mT5, a multilingual variant of T5, which is a massively multilingual pre-trained text-to-text transformer.