mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel
- pp. 483–498
TLDR
This paper introduces mT5, a multilingual variant of T5 pre-trained on a new Common Crawl-based dataset covering 101 languages, and shows that it achieves state-of-the-art performance on many multilingual benchmarks.
Abstract
The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent “accidental translation” in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.
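The abstract refers to T5's unified text-to-text pre-training, which T5 and mT5 realize with a span-corruption objective: random spans of the input are replaced with sentinel tokens, and the model learns to reconstruct the dropped spans. The sketch below is a minimal illustration only, assuming whitespace tokens and a single corrupted span; the real models operate on SentencePiece subwords and corrupt multiple spans (mean length 3, roughly 15% of tokens). The helper name `span_corrupt` is illustrative, not from the paper's codebase.

```python
import random

def span_corrupt(tokens, noise_density=0.15, seed=0):
    """Toy T5-style span corruption: drop one contiguous span of tokens,
    mark its position in the input with a sentinel, and emit a target
    that reconstructs the dropped span between sentinels."""
    rng = random.Random(seed)
    # number of tokens to corrupt (at least one)
    n_noise = max(1, round(len(tokens) * noise_density))
    # choose where the corrupted span starts
    start = rng.randrange(0, len(tokens) - n_noise + 1)
    # input: original sequence with the span replaced by a sentinel token
    inp = tokens[:start] + ["<extra_id_0>"] + tokens[start + n_noise:]
    # target: the dropped span, delimited by sentinel tokens
    tgt = ["<extra_id_0>"] + tokens[start:start + n_noise] + ["<extra_id_1>"]
    return inp, tgt

if __name__ == "__main__":
    tokens = "Thank you for inviting me to your party last week .".split()
    inp, tgt = span_corrupt(tokens)
    print("input: ", " ".join(inp))
    print("target:", " ".join(tgt))
```

Splicing the target span back into the input at the sentinel position recovers the original sequence, which is the invariant the pre-training objective relies on.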
Citations
Journal Article
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, et al.
TL;DR: PaLM, a 540-billion-parameter, densely activated Transformer language model, achieves breakthrough performance, outperforming the state of the art on a suite of multi-step reasoning tasks and exceeding average human performance on the recently released BIG-bench benchmark.
Journal Article
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Teven Le Scao, Angela Fan, Christopher Akiki, et al.
TL;DR: BLOOM is a 176B-parameter decoder-only Transformer language model trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total).
Journal Article
GPT-NeoX-20B: An Open-Source Autoregressive Language Model
Sid Black, Stella Biderman, Eric Hallahan, et al.
TL;DR: GPT-NeoX-20B is introduced, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license.
Journal Article
A Survey of Large Language Models
Wayne Xin Zhao, Kun Zhou, Junyi Li, et al.
TL;DR: This survey reviews large language models (LLMs), which are built by pre-training Transformer models over large-scale corpora and show strong capabilities in solving various NLP tasks.
Proceedings Article
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen, Xiao Jing Wang, Soravit Changpinyo, et al.
TL;DR: PaLI achieves state-of-the-art results on multiple vision-and-language tasks, while retaining a simple, modular, and scalable design.
References
Proceedings Article
CamemBERT: a Tasty French Language Model
Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah, Benoît Sagot
TL;DR: CamemBERT is a French version of BERT (Bidirectional Encoder Representations from Transformers), evaluated on part-of-speech tagging, dependency parsing, named entity recognition, and natural language inference.
Proceedings Article
PhoBERT: Pre-trained language models for Vietnamese
Dat Quoc Nguyen, Anh Tuan Nguyen
TL;DR: Experimental results show that PhoBERT consistently outperforms the best recent pre-trained multilingual model, XLM-R, and improves the state of the art on multiple Vietnamese-specific NLP tasks, including part-of-speech tagging, dependency parsing, named-entity recognition, and natural language inference.
Posted Content
BERTje : A Dutch BERT Model
Wietse de Vries, Andreas van Cranenburgh, Arianna Bisazza, Tommaso Caselli, Gertjan van Noord, Malvina Nissim
TL;DR: The transformer-based pre-trained language model BERT has helped improve state-of-the-art performance on many natural language processing (NLP) tasks. This work develops and evaluates BERTje, a monolingual Dutch BERT model, which consistently outperforms the equally-sized multilingual BERT model on downstream NLP tasks.
Journal Article
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
Jonathan H. Clark, Eunsol Choi, Michael Collins, Dan Garrette, Tom Kwiatkowski, Vitaly Nikolaev, Jennimaria Palomaki
TL;DR: TyDi QA is a question answering benchmark covering 11 typologically diverse languages, with 204K question-answer pairs collected directly in each language rather than by translation.
Proceedings Article
MLQA: Evaluating Cross-lingual Extractive Question Answering
TL;DR: MLQA is a multi-way aligned extractive QA evaluation benchmark intended to spur research in cross-lingual question answering; it contains QA instances in seven languages: English, Arabic, German, Spanish, Hindi, Vietnamese, and Simplified Chinese.