Open Access Proceedings Article DOI

mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

TLDR
This paper proposes mT5, a multilingual variant of T5 pre-trained on a new Common Crawl-based dataset covering 101 languages, and demonstrates that it achieves state-of-the-art performance on many multilingual benchmarks.
Abstract
The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent “accidental translation” in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.
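Since all checkpoints from the paper are publicly released, they can be loaded with standard tooling. Below is a minimal usage sketch, assuming the Hugging Face Transformers library and the "google/mt5-small" checkpoint (the tooling and checkpoint name are assumptions, not part of the paper). Because mT5 is released after unsupervised span-corruption pre-training only, the raw model fills in sentinel spans rather than following instructions, and downstream tasks require fine-tuning.

```python
# Minimal sketch: load a public mT5 checkpoint and run span-filling inference.
# Assumptions (not from the paper): Hugging Face Transformers, PyTorch, and the
# "google/mt5-small" checkpoint name.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# German input ("The model was pre-trained on <extra_id_0> languages.");
# <extra_id_0> is the sentinel token used by the span-corruption objective.
text = "Das Modell wurde auf <extra_id_0> Sprachen vortrainiert."
inputs = tokenizer(text, return_tensors="pt")

# The pre-trained-only checkpoint predicts content for the masked span.
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```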

Citations
Journal Article DOI

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao, +386 more
09 Nov 2022
TL;DR: BLOOM is a 176-billion-parameter, decoder-only Transformer language model trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total).
Journal Article DOI

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

TL;DR: GPT-NeoX-20B is introduced, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license.
Journal Article DOI

A Survey of Large Language Models

TL;DR: This survey reviews large language models (LLMs), which are built by pre-training Transformer models over large-scale corpora and show strong capabilities in solving various NLP tasks.
References
Proceedings Article

Pre-training via Paraphrasing

TL;DR: This paper proposes MARGE, a sequence-to-sequence model pre-trained with an unsupervised multilingual multi-document paraphrasing objective, which achieves strong zero-shot performance on several tasks.
Posted Content

InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

TL;DR: The authors formulate cross-lingual language model pre-training as maximizing mutual information between multilingual, multi-granularity texts and propose a contrastive learning approach to improve the cross-lingual transferability of pre-trained models.
Posted Content

Playing with Words at the National Library of Sweden : Making a Swedish BERT

TL;DR: This paper introduces KB-BERT, a Swedish BERT developed by KBLab for data-driven research at the National Library of Sweden, and demonstrates that it outperforms existing models on a range of NLP tasks, from named entity recognition (NER) to part-of-speech (POS) tagging.
Posted Content

FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding

TL;DR: This paper proposes FILTER, an enhanced fusion method that takes cross-lingual data as input for XLM fine-tuning, together with an additional KL-divergence self-teaching loss based on auto-generated soft pseudo-labels for translated text in the target language.
Posted Content

RobBERT: a Dutch RoBERTa-based Language Model

TL;DR: This paper uses RoBERTa, a robustly optimized BERT approach, to train a Dutch language model called RobBERT, and evaluates its performance on various tasks as well as the importance of fine-tuning dataset size.
Trending Questions
isiNdebele text generation in NLP using the mT5 tool

The paper does not specifically address isiNdebele text generation with mT5. It introduces mT5, a multilingual variant of T5, and demonstrates its performance on multilingual benchmarks; a fine-tuning sketch is given below.
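The paper itself gives no recipe for any specific language, but the usual route to text generation with mT5 is supervised fine-tuning on target-language data. The loop below is a hypothetical sketch under the same assumptions as above (Hugging Face Transformers, PyTorch, the "google/mt5-small" checkpoint), with placeholder strings standing in for a real corpus; none of these names or hyperparameters come from the paper.

```python
# Hypothetical fine-tuning sketch for text generation with mT5.
# Assumptions (not from the paper): Hugging Face Transformers, PyTorch,
# the "google/mt5-small" checkpoint, and toy placeholder data.
import torch
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Placeholder (input, target) pairs standing in for a real text-generation
# corpus in the target language.
pairs = [("generate a sentence about: rain", "placeholder target sentence")]

model.train()
for src, tgt in pairs:
    batch = tokenizer(src, return_tensors="pt")
    labels = tokenizer(tgt, return_tensors="pt").input_ids
    loss = model(**batch, labels=labels).loss  # teacher-forced seq2seq loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```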

A Massively Multilingual Pre-trained Text-to-Text Transformer?

The paper introduces mT5, a multilingual variant of T5, which is a massively multilingual pre-trained text-to-text transformer.