Open Access · Proceedings Article

mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

TL;DR
This paper introduces mT5, a multilingual variant of T5 pre-trained on a new Common Crawl-based dataset covering 101 languages, and shows that it achieves state-of-the-art performance on many multilingual benchmarks.
Abstract
The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent “accidental translation” in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.
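The abstract notes that the code and model checkpoints are publicly available. As a minimal sketch, assuming the checkpoints mirrored on the Hugging Face Hub under the name "google/mt5-small" (an assumption; the original release lives in the authors' T5 codebase), the pre-trained model can be loaded and queried in the fill-in-the-blank format it was pre-trained on:

# Minimal sketch: load a public mT5 checkpoint and query it in the
# fill-in-the-blank (span corruption) format used for pre-training.
# "google/mt5-small" is the assumed Hugging Face mirror of the released weights.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# mT5 is pre-trained without supervised tasks, so the raw checkpoint is queried
# with sentinel tokens rather than natural-language instructions.
inputs = tokenizer("The capital of France is <extra_id_0>.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))

For downstream use, the paper fine-tunes these checkpoints on multilingual benchmarks rather than relying on the raw pre-trained model.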



Citations
Journal Article

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao, +386 more · 09 Nov 2022
TL;DR: BLOOM is a 176B-parameter, decoder-only Transformer language model trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total).
Journal Article

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

TL;DR: GPT-NeoX-20B, a 20-billion-parameter autoregressive language model trained on the Pile, is introduced; its weights are made freely and openly available to the public under a permissive license.
Journal Article

A Survey of Large Language Models

TL;DR: Large language models (LLMs), obtained by pre-training Transformer models over large-scale corpora, show strong capabilities in solving various NLP tasks; this paper surveys recent progress on LLMs.
References

AlBERTo: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets.

TL;DR: A BERT language understanding model for Italian (AlBERTo) is trained, focused on the language used on social networks, specifically Twitter, obtaining state-of-the-art results in subjectivity, polarity, and irony detection on Italian tweets.
Proceedings Article

Document Ranking with a Pretrained Sequence-to-Sequence Model

TL;DR: Surprisingly, it is found that the choice of target tokens impacts effectiveness, even for words that are closely related semantically, which sheds some light on why the sequence-to-sequence formulation for document ranking is effective.
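As a sketch of the scoring scheme this TL;DR refers to, assuming the "Query: … Document: … Relevant:" prompt with "true"/"false" target tokens commonly used for T5-based ranking (the generic "t5-base" checkpoint below is a placeholder for a relevance-fine-tuned model):

# Sketch: score query-document relevance with a sequence-to-sequence model by
# reading off the probability of the target token "true" vs. "false" at the
# first decoding step.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")  # placeholder checkpoint

def relevance_score(query: str, document: str) -> float:
    prompt = f"Query: {query} Document: {document} Relevant:"
    enc = tokenizer(prompt, return_tensors="pt", truncation=True)
    # The decoder starts from the model's start token; only the logits at the
    # first output position are needed.
    start = torch.full((1, 1), model.config.decoder_start_token_id)
    with torch.no_grad():
        logits = model(**enc, decoder_input_ids=start).logits[0, 0]
    true_id = tokenizer("true", add_special_tokens=False).input_ids[0]
    false_id = tokenizer("false", add_special_tokens=False).input_ids[0]
    probs = torch.softmax(logits[[true_id, false_id]], dim=0)
    return probs[0].item()  # probability mass on "true"

The TL;DR's observation about target tokens then corresponds to swapping "true"/"false" for other, semantically similar token pairs and measuring how ranking effectiveness changes.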
Posted Content

GLU Variants Improve Transformer.

Noam Shazeer · 12 Feb 2020
TL;DR: Gated Linear Units (GLU) consist of the component-wise product of two linear projections, one of which is first passed through a sigmoid function; several GLU variants are found to yield quality improvements over the typically used ReLU or GELU activations in the Transformer feed-forward layer.
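The gating described here is straightforward to write down; the following is a minimal PyTorch sketch of a gated feed-forward block, with GEGLU (the GELU-gated variant used in the T5.1.1 recipe that mT5 builds on) obtained by swapping the activation. Layer names and sizes are illustrative.

# Minimal sketch of a GLU-style feed-forward block (d_model -> d_ff -> d_model).
# GLU gates one linear projection with an activation of another projection.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GLUFeedForward(nn.Module):
    def __init__(self, d_model: int, d_ff: int, activation=torch.sigmoid):
        super().__init__()
        self.wi_gate = nn.Linear(d_model, d_ff, bias=False)   # projection passed through the activation
        self.wi_value = nn.Linear(d_model, d_ff, bias=False)  # projection being gated
        self.wo = nn.Linear(d_ff, d_model, bias=False)
        self.activation = activation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Component-wise product of the activated gate and the value projection.
        return self.wo(self.activation(self.wi_gate(x)) * self.wi_value(x))

glu_ffn = GLUFeedForward(d_model=512, d_ff=1024)                      # GLU: sigmoid gate
geglu_ffn = GLUFeedForward(d_model=512, d_ff=1024, activation=F.gelu)  # GEGLU variant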
Posted Content

WT5?! Training Text-to-Text Models to Explain their Predictions.

TL;DR: This paper uses the text-to-text framework proposed by Raffel et al. (2019) to train language models to output a natural text explanation alongside their prediction, and shows that this approach not only obtains state-of-the-art results on explainability benchmarks, but also permits learning from a limited set of labeled explanations and transferring rationalization abilities across datasets.
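A minimal sketch of the text-to-text formatting this describes, assuming the WT5 convention of prepending "explain" to the task prefix and having the target carry the label followed by an "explanation:" rationale; the exact prefixes and field names below are illustrative, not quoted from the paper.

# Sketch: build (input, target) strings for training a text-to-text model to
# emit an explanation alongside its prediction.
def make_wt5_example(review: str, label: str, rationale: str) -> tuple[str, str]:
    source = f"explain sentiment: {review}"
    target = f"{label} explanation: {rationale}"
    return source, target

src, tgt = make_wt5_example(
    review="The food was cold and the service was slow.",
    label="negative",
    rationale="the food was cold and the service was slow",
)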
Proceedings Article

FlauBERT: Unsupervised Language Model Pre-training for French

TL;DR: The authors propose FlauBERT, a model learned on a very large and heterogeneous French corpus, apply it to various NLP tasks (text classification, paraphrasing, natural language inference, parsing, word sense disambiguation), and show that it outperforms other pre-training approaches most of the time.
Trending Questions (2)
isiNdebele text generation under NLP using the mT5 tool

The paper does not specifically address isiNdebele text generation using mT5. It introduces mT5, a multilingual variant of T5, and demonstrates its performance on multilingual benchmarks.

A Massively Multilingual Pre-trained Text-to-Text Transformer?

The paper introduces mT5, a multilingual variant of T5, which is a massively multilingual pre-trained text-to-text transformer.