Open Access · Proceedings Article · DOI

mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

TLDR
This paper introduces mT5, a multilingual variant of T5 pre-trained on a new Common Crawl-based dataset covering 101 languages, and shows that it achieves state-of-the-art performance on many multilingual benchmarks.
Abstract
The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent “accidental translation” in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.
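As a concrete illustration of the text-to-text interface described above, the following minimal sketch loads a released mT5 checkpoint through the Hugging Face Transformers library (an assumption about the reader's toolchain, not code from the paper) and fills a span-corruption sentinel, which is the only objective mT5 sees before fine-tuning.

from transformers import AutoTokenizer, MT5ForConditionalGeneration

# Minimal sketch: assumes the Hugging Face Transformers library and the
# publicly released "google/mt5-small" checkpoint; not the authors' own code.
tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# mT5 is pre-trained only on span corruption over mC4, so prompts use
# sentinel tokens; downstream tasks require fine-tuning in the same
# text-to-text format.
inputs = tokenizer("Das Wetter ist heute <extra_id_0>.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The same interface covers any downstream task once the model is fine-tuned: inputs and targets are both plain strings, regardless of language.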

Citations
Journal Article · DOI

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao, +386 more
09 Nov 2022
TL;DR: BLOOM is a 176B-parameter, decoder-only Transformer language model trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total).
Journal Article · DOI

GPT-NeoX-20B: An Open-Source Autoregressive Language Model

TL;DR: GPT-NeoX-20B is introduced, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available to the public through a permissive license.
Journal Article · DOI

A Survey of Large Language Models

TL;DR: This survey reviews large language models (LLMs), which are built by pre-training Transformer models over large-scale corpora and show strong capabilities in solving various NLP tasks.
References

VECO: Variable Encoder-decoder Pre-training for Cross-lingual Understanding and Generation

TL;DR: A variable encoder-decoder (VECO) pre-training approach unifies the two mainstream designs in both model architecture and pre-training tasks, delivering new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
Posted Content

PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data.

TL;DR: This paper pretrains a T5 model on the BrWac corpus, an extensive collection of web pages in Portuguese, and evaluates its performance against other Portuguese pretrained models and multilingual models on sentence similarity and sentence entailment tasks.
Posted Content

Unifying Question Answering and Text Classification via Span Extraction.

TL;DR: A unified, span-extraction approach leads to superior or comparable performance in multi-task learning, low-data and supplementary supervised pretraining experiments on several text classification and question answering benchmarks.
Proceedings Article

Rethinking Embedding Coupling in Pre-trained Language Models

TL;DR: The authors re-evaluate the standard practice of sharing weights between the input and output embeddings in state-of-the-art pre-trained language models and show that decoupled embeddings provide increased modeling flexibility, allowing significantly more efficient parameter allocation in the input embeddings of multilingual models.
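To make the notion of embedding coupling concrete, here is a rough PyTorch sketch (an illustration under assumed sizes, not code from the cited paper): the tied variant shares one weight matrix between the input embedding and the output projection, while the decoupled variant allocates them separately, which is what gives the extra flexibility for large multilingual vocabularies.

import torch.nn as nn

vocab_size, d_model = 250_000, 512  # hypothetical sizes for a large multilingual vocabulary

class TiedLMHead(nn.Module):
    # Input and output embeddings share a single weight matrix.
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size, bias=False)
        self.out.weight = self.embed.weight  # weight tying

class DecoupledLMHead(nn.Module):
    # Separate parameters for the input embedding and the output projection.
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size, bias=False)  # independent weights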
Posted Content

Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning

TL;DR: The method achieves better performance than other fine-tuning baselines on zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks, and preserves the original cross-lingual ability of the pre-trained model when it is fine-tuned on downstream cross-lingual tasks.
Trending Questions (2)
isiNdebele text generation under NLP using the mT5 tool

The paper does not specifically mention isiNdebele text generation using the mT5 tool. It introduces mT5, a multilingual variant of T5, and demonstrates its performance on multilingual benchmarks.

A Massively Multilingual Pre-trained Text-to-Text Transformer?

The paper introduces mT5, a multilingual variant of T5, which is a massively multilingual pre-trained text-to-text transformer.