mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel
pp. 483–498
TLDR
This paper proposes mT5, a multilingual variant of T5 pre-trained on a new Common Crawl-based dataset covering 101 languages, and shows that it achieves state-of-the-art performance on many multilingual benchmarks.
Abstract
The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent “accidental translation” in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.
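To make the text-to-text format concrete, here is a minimal sketch (not from the paper) of loading one of the publicly released mT5 checkpoints via the Hugging Face Transformers library. The "google/mt5-small" checkpoint name and the summarization-style prompt are assumptions for illustration; the raw pre-trained model would normally be fine-tuned on a downstream task before use.

```python
# Minimal sketch: loading a released mT5 checkpoint with Hugging Face Transformers.
# Assumes the `transformers` and `sentencepiece` packages; "google/mt5-small" is
# one of the public checkpoint sizes. The raw pre-trained model is not task-tuned,
# so in practice it would first be fine-tuned (e.g. on XNLI or a QA dataset).
from transformers import MT5ForConditionalGeneration, MT5Tokenizer

tokenizer = MT5Tokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# Every task is cast as text in, text out (hypothetical prompt for illustration).
inputs = tokenizer("summarize: mT5 ist eine mehrsprachige Variante von T5.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```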
Citations
Proceedings Article
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Gary Lai, Ishan Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Maitreya Patel, Kuntal Kumar Pal, Mehrad Moradshahi, Mihir Parmar, Mirali Purohit, Neeraj Varshney, Phani Rohitha Kaza, Pulkit Verma, Ravsehaj Singh Puri, Rushang Vinod Karia, Shailaja Keyur Sampat, Savankumar Doshi, Siddharth Deepak Mishra, Sujan Reddy, Sumanta Patro, Tanay Dixit, Xudong Shen, Chitta Baral, Yejin Choi, Noah A. Smith, Hanna Hajishirzi, Daniel Khashabi
TL;DR: A benchmark of over 1,600 diverse NLP tasks with expert-written declarative instructions is introduced, and training on these instructions is shown to help NLP models generalize to a variety of unseen tasks.
Proceedings ArticleDOI
Taxonomy of Risks posed by Language Models
Laura Weidinger, Jonathan Uesato, Maribeth Rauh, C. Griffin, Po-Sen Huang, John F. J. Mellor, A. Glaese, M. Cheng, Borja Balle, Atoosa Kasirzadeh, Courtney Biles, Sande Minnich Brown, Zachary Kenton, William T. Hawkins, Thomas Stepleton, Abeba Birhane, Lisa Anne Hendricks, Laura Rimell, William S. Isaac, Julia Haas, Sean Legassick, Geoffrey Irving, Iason Gabriel
TL;DR: A comprehensive taxonomy of ethical and social risks associated with LMs is developed, drawing on expertise and literature from computer science, linguistics, and the social sciences to ensure that language models are developed responsibly.
Proceedings ArticleDOI
Unified Structure Generation for Universal Information Extraction
TL;DR: A unified text-to-structure generation framework, UIE, is proposed that can universally model different IE tasks, adaptively generate targeted structures, and collaboratively learn general IE abilities from different knowledge sources.
Posted Content
MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering
TL;DR: MKQA (Multilingual Knowledge Questions and Answers) is introduced, an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages, making results comparable across languages and independent of language-specific passages.
Journal ArticleDOI
Scaling Up Models and Data with t5x and seqio
Adam Roberts, Hyung Won Chung, Anselm Levskaya, Gaurav Mishra, James Bradbury, Daniel Andor, Sharan Narang, Brian Lester, Colin Gaffney, Afroz Mohiuddin, Curtis Hawthorne, Aitor Lewkowycz, Alexandru D. Sălcianu, M. van Zee, Jacob Austin, Sebastian Goodman, Livio Soares, Haitang Hu, Sasha Tsvyashchenko, Aakanksha Chowdhery, Jasmijn Bastings, Jannis Bulian, Xavier Garcia, Jianmo Ni, A. Chen, Kathleen Kenealy, Jonathan H. Clark, Stephan G. Lee, Daniel H Garrette, James P. Lee-Thorp, Colin Raffel, Noam Shazeer, Marvin Ritter, Maarten Bosma, Alexandre Passos, Jeremy Maitin-Shepard, Noah Fiedel, Mark Omernick, Brennan Saeta, Ryan Sepassi, Alexander Spiridonov, Joshua Newlan, Andrea Gesmundo
TL;DR: Two software libraries are presented: t5x simplifies the process of building and training large language models at scale while maintaining ease of use, and seqio provides a task-based API for simple creation of fast and reproducible training data and evaluation pipelines.
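The task-based API mentioned in this TL;DR can be pictured with a short sketch of how a text-to-text task might be registered with seqio. This is an illustration under assumptions, not an excerpt from the paper: the task name, TFDS dataset, and SentencePiece model path are placeholders.

```python
# Sketch of seqio's task-based API (illustrative only): register a text-to-text
# task backed by a TFDS dataset. The task name, dataset, and SentencePiece
# model path below are hypothetical placeholders.
import functools
import seqio

vocab = seqio.SentencePieceVocabulary("/path/to/sentencepiece.model")  # placeholder path

seqio.TaskRegistry.add(
    "example_de_en_translation",  # hypothetical task name
    source=seqio.TfdsDataSource(tfds_name="wmt14_translate/de-en:1.0.0"),
    preprocessors=[
        # Map the raw TFDS fields onto the canonical "inputs"/"targets" keys.
        functools.partial(seqio.preprocessors.rekey,
                          key_map={"inputs": "de", "targets": "en"}),
        seqio.preprocessors.tokenize,
        seqio.preprocessors.append_eos,
    ],
    output_features={
        "inputs": seqio.Feature(vocabulary=vocab),
        "targets": seqio.Feature(vocabulary=vocab),
    },
)

# The registered task yields a reproducible tf.data pipeline for training and eval.
ds = seqio.get_mixture_or_task("example_de_en_translation").get_dataset(
    sequence_length={"inputs": 128, "targets": 128}, split="train")
```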
References
Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
TL;DR: This paper proposed the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieved state-of-the-art results on the WMT 2014 English-to-German and English-to-French translation tasks.
Posted Content
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Michael Lewis, Luke Zettlemoyer, Veselin Stoyanov
TL;DR: It is found that BERT was significantly undertrained and that, with an improved pretraining recipe, it can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
Proceedings ArticleDOI
SQuAD: 100,000+ Questions for Machine Comprehension of Text
TL;DR: The Stanford Question Answering Dataset (SQuAD) is introduced, a reading comprehension dataset consisting of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage.
Proceedings ArticleDOI
Unsupervised Cross-lingual Representation Learning at Scale
Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov
TL;DR: It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is shown for the first time.
Proceedings ArticleDOI
Universal Language Model Fine-tuning for Text Classification
Jeremy Howard, Sebastian Ruder
TL;DR: Universal Language Model Fine-tuning (ULMFiT) is proposed, an effective transfer learning method that can be applied to any task in NLP, along with techniques that are key for fine-tuning a language model.