mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel
pp. 483–498
TL;DR
This paper introduces mT5, a multilingual variant of T5 pre-trained on a new Common Crawl-based dataset covering 101 languages, and shows that it achieves state-of-the-art performance on many multilingual benchmarks.
Abstract
The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent “accidental translation” in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.
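Since the checkpoints are public and every task is cast in the same text-to-text format, a minimal usage sketch may help; it assumes the Hugging Face transformers library and the "google/mt5-small" checkpoint name, neither of which is specified in this listing.

```python
# A minimal sketch (an assumption, not from the paper): load a public mT5
# checkpoint and run it in the text-to-text format via Hugging Face
# transformers. The checkpoint name "google/mt5-small" and the prompt are
# illustrative only.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

# Every task is "text in -> text out"; here a fill-in-the-blank prompt uses
# the <extra_id_0> sentinel token, mirroring the span-corruption pre-training
# objective (the raw checkpoint is normally fine-tuned before downstream use).
text = "UN Offizieller Pérez sagt, dass <extra_id_0> in Syrien gesprochen wurde."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```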
Citations
Journal Article
Unifying Language Learning Paradigms
Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, T. Schuster, Huaixiu Zheng, Neil Houlsby, Donald Metzler, et al.
TL;DR: UL2 achieves SOTA performance on 50 well-established supervised NLP tasks spanning language generation, language understanding, text classification, question answering, commonsense reasoning, long-text reasoning, structured knowledge grounding, and information retrieval.
Posted Content
On the Opportunities and Risks of Foundation Models
Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ B. Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri S. Chatterji, Annie Chen, Kathleen Creel, Jared Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah D. Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Ahmad Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf H. Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Yang Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang
TL;DR: The authors provide a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications.
Journal Article
PaLM 2 Technical Report
Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin George Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhi Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathleen S. Meier-Hellstern, Gaurav Mishra, Erica Oliveira Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez-Abrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan A. Botha, James Bradbury, Siddhartha Brahma, Kevin Michael Brooks, M. Catasta, Yongzhou Cheng, Colin Cherry, Christopher A. Choquette-Choo, Aakanksha Chowdhery, C Crepy, Shachi Dave, Mostafa Dehghani, Sunipa Dev, Jacob Devlin, M. Díaz, Nan Du, Ethan Dyer, Vladimir Feinberg, Fan Feng, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Guy Gur-Ari, Steven Hand, Hadi Hashemi, Le Hou, Joshua Howland, Anren Hu, Jeffrey Hui, Jeremy Scott Hurwitz, Michael Isard, Abe Ittycheriah, Matthew Jagielski, Wenhao Jia, Kathleen Kenealy, Maxim Krikun, Sneha Kudugunta, Katherine Lee, Benjamin N. Lee, Eric Li, Mu Li-Li, Wei Li, Yaguang Li, Jian Li, Hyeontaek Lim, Han Lin, Zhong-Zhong Liu, Frederick Liu, Marcello Maggioni, Aroma Mahendru, Joshua Maynez, Vedant Misra, Maysam Moussalem, Zachary Nado, John Nham, Eric Ni, Andrew Nystrom, Alicia Parrish, Marie Pellat, Martin Polacek, Alex Polozov, Reiner Pope, Siyuan Qiao, Emily Reif, Parker Riley, Alexandra Ros, Aurko Roy, Brennan Saeta, Rajkumar Samuel, Renee Shelby, Ambrose Jay Slone, Daniel Smilkov, David R. So, Daniela Sohn, Simon Tokumine, Vijay K. Vasudevan, Kiran Vodrahalli, Xuezhi Wang, Pidong Wang, Tao Wang, John Wieting, Yuhuai Wu, Ke Xu, Yu Yu Xu, Lin Wu Xue, Pengcheng Yin, Jia Yu, Biao Zhang, Steven X.F. Zheng, Ce Zheng, Wei Zhou, Denny Zhou, Slav Petrov, Yonghui Wu
TL;DR: PaLM 2 is a Transformer-based model trained using a mixture of objectives; it has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor, PaLM.
Proceedings Article
Multi-Game Decision Transformers
Kuang-Huei Lee, Ofir Nachum, Meng Yang, L. Y. Lee, Daniel Freeman, Winnie Xu, Sergio Guadarrama, Ian Fischer, Eric Jang, Henryk Michalewski, Igor Mordatch
TL;DR: It is shown that a single transformer-based model – with a single set of weights – trained purely offline can play a suite of up to 46 Atari games simultaneously at close-to-human performance.
Journal Article
Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks
Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Gary Lai, Ishan Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Maitreya Patel, Kuntal Kumar Pal, Mehrad Moradshahi, Mihir Parmar, Mirali Purohit, Neeraj Varshney, Phani Rohitha Kaza, Pulkit Verma, Ravsehaj Singh Puri, Rushang Vinod Karia, Shailaja Keyur Sampat, Savankumar Doshi, Siddharth Deepak Mishra, Sujan Reddy, Sumanta Patro, Tanay Dixit, Xudong Shen, Chitta Baral, Yejin Choi, Hannaneh Hajishirzi, Noah A. Smith, Daniel Khashabi
TL;DR: This work introduces NATURAL-INSTRUCTIONS v2, a collection of 1,600+ diverse language tasks and their expert-written instructions, covering 70+ distinct task types such as tagging, in-filling, and rewriting.
References
Proceedings Article
Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art results on English-to-German and English-to-French machine translation.
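As a reminder of what the attention mechanism named above computes, here is a minimal sketch of the scaled dot-product attention the Transformer is built on, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The NumPy implementation and the toy shapes are illustrative assumptions, not code from the paper.

```python
# Minimal sketch (assumed, not from the listing) of scaled dot-product attention:
#   Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over the keys
    return weights @ V                                    # weighted sum of values

# Toy example: 3 query positions attending over 4 key/value positions, d_k = 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)        # (3, 8)
```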
Posted Content
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Michael Lewis, Luke Zettlemoyer, Veselin Stoyanov
TL;DR: It is found that BERT was significantly undertrained and, when trained more carefully, can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE, and SQuAD.
Proceedings Article
SQuAD: 100,000+ Questions for Machine Comprehension of Text
TL;DR: The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage.
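To make the extractive format concrete, here is a hedged illustration of the structure the TL;DR describes: each answer is a span of the passage, identified by its text and character offset. The field names follow the publicly documented SQuAD v1.1 JSON layout; the passage and question are invented for illustration.

```python
# Hedged illustration (not taken from the dataset) of the extractive-QA record
# structure: an answer is a span of the passage, given by text + offset.
context = ("mT5 is a multilingual variant of T5 pre-trained on a "
           "Common Crawl-based dataset covering 101 languages.")
answer_text = "101"

example = {
    "context": context,
    "qas": [{
        "id": "demo-0001",
        "question": "How many languages does the pre-training dataset cover?",
        "answers": [{"text": answer_text,
                     "answer_start": context.find(answer_text)}],
    }],
}

# The answer span can be recovered directly from the passage by its offset.
ans = example["qas"][0]["answers"][0]
start = ans["answer_start"]
assert context[start:start + len(ans["text"])] == ans["text"]
```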
Proceedings Article
Unsupervised Cross-lingual Representation Learning at Scale
Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov
TL;DR: It is shown that pretraining multilingual language models at scale leads to significant performance gains on a wide range of cross-lingual transfer tasks, and that multilingual modeling is possible without sacrificing per-language performance.
Proceedings Article
Universal Language Model Fine-tuning for Text Classification
Jeremy Howard, Sebastian Ruder
TL;DR: Universal Language Model Fine-tuning (ULMFiT) is an effective transfer learning method that can be applied to any task in NLP; the paper also introduces techniques that are key for fine-tuning a language model.