Proceedings ArticleDOI

mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

01 Jun 2021-pp 483-498
TL;DR: This paper proposed mT5, a multilingual variant of T5 pre-trained on a new Common Crawl-based dataset covering 101 languages, and reported state-of-the-art performance on many multilingual benchmarks.
Abstract: The recent “Text-to-Text Transfer Transformer” (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent “accidental translation” in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.
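The abstract notes that the model checkpoints are publicly available. As a minimal sketch of the text-to-text format it describes, the snippet below loads an mT5 checkpoint through the Hugging Face transformers library; the "google/mt5-small" checkpoint name and the transformers API are assumptions of this example, not something specified by the paper.

```python
# Minimal sketch (assumes the Hugging Face `transformers` library and the
# "google/mt5-small" checkpoint name; neither is prescribed by the paper).
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# Every task is cast as text in, text out: the task is expressed in the input
# string and the answer is read off the generated string.
inputs = tokenizer("summarize: mT5 is a multilingual variant of T5 pre-trained "
                   "on a Common Crawl-based corpus covering 101 languages.",
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Note that, unlike T5, mT5 pre-training uses only the unsupervised objective, so the released checkpoints are normally fine-tuned on a downstream task before their generations are useful.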


Citations
Proceedings ArticleDOI
18 Mar 2022
TL;DR: The authors proposed a principled framework for cross-cultural and multicultural NLP, and surveyed existing and potential strategies to accommodate cultural as well as linguistic diversity and better serve users of NLP systems.
Abstract: Various efforts in the Natural Language Processing (NLP) community have been made to accommodate linguistic diversity and serve speakers of many different languages. However, it is important to acknowledge that speakers and the content they produce and require vary not just by language, but also by culture. Although language and culture are tightly linked, there are important differences. Analogous to cross-lingual and multilingual NLP, cross-cultural and multicultural NLP considers these differences in order to better serve users of NLP systems. We propose a principled framework to frame these efforts, and survey existing and potential strategies.

16 citations

Proceedings ArticleDOI
19 Mar 2022
TL;DR: The experimental results show that pretraining with an artificial language with a nesting dependency structure provides some knowledge transferable to natural language, and a follow-up probing analysis indicates that its success in the transfer is related to the amount of encoded contextual information.
Abstract: We investigate what kind of structural knowledge learned in neural network encoders is transferable to processing natural language. We design artificial languages with structural properties that mimic natural language, pretrain encoders on the data, and see how much performance the encoder exhibits on downstream tasks in natural language. Our experimental results show that pretraining with an artificial language with a nesting dependency structure provides some knowledge transferable to natural language. A follow-up probing analysis indicates that its success in the transfer is related to the amount of encoded contextual information and what is transferred is the knowledge of position-aware context dependence of language. Our results provide insights into how neural network encoders process human languages and the source of cross-lingual transferability of recent multilingual language models.
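To make the "nesting dependency structure" concrete, here is a purely illustrative generator of such artificial sentences: head and dependent tokens pair up like matched brackets and never cross. The token names and the generator itself are assumptions of this sketch, not the authors' actual data-generation procedure.

```python
import random

HEADS = [f"h{i}" for i in range(50)]  # hypothetical artificial vocabulary

def nested_sentence(depth):
    """Emit tokens whose dependencies nest like matched brackets."""
    if depth == 0:
        return []
    head = random.choice(HEADS)
    inner = nested_sentence(depth - 1)
    # Each head token h_i is later closed by its dependent token d_i,
    # so dependencies are nested and never cross: h3 h7 ... d7 d3
    return [head] + inner + [head.replace("h", "d")]

print(" ".join(nested_sentence(4)))  # e.g. "h12 h3 h40 h8 d8 d40 d3 d12"
```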

16 citations

Proceedings ArticleDOI
22 Jun 2022
TL;DR: GEMv2, the new version of the Generation, Evaluation, and Metrics Benchmark, introduces a modular infrastructure for dataset, model, and metric developers to benefit from each other's work.
Abstract: Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics Benchmark introduces a modular infrastructure for dataset, model, and metric developers to benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark.

16 citations

Journal ArticleDOI
TL;DR: This paper identifies and tackles an important issue of current DSI models: the data distribution mismatch that occurs between the DSI indexing and retrieval processes, and proposes a simple yet effective indexing framework for DSI called DSI-QG.
Abstract: The Differentiable Search Index (DSI) is an emerging paradigm for information retrieval. Unlike traditional retrieval architectures where index and retrieval are two different and separate components, DSI uses a single transformer model to perform both indexing and retrieval. In this paper, we identify and tackle an important issue of current DSI models: the data distribution mismatch that occurs between the DSI indexing and retrieval processes. Specifically, we argue that, at indexing, current DSI methods learn to build connections between the text of long documents and the identifier of the documents, but then retrieval of document identifiers is based on queries that are commonly much shorter than the indexed documents. This problem is further exacerbated when using DSI for cross-lingual retrieval, where document text and query text are in different languages. To address this fundamental problem of current DSI models, we propose a simple yet effective indexing framework for DSI, called DSI-QG. When indexing, DSI-QG represents documents with a number of potentially relevant queries generated by a query generation model and re-ranked and filtered by a cross-encoder ranker. The presence of these queries at indexing allows the DSI models to connect a document identifier to a set of queries, hence mitigating data distribution mismatches present between the indexing and the retrieval phases. Empirical results on popular mono-lingual and cross-lingual passage retrieval datasets show that DSI-QG significantly outperforms the original DSI model.
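A rough sketch of the indexing recipe the abstract describes is given below. The query generator, cross-encoder ranker, and DSI training loop are passed in as arguments and are hypothetical placeholders, not functions from any particular library.

```python
def build_dsi_qg_index(corpus, generate_queries, cross_encoder_score, train_dsi,
                       n_candidates=20, keep_top_k=5):
    """Index documents by generated queries instead of their full text
    (illustrative sketch of the DSI-QG recipe described above)."""
    training_pairs = []
    for doc_id, doc_text in corpus.items():
        # 1) Represent the long document by short, potentially relevant queries.
        candidates = generate_queries(doc_text, n=n_candidates)
        # 2) Re-rank and filter the candidates with a cross-encoder ranker.
        ranked = sorted(candidates,
                        key=lambda q: cross_encoder_score(q, doc_text),
                        reverse=True)
        # 3) Connect the document identifier to its best queries, so the
        #    indexing inputs look like the (short) queries seen at retrieval.
        training_pairs += [(query, doc_id) for query in ranked[:keep_top_k]]
    # Train the single seq2seq DSI model to map query text -> document identifier.
    return train_dsi(training_pairs)
```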

16 citations

Proceedings ArticleDOI
01 Jan 2022
TL;DR: This work shows that it is more effective to learn bilingual language pair adapters (BAs) when the goal is to optimize performance for a particular source-target transfer direction; the proposed BAD-X framework trades off some modularity of dedicated LAs for improved transfer performance.
Abstract: Adapter modules enable modular and efficient zero-shot cross-lingual transfer, where current state-of-the-art adapter-based approaches learn specialized language adapters (LAs) for individual languages. In this work, we show that it is more effective to learn bilingual language pair adapters (BAs) when the goal is to optimize performance for a particular source-target transfer direction. Our novel BAD-X adapter framework trades off some modularity of dedicated LAs for improved transfer performance: we demonstrate consistent gains in three standard downstream tasks, and for the majority of evaluated low-resource languages.
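For readers unfamiliar with adapters, the sketch below shows the residual bottleneck module that both language adapters (LAs) and bilingual pair adapters (BAs) are built from; the hidden sizes and the PyTorch phrasing are illustrative assumptions, not the exact BAD-X configuration.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter: only these small matrices are trained,
    while the pretrained Transformer weights stay frozen."""

    def __init__(self, hidden_size=768, bottleneck=48):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

# A language adapter (LA) is trained once per language; a bilingual pair adapter
# (BA) is instead trained for one source-target direction (e.g. English -> X),
# trading some modularity for better transfer in that direction.
```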

16 citations

References
Proceedings Article
12 Jun 2017
TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieved state-of-the-art performance on English-to-French translation.
Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder and decoder configuration. The best performing such models also connect the encoder and decoder through an attention mechanism. We propose a novel, simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our single model with 165 million parameters achieves 27.5 BLEU on English-to-German translation, improving over the existing best ensemble result by over 1 BLEU. On English-to-French translation, we outperform the previous single-model state of the art by 0.7 BLEU, achieving a BLEU score of 41.1.
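As a worked illustration of the attention mechanism the abstract centers on, the sketch below implements scaled dot-product attention, softmax(QKᵀ/√d_k)V, in plain NumPy; the shapes and inputs are arbitrary toy values.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (n_queries, n_keys)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # (n_queries, d_v)

# Toy example: 4 query positions attending over 6 key/value positions.
Q, K, V = np.random.randn(4, 8), np.random.randn(6, 8), np.random.randn(6, 8)
print(scaled_dot_product_attention(Q, K, V).shape)     # (4, 8)
```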

52,856 citations

Posted Content
TL;DR: This replication study finds that BERT was significantly undertrained and can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
Abstract: Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.

13,994 citations


"mT5: A Massively Multilingual Pre-t..." refers methods in this paper

  • It uses data in 26 languages from Wikipedia and CC-News (Liu et al., 2019).

  • XLM-R (Conneau et al., 2020) is an improved version of XLM based on the RoBERTa model (Liu et al., 2019).

  • Popular models of this type are mBERT (Devlin, 2018), mBART (Liu et al., 2020a), and XLM-R (Conneau et al., 2020), which are multilingual variants of BERT (Devlin et al., 2019), BART (Lewis et al., 2020b), and RoBERTa (Liu et al., 2019), respectively.

Proceedings ArticleDOI
16 Jun 2016
TL;DR: The Stanford Question Answering Dataset (SQuAD) as mentioned in this paper is a reading comprehension dataset consisting of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage.
Abstract: We present the Stanford Question Answering Dataset (SQuAD), a new reading comprehension dataset consisting of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage. We analyze the dataset to understand the types of reasoning required to answer the questions, leaning heavily on dependency and constituency trees. We build a strong logistic regression model, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%). However, human performance (86.8%) is much higher, indicating that the dataset presents a good challenge problem for future research. The dataset is freely available at this https URL
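The F1 figures quoted above are span-level token-overlap scores. The sketch below shows that metric in its simplest form; the answer normalization applied by the official SQuAD evaluation script (lower-casing, stripping punctuation and articles) is omitted here for brevity.

```python
from collections import Counter

def squad_f1(prediction, gold):
    """Token-overlap F1 between a predicted and a gold answer span
    (answer normalization omitted for brevity)."""
    pred_tokens, gold_tokens = prediction.split(), gold.split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(squad_f1("Denver Broncos", "the Denver Broncos"))  # 0.8
```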

3,667 citations

Proceedings ArticleDOI
01 Jul 2020
TL;DR: This work shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks and demonstrates, for the first time, the possibility of multilingual modeling without sacrificing per-language performance.
Abstract: This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred languages, using more than two terabytes of filtered CommonCrawl data. Our model, dubbed XLM-R, significantly outperforms multilingual BERT (mBERT) on a variety of cross-lingual benchmarks, including +14.6% average accuracy on XNLI, +13% average F1 score on MLQA, and +2.4% F1 score on NER. XLM-R performs particularly well on low-resource languages, improving 15.7% in XNLI accuracy for Swahili and 11.4% for Urdu over previous XLM models. We also present a detailed empirical analysis of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource languages at scale. Finally, we show, for the first time, the possibility of multilingual modeling without sacrificing per-language performance; XLM-R is very competitive with strong monolingual models on the GLUE and XNLI benchmarks. We will make our code and models publicly available.

3,248 citations


"mT5: A Massively Multilingual Pre-t..." refers background or methods in this paper

  • XLM-R (Conneau et al., 2020) is an improved version of XLM based on the RoBERTa model (Liu et al., 2019).

  • Values used by prior work include α = 0.7 for mBERT (Devlin, 2018), α = 0.3 for XLM-R (Conneau et al., 2020), and α = 0.2 for MMNMT (Arivazhagan et al., 2019).

  • We therefore take the approach used in (Devlin, 2018; Conneau et al., 2020; Arivazhagan et al., 2019) and boost lower-resource languages by sampling examples according to the probability p(L) ∝ |L|^α, where p(L) is the probability of sampling text from a given language during pre-training and |L| is the number of examples in the language (see the sketch after this list).

  • Popular models of this type are mBERT (Devlin, 2018), mBART (Liu et al., 2020a), and XLM-R (Conneau et al., 2020), which are multilingual variants of BERT (Devlin et al., 2019), BART (Lewis et al., 2020b), and RoBERTa (Liu et al., 2019), respectively.

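The sampling rule quoted in the excerpts above, p(L) ∝ |L|^α, can be written out directly. The language counts below are made-up illustrative numbers (the mT5 paper reports using α = 0.3).

```python
def sampling_probs(num_examples_per_lang, alpha=0.3):
    """Exponent-smoothed language sampling: p(L) is proportional to |L|**alpha."""
    weights = {lang: n ** alpha for lang, n in num_examples_per_lang.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}

# Illustrative (not real) corpus sizes: a smaller alpha flattens the distribution,
# boosting lower-resource languages at the expense of the largest ones.
sizes = {"en": 3_000_000_000, "sw": 1_000_000, "yo": 50_000}
print(sampling_probs(sizes, alpha=0.3))   # en ~0.89, sw ~0.08, yo ~0.03
print(sampling_probs(sizes, alpha=1.0))   # essentially all mass on "en"
```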
Proceedings ArticleDOI
18 Jan 2018
TL;DR: Universal Language Model Fine-tuning (ULMFiT), as mentioned in this paper, is an effective transfer learning method that can be applied to any task in NLP, introduced together with techniques that are key for fine-tuning a language model.
Abstract: Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language model. Our method significantly outperforms the state-of-the-art on six text classification tasks, reducing the error by 18-24% on the majority of datasets. Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100 times more data. We open-source our pretrained models and code.
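One of the fine-tuning techniques the paper introduces is discriminative fine-tuning, in which each layer group gets its own learning rate so that lower, more general layers are updated more gently. The sketch below shows that idea with PyTorch parameter groups; the base rate and the 2.6 decay factor are illustrative values, not taken from this abstract.

```python
import torch

def discriminative_param_groups(layer_groups, base_lr=1e-3, decay=2.6):
    """One optimizer parameter group per layer group, with the learning rate
    shrinking by `decay` for each group further from the output."""
    groups = []
    for depth, layer in enumerate(reversed(layer_groups)):  # top layer first
        groups.append({"params": layer.parameters(),
                       "lr": base_lr / (decay ** depth)})
    return groups

layers = [torch.nn.Linear(32, 32) for _ in range(4)]  # stand-in "layer groups"
optimizer = torch.optim.SGD(discriminative_param_groups(layers), lr=1e-3)
```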

2,128 citations

Trending Questions (2)
ISINDEBELE text generation under NLP using MT5 tool

The paper does not specifically mention isiNdebele text generation using mT5. The paper introduces mT5, a multilingual variant of T5, and demonstrates its performance on multilingual benchmarks.

A Massively Multilingual Pre-trained Text-to-Text Transformer?

The paper introduces mT5, a multilingual variant of T5, which is a massively multilingual pre-trained text-to-text transformer.