How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust, Jonas Pfeiffer, Ivan Vulić, Sebastian Ruder, Iryna Gurevych
pp. 3118–3135
TLDR
The authors provide a systematic and comprehensive empirical comparison of pretrained multilingual language models versus their monolingual counterparts with regard to their monolingual task performance, and find that while pretraining data size is an important factor in downstream performance, a designated monolingual tokenizer plays an equally important role.
Abstract
In this work, we provide a systematic and comprehensive empirical comparison of pretrained multilingual language models versus their monolingual counterparts with regard to their monolingual task performance. We study a set of nine typologically diverse languages with readily available pretrained monolingual models on a set of five diverse monolingual downstream tasks. We first aim to establish, via fair and controlled comparisons, if a gap between the multilingual and the corresponding monolingual representation of that language exists, and subsequently investigate the reason for any performance difference. To disentangle conflating factors, we train new monolingual models on the same data, with monolingually and multilingually trained tokenizers. We find that while the pretraining data size is an important factor, a designated monolingual tokenizer plays an equally important role in the downstream performance. Our results show that languages that are adequately represented in the multilingual model’s vocabulary exhibit negligible performance decreases over their monolingual counterparts. We further find that replacing the original multilingual tokenizer with the specialized monolingual tokenizer improves the downstream performance of the multilingual model for almost every task and language.
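The tokenizer effect described in the abstract is commonly quantified with subword fertility, the average number of subword tokens a tokenizer produces per word, which this line of work reports alongside downstream results. The sketch below illustrates the metric only; the two toy tokenizers and their vocabularies are hypothetical stand-ins, not the actual model vocabularies:

```python
def fertility(tokenize, words):
    """Average number of subword tokens produced per word."""
    return sum(len(tokenize(w)) for w in words) / len(words)

# A toy "monolingual" tokenizer whose vocabulary covers all target-language words...
mono_vocab = {"tokenizer", "performance", "language"}
def mono_tokenize(word):
    return [word] if word in mono_vocab else list(word)

# ...versus a toy "multilingual" tokenizer that falls back to characters more often.
multi_vocab = {"language"}
def multi_tokenize(word):
    return [word] if word in multi_vocab else list(word)

words = ["tokenizer", "performance", "language"]
print(fertility(mono_tokenize, words))   # 1.0 — every word kept whole
print(fertility(multi_tokenize, words))  # 7.0 — two words split into characters
```

A fertility close to 1 means the tokenizer represents the language compactly; higher fertility signals over-segmentation, which the paper links to degraded downstream performance.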
Citations
Journal Article
Multilingual text categorization and sentiment analysis: a comparative analysis of the utilization of multilingual approaches for classifying twitter data
George Manias, Argyro Mavrogiorgou, Athanasios Kiourtis, Chrysostomos Symvoulidis, Dimosthenis Kyriazis
TL;DR: This article presents a comparative analysis of multilingual approaches for classifying both the sentiment and the text of a multilingual corpus; four multilingual BERT-based classifiers and a zero-shot classification approach are compared in terms of their accuracy and applicability in classifying multilingual Twitter data.
Posted Content
Specializing Multilingual Language Models: An Empirical Study
Ethan C. Chau, Noah A. Smith
TL;DR: This article studies the performance, extensibility, and interaction of two adaptations for the low-resource setting: vocabulary augmentation and script transliteration. The results are mixed, upholding the viability of these approaches while raising new questions about how to optimally adapt multilingual models to low-resource settings.
Posted Content
BanglaBERT: Combating Embedding Barrier in Multilingual Models for Low-Resource Language Understanding
Abhik Bhattacharjee, Tahmid Hasan, Kazi Samin, Saiful Islam, M. Sohel Rahman, Anindya Iqbal, Rifat Shahriyar
TL;DR: This paper proposes a straightforward solution to the embedding barrier by transcribing languages to a common script, which effectively improves the performance of a multilingual model for the Bangla language.
Proceedings Article
Code-switched inspired losses for spoken dialog representations.
TL;DR: The authors introduce new pretraining losses tailored to learning generic multilingual spoken dialogue representations, which expose the model to code-switched language. Their experiments show that the new losses achieve better performance in both monolingual and multilingual settings.
Proceedings Article
Vietnamese Sentiment Analysis: An Overview and Comparative Study of Fine-tuning Pretrained Language Models
TL;DR: This paper presents a fine-tuning approach for investigating the performance of different pretrained language models on the Vietnamese Sentiment Analysis (SA) task; the experimental results show the superior performance of the monolingual PhoBERT and ViT5 models in comparison with previous studies.
References
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
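The update rule the TL;DR summarizes (adaptive estimates of the first and second gradient moments, with bias correction) can be sketched in a few lines. The hyperparameter defaults below follow the paper; the function name and the flat-list interface are our own simplification:

```python
import math

def adam_step(params, grads, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update over parallel lists of scalar parameters and gradients.

    m and v hold the running first/second moment estimates (updated in place);
    t is the 1-based step count used for bias correction."""
    out = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = b1 * m[i] + (1 - b1) * g        # biased first moment (mean)
        v[i] = b2 * v[i] + (1 - b2) * g * g    # biased second moment (uncentered variance)
        m_hat = m[i] / (1 - b1 ** t)           # bias-corrected estimates
        v_hat = v[i] / (1 - b2 ** t)
        out.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return out

# Usage: minimize f(x) = x^2 starting from x = 1.0.
params, m, v = [1.0], [0.0], [0.0]
for t in range(1, 3001):
    grads = [2.0 * params[0]]   # gradient of x^2
    params = adam_step(params, grads, m, v, t)
print(params[0])  # close to 0
```

Note how the effective step size is roughly `lr` regardless of the raw gradient magnitude, since the gradient is normalized by the square root of its own second-moment estimate.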
Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
TL;DR: This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-German and English-to-French translation.
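The core operation of the architecture the TL;DR describes is scaled dot-product attention, softmax(QKᵀ/√d_k)V. Below is a minimal pure-Python sketch for small unbatched lists of vectors (no multi-head projection, masking, or batching):

```python
import math

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for small unbatched lists of vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        mx = max(scores)                        # subtract max for numerical stability
        exps = [math.exp(s - mx) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]         # attention weights sum to 1
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# A query equidistant from both keys attends to each value equally.
Q = [[0.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
print(scaled_dot_product_attention(Q, K, V))  # [[0.5, 0.5]]
```

The 1/√d_k scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with vanishing gradients.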
Proceedings Article
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pretrained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
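The masked-language-model pretraining the TL;DR mentions corrupts the input by hiding a fraction of tokens for the model to predict. A minimal sketch of that corruption step follows; the function name is ours, and BERT's 80/10/10 refinement (sometimes keeping or randomizing a selected token instead of masking it) is omitted:

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """BERT-style masked-LM corruption: sample ~15% of positions as
    prediction targets and replace them with [MASK]."""
    rng = random.Random(seed)   # seeded for reproducibility
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok            # remember the original token to predict
            masked.append(mask_token)
        else:
            masked.append(tok)
    return masked, targets

sentence = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_tokens(sentence)
print(masked)    # original tokens with some positions replaced by [MASK]
print(targets)   # mapping from masked position to the original token
```

Because the targets are drawn from both sides of each masked position, the model is forced to use left and right context jointly, which is the "deep bidirectional" conditioning the paper emphasizes.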
Posted Content
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
TL;DR: This work finds that BERT was significantly undertrained and can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE, and SQuAD.
Proceedings Article
Deep contextualized word representations
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
TL;DR: This paper introduced a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics), and how these uses vary across linguistic contexts (i.e., to model polysemy).