Open Access · Proceedings ArticleDOI

How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models

TLDR
The authors provide a systematic and comprehensive empirical comparison of pretrained multilingual language models versus their monolingual counterparts with regard to their monolingual task performance, and find that while pretraining data size is an important factor in downstream performance, a designated monolingual tokenizer plays an equally important role.
Abstract
In this work, we provide a systematic and comprehensive empirical comparison of pretrained multilingual language models versus their monolingual counterparts with regard to their monolingual task performance. We study a set of nine typologically diverse languages with readily available pretrained monolingual models on a set of five diverse monolingual downstream tasks. We first aim to establish, via fair and controlled comparisons, if a gap between the multilingual and the corresponding monolingual representation of that language exists, and subsequently investigate the reason for any performance difference. To disentangle conflating factors, we train new monolingual models on the same data, with monolingually and multilingually trained tokenizers. We find that while the pretraining data size is an important factor, a designated monolingual tokenizer plays an equally important role in the downstream performance. Our results show that languages that are adequately represented in the multilingual model’s vocabulary exhibit negligible performance decreases over their monolingual counterparts. We further find that replacing the original multilingual tokenizer with the specialized monolingual tokenizer improves the downstream performance of the multilingual model for almost every task and language.
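As a concrete illustration of the tokenizer effect described in the abstract, the sketch below compares how a multilingual and a dedicated monolingual tokenizer segment the same sentence, using subword fertility (subwords per whitespace-separated word) as a rough proxy for how well a vocabulary fits a language. This is a minimal example, not the authors' code, and the two checkpoints are illustrative choices.

```python
# Minimal sketch (not the paper's code): compare segmentation of the same text
# by a multilingual and a monolingual tokenizer via subword fertility.
from transformers import AutoTokenizer

multi = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
mono = AutoTokenizer.from_pretrained("dbmdz/bert-base-german-cased")  # illustrative monolingual model

text = "Die Katze sitzt auf der Matte und beobachtet die Vögel im Garten."

def fertility(tokenizer, text):
    """Average number of subword pieces per whitespace-separated word."""
    return len(tokenizer.tokenize(text)) / len(text.split())

for name, tok in [("multilingual", multi), ("monolingual", mono)]:
    print(f"{name:13s} fertility={fertility(tok, text):.2f} tokens={tok.tokenize(text)[:8]}")
```

A noticeably higher fertility for the multilingual tokenizer indicates that the language is over-segmented relative to a vocabulary trained only on that language, which is the kind of mismatch the paper links to downstream performance drops.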


Citations
Journal ArticleDOI

Multilingual text categorization and sentiment analysis: a comparative analysis of the utilization of multilingual approaches for classifying twitter data

TL;DR: In this article, a comparative analysis of multilingual approaches for classifying both the sentiment and the text of an examined multilingual corpus was performed; four multilingual BERT-based classifiers and a zero-shot classification approach were utilized and compared in terms of their accuracy and applicability in the classification of multilingual data.
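For readers unfamiliar with the zero-shot setup compared in that article, the following is a hedged sketch using a multilingual NLI checkpoint; the model name and candidate labels are illustrative assumptions, not necessarily what the authors used.

```python
# Illustrative zero-shot sentiment classification with a multilingual NLI model;
# the checkpoint below is an assumed example, not the article's exact setup.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="joeddav/xlm-roberta-large-xnli",  # assumed multilingual NLI checkpoint
)

result = classifier(
    "El servicio al cliente fue excelente y muy rápido.",
    candidate_labels=["positive", "negative", "neutral"],
)
print(result["labels"][0], round(result["scores"][0], 3))
```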
Posted Content

Specializing Multilingual Language Models: An Empirical Study

TL;DR: This article studies the performance, extensibility, and interaction of two such adaptations for the low-resource setting, vocabulary augmentation and script transliteration, and reports mixed results, upholding the viability of these approaches while raising new questions about how to optimally adapt multilingual models to low-resource settings.
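The vocabulary-augmentation adaptation mentioned above can be sketched in a few lines with Hugging Face Transformers; the added tokens are placeholders, and this shows only the general recipe rather than the study's exact procedure.

```python
# Sketch of vocabulary augmentation: add target-language subwords to a
# multilingual tokenizer and grow the embedding matrix accordingly.
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Placeholder tokens; in practice they would come from a tokenizer trained on
# target-language text.
new_tokens = ["placeholder_subword_a", "placeholder_subword_b"]
num_added = tokenizer.add_tokens(new_tokens)

# New ids receive randomly initialised embeddings and are learned during
# continued pretraining or fine-tuning.
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} tokens; vocabulary size is now {len(tokenizer)}")
```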
Posted Content

BanglaBERT: Combating Embedding Barrier in Multilingual Models for Low-Resource Language Understanding

TL;DR: The BanglaBERT work, as described in this paper, proposes a straightforward solution of transcribing languages to a common script, which can effectively improve the performance of a multilingual model for the Bangla language.
Proceedings Article

Code-switched inspired losses for spoken dialog representations.

TL;DR: The authors introduce new pretraining losses tailored to learning generic multilingual spoken dialogue representations, which expose the model to code-switched language. Their experiments show that the new losses achieve better performance in both monolingual and multilingual settings.
Proceedings ArticleDOI

Vietnamese Sentiment Analysis: An Overview and Comparative Study of Fine-tuning Pretrained Language Models

TL;DR: In this paper, a fine-tuning approach to investigating the performance of different pre-trained language models for the Vietnamese Sentiment Analysis (SA) task is presented; the experimental results show the superior performance of the monolingual PhoBERT and ViT5 models in comparison with previous studies.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
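The update rule summarized above is compact enough to write out directly; the sketch below uses NumPy and the paper's default hyperparameters, applied to a toy one-dimensional objective.

```python
# Adam update rule: adaptive step sizes from bias-corrected estimates of the
# first and second moments of the gradient.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (uncentred variance)
    m_hat = m / (1 - beta1 ** t)              # bias correction for zero initialisation
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy usage: minimise f(x) = x^2 starting from x = 5.
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 10001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(theta)  # close to 0
```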
Proceedings Article

Attention is All you Need

TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieved state-of-the-art performance on English-to-French translation.
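At the core of that architecture is scaled dot-product attention; the sketch below implements it in NumPy for a single head and a single sequence, with no masking, purely to make the mechanism concrete.

```python
# Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V for one head.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ V                               # weighted sum of value vectors

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8)
```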
Proceedings ArticleDOI

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT as mentioned in this paper pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
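The "one additional output layer" setup described in that summary can be sketched as follows with Hugging Face Transformers; the checkpoint, example texts, and labels are placeholders for illustration only.

```python
# Sketch of fine-tuning BERT for classification: a pretrained encoder plus a
# single randomly initialised output layer, trained end to end.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2  # adds the new classification head
)

batch = tokenizer(["a great movie", "a dull movie"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

outputs = model(**batch, labels=labels)  # returns loss and logits
outputs.loss.backward()                  # an optimizer step would follow in training
print(outputs.logits.shape)              # torch.Size([2, 2])
```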
Posted Content

RoBERTa: A Robustly Optimized BERT Pretraining Approach

TL;DR: It is found that BERT was significantly undertrained, and can match or exceed the performance of every model published after it, and the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
Proceedings ArticleDOI

Deep contextualized word representations

TL;DR: This paper introduced a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics), and how these uses vary across linguistic contexts (i.e., to model polysemy).