Open Access Posted Content

WikiBERT models: deep transfer learning for many languages

TLDR
A simple, fully automated pipeline for creating language-specific BERT models from Wikipedia data is introduced, along with 42 new such models, most for languages that until now have lacked dedicated deep neural language models.
Abstract
Deep neural language models such as BERT have enabled substantial recent advances in many natural language processing tasks. Due to the effort and computational cost involved in their pre-training, language-specific models are typically introduced only for a small number of high-resource languages such as English. While multilingual models covering large numbers of languages are available, recent work suggests monolingual training can produce better models, and our understanding of the tradeoffs between mono- and multilingual training is incomplete. In this paper, we introduce a simple, fully automated pipeline for creating language-specific BERT models from Wikipedia data and introduce 42 new such models, most for languages up to now lacking dedicated deep neural language models. We assess the merits of these models using the state-of-the-art UDify parser on Universal Dependencies data, contrasting performance with results using the multilingual BERT model. We find that UDify using WikiBERT models outperforms the parser using mBERT on average, with the language-specific models showing substantially improved performance for some languages, yet limited improvement or a decrease in performance for others. We also present preliminary results as first steps toward an understanding of the conditions under which language-specific models are most beneficial. All of the methods and models introduced in this work are available under open licenses from this https URL.
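
As a concrete illustration (not part of the paper itself), the sketch below shows how one of the released language-specific models might be loaded and compared against multilingual BERT using the Hugging Face transformers library. The checkpoint name TurkuNLP/wikibert-base-fi-cased (Finnish) is an assumption about how the released models are published and should be verified against the project repository.

# A minimal sketch, assuming the WikiBERT checkpoints are available on the
# Hugging Face hub under the name shown here (an assumption, not confirmed
# by the paper); multilingual BERT is loaded for comparison.
from transformers import AutoModel, AutoTokenizer

wikibert_name = "TurkuNLP/wikibert-base-fi-cased"  # assumed Finnish WikiBERT checkpoint
mbert_name = "bert-base-multilingual-cased"        # standard multilingual BERT

wikibert_tok = AutoTokenizer.from_pretrained(wikibert_name)
wikibert = AutoModel.from_pretrained(wikibert_name)

mbert_tok = AutoTokenizer.from_pretrained(mbert_name)
mbert = AutoModel.from_pretrained(mbert_name)

sentence = "Helsinki on Suomen pääkaupunki."

# Tokenize the same sentence with both vocabularies; a dedicated
# language-specific vocabulary typically splits words into fewer pieces.
print(wikibert_tok.tokenize(sentence))
print(mbert_tok.tokenize(sentence))

Comparing the two tokenizations on a sentence in the target language gives a quick sense of why a dedicated vocabulary can help: the multilingual word-piece vocabulary tends to fragment words more heavily.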


Citations
Proceedings ArticleDOI

ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic

TL;DR: The authors introduce two powerful deep bidirectional transformer-based models, ARBERT and MARBERT, for multi-dialectal Arabic language understanding evaluation, which achieve state-of-the-art results on the majority of tasks (37 of 48 classification tasks across the 42 datasets).
Proceedings ArticleDOI

How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models

TL;DR: The authors provide a systematic and comprehensive empirical comparison of pretrained multilingual language models and their monolingual counterparts with regard to monolingual task performance, finding that while pretraining data size is an important factor in downstream performance, a dedicated monolingual tokenizer plays an equally important role.
Posted Content

EstBERT: A Pretrained Language-Specific BERT for Estonian.

TL;DR: EstBERT, a large pretrained transformer-based language-specific BERT model for Estonian, is presented, along with results from fine-tuning EstBERT on multiple NLP tasks, including POS and morphological tagging, dependency parsing, named entity recognition, and text classification.
References
Proceedings Article

Attention is All you Need

TL;DR: This paper proposed the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieved state-of-the-art performance on English-to-French translation.
Proceedings ArticleDOI

Glove: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Proceedings ArticleDOI

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pretrained model can then be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
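
For context, a minimal sketch of the "one additional output layer" fine-tuning setup described above, using the Hugging Face transformers library rather than the original BERT codebase (an assumption about tooling, not the cited paper's own code); the model name, label count, and example inputs are placeholders.

# A minimal sketch, assuming a sequence-classification task: a randomly
# initialized classification head is placed on top of the pretrained
# encoder and the whole model is trained end-to-end.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2  # new output layer with 2 labels
)

batch = tokenizer(["a sample sentence"], return_tensors="pt")
labels = torch.tensor([1])

outputs = model(**batch, labels=labels)
outputs.loss.backward()  # gradients flow through the new head and the pretrained encoder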
Posted Content

Efficient Estimation of Word Representations in Vector Space

TL;DR: This paper proposed two novel model architectures for computing continuous vector representations of words from very large data sets; the quality of these representations is measured in a word similarity task and compared to the previously best-performing techniques based on different types of neural networks.
Posted Content

RoBERTa: A Robustly Optimized BERT Pretraining Approach

TL;DR: It is found that BERT was significantly undertrained and, when pretrained with an improved procedure, can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.