Open Access · Proceedings Article

Learning Word Vectors for 157 Languages

TLDR
This article trained high-quality word vectors for 157 languages on two sources of data, the free online encyclopedia Wikipedia and data from the Common Crawl project, and introduced three new word analogy datasets, for French, Hindi and Polish, to evaluate these word vectors.
Abstract
Distributed word representations, or word vectors, have recently been applied to many tasks in natural language processing, leading to state-of-the-art performance. A key ingredient to the successful application of these representations is to train them on very large corpora, and use these pre-trained models in downstream tasks. In this paper, we describe how we trained such high-quality word representations for 157 languages. We used two sources of data to train these models: the free online encyclopedia Wikipedia and data from the Common Crawl project. We also introduce three new word analogy datasets to evaluate these word vectors, for French, Hindi and Polish. Finally, we evaluate our pre-trained word vectors on 10 languages for which evaluation datasets exist, showing very strong performance compared to previous models.
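As a concrete illustration of using such pre-trained vectors in a downstream setting, here is a minimal sketch in Python. It assumes the fasttext package and a locally downloaded model file; the file name cc.fr.300.bin follows the naming convention of the released vectors but should be treated as an assumption here, not part of the paper itself.

```python
# Minimal sketch: querying pre-trained word vectors with the fasttext package.
# The model file name below is an assumption; substitute the actual download.
import fasttext

model = fasttext.load_model("cc.fr.300.bin")  # e.g. French vectors

# Vector for a single word; subword information also covers unseen words.
vec = model.get_word_vector("bonjour")
print(vec.shape)  # (300,)

# Nearest neighbours in the vector space.
print(model.get_nearest_neighbors("paris", k=5))

# Word analogies of the form a - b + c, e.g. berlin - germany + france ~ paris.
print(model.get_analogies("berlin", "germany", "france"))
```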


Citations
Posted Content

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

TL;DR: This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
Proceedings Article

Unsupervised Cross-lingual Representation Learning at Scale

TL;DR: It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks and, for the first time, that multilingual modeling is possible without sacrificing per-language performance.
Posted Content

BERTScore: Evaluating Text Generation with BERT

TL;DR: This work proposes BERTScore, an automatic evaluation metric for text generation that correlates better with human judgments and provides stronger model selection performance than existing metrics.
Proceedings Article

Contextual String Embeddings for Sequence Labeling

TL;DR: This paper proposes to leverage the internal states of a trained character language model to produce a novel type of word embedding, which the authors refer to as contextual string embeddings; these embeddings fundamentally model words as sequences of characters and are contextualized by their surrounding text.
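A conceptual sketch of how such an embedding could be read off pre-trained character-level language models is given below; char_lm_forward and char_lm_backward are hypothetical modules standing in for the trained character LMs, and the indexing details are assumptions rather than the paper's actual code.

```python
# Conceptual sketch (not the authors' code): building a contextual string
# embedding for one word from forward and backward character language models.
import torch

def contextual_string_embedding(sentence: str, word_start: int, word_end: int,
                                char_lm_forward, char_lm_backward) -> torch.Tensor:
    # Forward LM: hidden state after the last character of the word.
    fwd_states = char_lm_forward(sentence)   # (len(sentence), hidden_dim)
    fwd = fwd_states[word_end - 1]
    # Backward LM (runs right to left): hidden state once the whole word
    # has been consumed, i.e. at the character just before the word.
    bwd_states = char_lm_backward(sentence)  # (len(sentence), hidden_dim)
    bwd = bwd_states[word_start]
    # Concatenating the two states yields a word embedding that depends on the
    # entire surrounding sentence, not only on the word itself.
    return torch.cat([fwd, bwd], dim=-1)
```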
Proceedings Article

BERTScore: Evaluating Text Generation with BERT

TL;DR: This article proposed BERTScore, an automatic evaluation metric for text generation, which computes a similarity score for each token in the candidate sentence against each token in the reference sentence; instead of relying on exact matches, token similarity is computed using contextual embeddings.
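The token-matching idea can be sketched as follows. This is an illustration under assumptions, not the official bert_score implementation; it assumes pre-computed, L2-normalized contextual embeddings for the candidate and reference tokens.

```python
# Illustrative sketch of greedy token matching with contextual embeddings.
import numpy as np

def greedy_match_f1(cand_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """cand_emb: (m, d) and ref_emb: (n, d) L2-normalized token embeddings."""
    sim = cand_emb @ ref_emb.T          # (m, n) cosine similarities
    recall = sim.max(axis=0).mean()     # each reference token -> best candidate match
    precision = sim.max(axis=1).mean()  # each candidate token -> best reference match
    return 2 * precision * recall / (precision + recall)
```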
References
Proceedings Article

GloVe: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
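For reference, a small NumPy sketch of the weighted least-squares objective this model optimizes over the word co-occurrence matrix; the function and variable names are illustrative assumptions, not the paper's code.

```python
# Sketch of the GloVe objective: weighted squared error between dot products
# of word/context vectors (plus biases) and log co-occurrence counts.
import numpy as np

def glove_loss(X, W, W_tilde, b, b_tilde, x_max=100.0, alpha=0.75):
    """X: (V, V) co-occurrence counts; W, W_tilde: (V, d) vectors; b, b_tilde: (V,) biases."""
    loss = 0.0
    rows, cols = np.nonzero(X)                      # only observed co-occurrences
    for i, j in zip(rows, cols):
        weight = min((X[i, j] / x_max) ** alpha, 1.0)   # down-weights rare pairs
        err = W[i] @ W_tilde[j] + b[i] + b_tilde[j] - np.log(X[i, j])
        loss += weight * err ** 2
    return loss
```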
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
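The phrase-finding heuristic can be sketched as below: bigrams that co-occur much more often than their parts would by chance are merged into phrases. The discount and threshold values are illustrative assumptions.

```python
# Sketch of bigram scoring for phrase detection (e.g. "new york" -> "new_york").
from collections import Counter

def find_phrases(corpus, delta=5, threshold=1e-4):
    """corpus: iterable of tokenized sentences. Returns bigrams scored as phrases."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    phrases = set()
    for (a, b), n_ab in bigrams.items():
        # High score means a and b co-occur far more often than chance;
        # the discount delta suppresses very rare bigrams.
        score = (n_ab - delta) / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases.add((a, b))
    return phrases
```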
Posted Content

Efficient Estimation of Word Representations in Vector Space

TL;DR: This paper proposes two novel model architectures for computing continuous vector representations of words from very large data sets; the quality of these representations is measured in a word similarity task, and the results are compared to the previously best-performing techniques based on different types of neural networks.
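A word-similarity evaluation of this kind can be sketched as follows, assuming a dictionary mapping words to vectors and a list of human-rated word pairs; it reports the Spearman correlation between cosine similarities and the human scores.

```python
# Sketch of a word-similarity evaluation via Spearman rank correlation.
import numpy as np
from scipy.stats import spearmanr

def word_similarity_eval(pairs_with_gold, vectors):
    """pairs_with_gold: list of (word1, word2, human_score); vectors: dict of arrays."""
    model_scores, gold_scores = [], []
    for w1, w2, gold in pairs_with_gold:
        if w1 in vectors and w2 in vectors:
            v1, v2 = vectors[w1], vectors[w2]
            cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
            model_scores.append(cos)
            gold_scores.append(gold)
    rho, _ = spearmanr(model_scores, gold_scores)
    return rho
```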
Journal Article

Enriching Word Vectors with Subword Information

TL;DR: This paper proposed a new approach based on the skip-gram model, where each word is represented as a bag of character n-grams and words are represented as the sum of these representations, making it possible to train models on large corpora quickly and to compute word representations for words that did not appear in the training data.
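The subword idea can be sketched as follows; the explicit n-gram dictionary is a simplification (fastText hashes n-grams into a fixed-size table), so treat the details as assumptions.

```python
# Sketch: a word vector as the sum of its character n-gram vectors.
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    token = f"<{word}>"  # boundary markers, as in fastText
    return [token[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(token) - n + 1)]

def subword_vector(word, ngram_vectors, dim=300):
    # Out-of-vocabulary words still get a vector as long as some of
    # their character n-grams were seen during training.
    vec = np.zeros(dim)
    for g in char_ngrams(word):
        if g in ngram_vectors:
            vec += ngram_vectors[g]
    return vec
```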
Proceedings Article

A unified architecture for natural language processing: deep neural networks with multitask learning

TL;DR: This work describes a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense using a language model.
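A schematic sketch of the shared-encoder, multiple-heads idea is given below, written in PyTorch; the dimensions and task heads are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch: one shared convolutional encoder feeding several task-specific heads.
import torch
import torch.nn as nn

class MultitaskTagger(nn.Module):
    def __init__(self, vocab_size=50_000, emb_dim=50, hidden=256,
                 n_pos=45, n_chunk=23, n_ner=9):
        super().__init__()
        # Shared layers: word embeddings and a 1-D convolution over the sentence.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=5, padding=2)
        # One small output head per task, all trained on the shared features.
        self.pos_head = nn.Linear(hidden, n_pos)
        self.chunk_head = nn.Linear(hidden, n_chunk)
        self.ner_head = nn.Linear(hidden, n_ner)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)      # (batch, emb_dim, seq_len)
        h = torch.relu(self.conv(x)).transpose(1, 2)   # (batch, seq_len, hidden)
        return self.pos_head(h), self.chunk_head(h), self.ner_head(h)
```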