Open Access Proceedings Article (DOI)

URIEL and lang2vec: Representing languages as typological, geographical, and phylogenetic vectors

TLDR
The paper introduces the URIEL knowledge base for massively multilingual NLP and the lang2vec utility, which provide information-rich vector identifications of languages drawn from typological, geographical, and phylogenetic databases and normalized to have straightforward and consistent formats, naming, and semantics.
Abstract
We introduce the URIEL knowledge base for massively multilingual NLP and the lang2vec utility, which provides information-rich vector identifications of languages drawn from typological, geographical, and phylogenetic databases and normalized to have straightforward and consistent formats, naming, and semantics. The goal of URIEL and lang2vec is to enable multilingual NLP, especially on less-resourced languages, and to make possible types of experiments (especially but not exclusively related to NLP tasks) that would otherwise be difficult or impossible due to the sparsity and incommensurability of the data sources. lang2vec vectors have been shown to reduce perplexity in multilingual language modeling when compared to one-hot language identification vectors.
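As a concrete illustration, the sketch below queries lang2vec for a few of its feature sets. It assumes the pip-installable lang2vec package and its documented get_features interface; the exact feature-set names ("syntax_knn", "geo", "fam") follow the package documentation and may differ across versions.

```python
# Minimal sketch (assumes `pip install lang2vec`); feature-set names are
# taken from the package documentation and are an assumption here.
import lang2vec.lang2vec as l2v

languages = ["eng", "fra", "swa"]  # ISO 639-3 codes

# Typological syntax features, with missing values imputed by k-nearest neighbours.
syntax = l2v.get_features(languages, "syntax_knn")

# Geographical and phylogenetic (family membership) vectors from the same interface.
geo = l2v.get_features(languages, "geo")
fam = l2v.get_features(languages, "fam")

for lang in languages:
    print(lang,
          len(syntax[lang]), "syntax dims,",
          len(geo[lang]), "geo dims,",
          len(fam[lang]), "family dims")
```

Each call returns a dictionary mapping language codes to fixed-length vectors, which can be concatenated or fed directly to a model in place of one-hot language identifiers.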



Citations
Proceedings Article (DOI)

From zero to hero: On the limitations of zero-shot language transfer with multilingual transformers

TL;DR: It is demonstrated that inexpensive few-shot transfer (i.e., additional fine-tuning on a few target-language instances) is surprisingly effective across the board, warranting more research effort beyond the limiting zero-shot conditions.
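A minimal sketch of the few-shot recipe this summary describes, assuming a pretrained multilingual encoder from the Hugging Face transformers library and a hypothetical handful of labelled target-language examples; the model name, task, and data are placeholders, not the paper's exact setup.

```python
# Sketch: continue fine-tuning on a few target-language examples
# (hypothetical German sentiment pairs) after source-language training.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "xlm-roberta-base"  # assumption: any multilingual encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

texts = ["Dies ist großartig.", "Das war enttäuschend."]  # the "few shots"
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a handful of epochs for a handful of examples
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```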
Proceedings Article (DOI)

Learning Language Representations for Typology Prediction

TL;DR: Experiments show that the proposed method is able to infer not only syntactic but also phonological and phonetic inventory features, and that it improves over a baseline with access to information about each language's geographic and phylogenetic neighbors.
Proceedings Article (DOI)

On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing

TL;DR: The authors compare encoders and decoders based on Recurrent Neural Networks (RNNs) with modified self-attentive architectures for cross-lingual transfer, showing that RNN-based architectures transfer well to languages close to English, while the self-attentive architectures perform especially well on distant languages.
References
Proceedings Article (DOI)

Improving Vector Space Word Representations Using Multilingual Correlation

TL;DR: This paper argues that lexico-semantic content should additionally be invariant across languages and proposes a simple technique based on canonical correlation analysis (CCA) for incorporating multilingual evidence into vectors generated monolingually.
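A sketch of the kind of CCA projection this summary describes, using scikit-learn; the embedding matrices stand in for monolingual word vectors paired through a bilingual dictionary, and all sizes and names are illustrative assumptions rather than the paper's configuration.

```python
# Sketch: project two monolingual embedding spaces onto maximally correlated
# dimensions using translation pairs; random arrays stand in for real vectors.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X_en = rng.normal(size=(1000, 300))  # English vectors for dictionary words
X_de = rng.normal(size=(1000, 300))  # German vectors for their translations

cca = CCA(n_components=100, max_iter=1000)
cca.fit(X_en, X_de)

# Either space (including out-of-dictionary vectors) can now be mapped into
# the shared, correlated subspace.
en_proj, de_proj = cca.transform(X_en, X_de)
print(en_proj.shape, de_proj.shape)  # (1000, 100) (1000, 100)
```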
Proceedings Article (DOI)

Multilingual Models for Compositional Distributed Semantics

TL;DR: The authors leverage parallel data and learn to strongly align the embeddings of semantically equivalent sentences, while maintaining sufficient distance between those of dissimilar sentences, without relying on word alignments or any syntactic information.
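A minimal sketch of the alignment objective described above: a margin-based loss that pulls embeddings of parallel sentences together while pushing a sampled non-parallel sentence at least a margin away. The encoder is omitted and the tensors are placeholders; this illustrates the objective, not the paper's full model.

```python
# Sketch of a margin-based alignment loss over sentence embeddings.
import torch
import torch.nn.functional as F

def hinge_alignment_loss(src, tgt, noise, margin=1.0):
    """src, tgt, noise: (batch, dim) sentence embeddings."""
    pos = ((src - tgt) ** 2).sum(dim=1)       # distance to the true translation
    neg = ((src - noise) ** 2).sum(dim=1)     # distance to a non-translation
    return F.relu(margin + pos - neg).mean()  # zero once neg exceeds pos by the margin

# Toy usage with random "embeddings"; in practice these would come from
# composing word vectors over each side of a parallel corpus.
src = torch.randn(32, 128)
tgt = src + 0.1 * torch.randn(32, 128)
noise = torch.randn(32, 128)
print(hinge_alignment_loss(src, tgt, noise).item())
```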