SemEval-2017 Task 2: Multilingual and Cross-lingual Semantic Word Similarity
Jose Camacho-Collados, Mohammad Taher Pilehvar, Nigel Collier, Roberto Navigli
pp. 15–26
TL;DR: Systems that combine statistical knowledge from text corpora, in the form of word embeddings, with external knowledge from lexical resources are the best performers in both subtasks.

Abstract: This paper introduces a new task on Multilingual and Cross-lingual Semantic Word Similarity, which measures the semantic similarity of word pairs within and across five languages: English, Farsi, German, Italian and Spanish. High-quality datasets were manually curated for the five languages, with high inter-annotator agreement (consistently around 0.9). These were used for the semi-automatic construction of ten cross-lingual datasets. 17 teams participated in the task, submitting 24 systems in subtask 1 and 14 systems in subtask 2. Results show that systems that combine statistical knowledge from text corpora, in the form of word embeddings, and external knowledge from lexical resources are the best performers in both subtasks. More information can be found on the task website: http://alt.qcri.org/semeval2017/task2
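The word-similarity scoring at the heart of both subtasks can be illustrated with a minimal sketch: participating systems typically represent each word as a vector (e.g. a word embedding) and score a pair by cosine similarity. The vectors below are toy, hand-picked values for illustration only; real systems use embeddings trained on large corpora.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "embeddings" (illustrative values, not trained vectors).
vectors = {
    "car":    [0.9, 0.1, 0.2],
    "auto":   [0.85, 0.15, 0.25],
    "banana": [0.1, 0.8, 0.3],
}

print(cosine_similarity(vectors["car"], vectors["auto"]))    # close to 1.0
print(cosine_similarity(vectors["car"], vectors["banana"]))  # much lower
```

System outputs ranked by such scores are then compared against the human-annotated gold similarity judgments, usually via a rank correlation measure.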
Citations
Proceedings Article
Word translation without parallel data
TL;DR: It is shown that a bilingual dictionary can be built between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way.
Proceedings Article
Cross-lingual Language Model Pretraining
Alexis Conneau,Guillaume Lample +1 more
TL;DR: This paper proposes two methods to learn cross-lingual language models (XLMs): one unsupervised, which relies only on monolingual data, and one supervised, which leverages parallel data with a new cross-lingual language model objective.
Proceedings Article
On the Limitations of Unsupervised Bilingual Dictionary Induction
TL;DR: This article showed that weak supervision from identical words enables more robust dictionary induction and established a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.
References
Proceedings Article
GloVe: Global Vectors for Word Representation
TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature (global matrix factorization and local context window methods) and produces a vector space with meaningful substructure.
Proceedings Article
Efficient Estimation of Word Representations in Vector Space
TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.
Posted Content
Using Information Content to Evaluate Semantic Similarity in a Taxonomy
TL;DR: In this article, a new measure of semantic similarity in an IS-A taxonomy based on the notion of information content is presented, and experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, against an upper bound of r = 0.90 for human subjects performing the same task).
Journal Article
From frequency to meaning: vector space models of semantics
Peter D. Turney, Patrick Pantel
TL;DR: The goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs, and to provide pointers into the literature for those who are less familiar with the field.
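The vector space models surveyed above are typically built from word–context co-occurrence counts: each word's row of counts over its neighboring words serves as its (sparse) vector. A minimal sketch of that construction, where the tiny corpus and window size are illustrative assumptions:

```python
from collections import Counter

# Tiny toy corpus (illustrative only; real VSMs are built from very large corpora).
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]

window = 2  # symmetric context window on each side of the target word
cooc = Counter()
for sentence in corpus:
    for i, word in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if i != j:
                cooc[(word, sentence[j])] += 1

# The row of counts for a given word is its context vector.
print(cooc[("cat", "sat")])
print(cooc[("sat", "on")])
```

In practice the raw counts are reweighted (e.g. with PMI or tf-idf) and often reduced in dimensionality before similarity is computed between the resulting vectors.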