A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches

doi:10.3115/1620754.1620758

Proceedings Article

- pp 19-27

TLDR

This paper presents and compares WordNet-based and distributional similarity approaches, and pioneers cross-lingual similarity, showing that the methods are easily adapted for a cross-lingual task with minor losses.

Abstract:

This paper presents and compares WordNet-based and distributional similarity approaches. The strengths and weaknesses of each approach regarding similarity and relatedness tasks are discussed, and a combination is presented. Each of our methods independently provides the best results in its class on the RG and WordSim353 datasets, and a supervised combination of them yields the best published results on all datasets. Finally, we pioneer cross-lingual similarity, showing that our methods are easily adapted for a cross-lingual task with minor losses.
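Benchmarks such as RG and WordSim353 pair words with human ratings, and a similarity method is commonly scored by how well its ranking of the pairs agrees with the human ranking. The sketch below computes Spearman's rank correlation in plain Python as one way to do that. The scores in it are invented for illustration; they are not taken from either dataset or from the paper, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented toy scores for four word pairs (not real benchmark data).
human = [3.9, 3.1, 1.2, 0.4]
system = [0.95, 0.60, 0.70, 0.05]
rho = spearman(human, system)
print(round(rho, 3))
```

Because only the ordering of the scores matters, the system scores do not need to be on the same scale as the human ratings.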

Citations

Proceedings Article

Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors

Marco Baroni, +2 more

TL;DR: An extensive evaluation comparing context-predicting models with classic, count-vector-based distributional semantic approaches, on a wide range of lexical semantics tasks and across many parameter settings, shows that the buzz around these models is fully justified.

Journal Article

Improving Distributional Similarity with Lessons Learned from Word Embeddings

Omer Levy, +2 more

- 04 May 2015 -

Transactions of the Association for Comp...

TL;DR: It is revealed that much of the performance gains of word embeddings are due to certain system design choices and hyperparameter optimizations, rather than the embedding algorithms themselves, and these modifications can be transferred to traditional distributional models, yielding similar gains.

Journal Article

Simlex-999: Evaluating semantic models with genuine similarity estimation

Felix Hill, +2 more

- 01 Dec 2015 -

Computational Linguistics

TL;DR: SimLex-999 is presented, a gold standard resource for evaluating distributional semantic models that improves on existing resources in several important ways, and explicitly quantifies similarity rather than association or relatedness so that pairs of entities that are associated but not actually similar have a low rating.

Posted Content

SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation

Felix Hill, +2 more

- 15 Aug 2014 -

arXiv: Computation and Language

TL;DR: SimLex-999 is a gold standard resource for evaluating distributional semantic models that improves on existing resources in several important ways, such as quantifying similarity rather than association or relatedness, so that pairs of entities that are associated but not actually similar have a low rating.

Proceedings Article

Dependency-Based Word Embeddings

Omer Levy, +1 more

TL;DR: The skip-gram model with negative sampling introduced by Mikolov et al. is generalized to include arbitrary contexts, and experiments with dependency-based contexts are performed, showing that they produce markedly different embeddings.

References

Proceedings Article

An Information-Theoretic Definition of Similarity

Dekang Lin

TL;DR: This work presents an information-theoretic definition of similarity that is applicable as long as there is a probabilistic model and demonstrates how this definition can be used to measure the similarity in a number of different domains.

Posted Content

Using Information Content to Evaluate Semantic Similarity in a Taxonomy

Philip Resnik

- 29 Nov 1995 -

arXiv: Computation and Language

TL;DR: In this article, a new measure of semantic similarity in an IS-A taxonomy based on the notion of information content is presented, and experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, with an upper bound of r < 0.90 for human subjects performing the same task).

Proceedings Article

Verb semantics and lexical selection

Zhibiao Wu, +1 more

Abstract: This paper will focus on the semantic representation of verbs in computer systems and its impact on lexical selection problems in machine translation (MT). Two groups of English and Chinese verbs are examined to show that lexical selection must be based on interpretation of the sentences as well as selection restrictions placed on the verb arguments. A novel representation scheme is suggested, and is compared to representations with selection restrictions used in transfer-based MT. We see our approach as closely aligned with knowledge-based MT approaches (KBMT), and as a separate component that could be incorporated into existing systems. Examples and experimental results will show that, using this scheme, inexact matches can achieve correct lexical selection.

Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy

Jay J. Jiang, +1 more

TL;DR: This paper presents a new approach for measuring semantic similarity/distance between words and concepts that combines a lexical taxonomy structure with corpus statistical information so that the semantic distance between nodes in the semantic space constructed by the taxonomy can be better quantified with the computational evidence derived from a distributional analysis of corpus data.

Proceedings Article

Computing semantic relatedness using Wikipedia-based explicit semantic analysis

Evgeniy Gabrilovich, +1 more

TL;DR: This work proposes Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia that results in substantial improvements in correlation of computed relatedness scores with human judgments.
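Several of the references above define similarity over an IS-A taxonomy, some using information content estimated from corpus counts. The sketch below illustrates the commonly stated forms of the Resnik, Lin, Jiang-Conrath, and Wu-Palmer measures on a tiny hand-made hierarchy. The taxonomy, the counts, and all function names are hypothetical; depth is counted with the root at 1, which is one of several conventions; and nothing here reproduces the WordNet-based methods of the paper itself.

```python
from math import log

# Hypothetical toy IS-A taxonomy: child -> parent (the root maps to None).
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

# Hypothetical corpus counts of direct mentions of each concept.
COUNT = {"entity": 0, "animal": 2, "vehicle": 1, "dog": 3, "cat": 2, "car": 2}


def ancestors(concept):
    """Path from the concept up to the root, including the concept itself."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Number of nodes on the path to the root (assumed convention: root = 1)."""
    return len(ancestors(concept))


def lcs(a, b):
    """Lowest common subsumer: the deepest concept that is above both a and b."""
    above_b = set(ancestors(b))
    return next(c for c in ancestors(a) if c in above_b)


def prob(concept):
    """P(concept): counts of the concept and all its descendants over the total."""
    total = sum(COUNT.values())
    mass = sum(n for d, n in COUNT.items() if concept in ancestors(d))
    return mass / total


def ic(concept):
    """Information content, -log P(concept), written so the root gives 0.0."""
    return log(1.0 / prob(concept))


def resnik(a, b):
    # In a tree, the most informative common subsumer is the lowest one.
    return ic(lcs(a, b))


def lin(a, b):
    denom = ic(a) + ic(b)
    # Guard: both concepts carry no information (only the root in this toy data).
    return 2 * ic(lcs(a, b)) / denom if denom > 0 else 0.0


def jiang_conrath_distance(a, b):
    # A distance, not a similarity: 0 for identical concepts.
    return ic(a) + ic(b) - 2 * ic(lcs(a, b))


def wu_palmer(a, b):
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


for pair in [("dog", "cat"), ("dog", "car")]:
    print(pair, round(resnik(*pair), 3), round(lin(*pair), 3),
          round(jiang_conrath_distance(*pair), 3), round(wu_palmer(*pair), 3))
```

On this toy data every measure treats dog and cat as closer than dog and car, since the first pair shares the more specific subsumer "animal" while the second pair meets only at the root.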
