Open Access Proceedings Article

A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches

TLDR
This paper presents and compares WordNet-based and distributional similarity approaches and pioneers cross-lingual similarity, showing that the methods are easily adapted to a cross-lingual task with minor losses.
Abstract
This paper presents and compares WordNet-based and distributional similarity approaches. The strengths and weaknesses of each approach regarding similarity and relatedness tasks are discussed, and a combination is presented. Each of our methods independently provides the best results in its class on the RG and WordSim353 datasets, and a supervised combination of them yields the best published results on all datasets. Finally, we pioneer cross-lingual similarity, showing that our methods are easily adapted for a cross-lingual task with minor losses.
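On RG and WordSim353, a method is judged by how well its scores for word pairs agree with human ratings, commonly measured with Spearman's rank correlation. The sketch below is a minimal illustration under assumptions not taken from the paper: each word is a sparse context-count vector, similarity is cosine, and the vectors and "human" ratings are made up rather than drawn from either dataset.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse context-count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ssx = sum((a - mx) ** 2 for a in rx)
    ssy = sum((b - my) ** 2 for b in ry)
    denom = math.sqrt(ssx * ssy)
    return cov / denom if denom else 0.0


# Made-up context counts and ratings, for illustration only.
vectors = {
    "car": Counter({"drive": 5, "road": 4, "engine": 3}),
    "automobile": Counter({"drive": 4, "road": 3, "engine": 4}),
    "journey": Counter({"road": 2, "travel": 5, "long": 3}),
    "noon": Counter({"time": 4, "day": 5, "lunch": 2}),
}
pairs = [("car", "automobile", 3.8), ("car", "journey", 1.5), ("car", "noon", 0.2)]

predicted = [cosine(vectors[a], vectors[b]) for a, b, _ in pairs]
gold = [rating for _, _, rating in pairs]
print(f"Spearman rho = {spearman(predicted, gold):.3f}")  # prints: Spearman rho = 1.000
```

Because Spearman's rho uses only ranks, any monotone rescaling of the similarity scores leaves it unchanged, so raw cosine values can be compared directly against ratings on a different scale.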

Citations
Proceedings Article

Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors

TL;DR: An extensive evaluation comparing context-predicting models with classic, count-vector-based distributional semantic approaches, on a wide range of lexical semantics tasks and across many parameter settings, shows that the buzz around these models is fully justified.
Journal Article

Improving Distributional Similarity with Lessons Learned from Word Embeddings

TL;DR: Much of the performance gain of word embeddings is shown to be due to certain system design choices and hyperparameter optimizations rather than to the embedding algorithms themselves; these modifications can be transferred to traditional distributional models, yielding similar gains.
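One transferable choice this line of work identifies is context distribution smoothing: raising context counts to a power alpha before normalizing, which dampens PMI's bias toward rare contexts. The sketch below applies it to positive PMI (PPMI) over a sparse co-occurrence dictionary; the toy counts are made up, and the 0.75 default (the exponent word2vec uses for its negative-sampling distribution) should be treated as a tunable assumption.

```python
import math
from collections import defaultdict


def ppmi_smoothed(counts, alpha=0.75):
    """PPMI with context distribution smoothing.

    counts: {(word, context): co-occurrence count}
    alpha:  exponent applied to context counts before normalizing;
            alpha=1.0 recovers ordinary PPMI.
    Returns only the positive entries, as a sparse dict.
    """
    total = sum(counts.values())
    word_count = defaultdict(float)
    ctx_count = defaultdict(float)
    for (w, c), n in counts.items():
        word_count[w] += n
        ctx_count[c] += n

    # Smoothed context distribution: P_alpha(c) = #(c)^alpha / sum_c' #(c')^alpha.
    ctx_smoothed = {c: n ** alpha for c, n in ctx_count.items()}
    z = sum(ctx_smoothed.values())

    ppmi = {}
    for (w, c), n in counts.items():
        p_wc = n / total
        p_w = word_count[w] / total
        p_c = ctx_smoothed[c] / z
        pmi = math.log(p_wc / (p_w * p_c))
        if pmi > 0:  # negative associations are clipped to zero (omitted from the dict)
            ppmi[(w, c)] = pmi
    return ppmi


# Made-up counts: "x" is a rare context, "y" a frequent one.
counts = {("a", "x"): 2, ("a", "y"): 1, ("b", "y"): 3}
print(ppmi_smoothed(counts, alpha=1.0))   # plain PPMI
print(ppmi_smoothed(counts, alpha=0.75))  # smoothed: lower score for the rare context "x"
```

Smoothing raises the estimated probability of rare contexts, so pairs involving them receive lower PMI, while pairs with frequent contexts gain slightly.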
Journal Article

Simlex-999: Evaluating semantic models with genuine similarity estimation

TL;DR: This paper presents SimLex-999, a gold standard resource for evaluating distributional semantic models that improves on existing resources in several important ways; in particular, it explicitly quantifies similarity rather than association or relatedness, so that pairs of entities that are associated but not actually similar receive a low rating.
Posted Content

SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation

TL;DR: SimLex-999 is a gold standard resource for evaluating distributional semantic models that improves on existing resources in several important ways, such as quantifying similarity rather than association or relatedness, so that pairs of entities that are associated but not actually similar have a low rating.
Proceedings Article

Dependency-Based Word Embeddings

TL;DR: The skip-gram model with negative sampling introduced by Mikolov et al. is generalized to include arbitrary contexts, and experiments with dependency-based contexts are performed, showing that they produce markedly different embeddings.
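With dependency-based contexts, each dependency arc gives the head a context made of the modifier plus the relation label, and gives the modifier the head plus the inverse label. The sketch below assumes the parse is already available as (head, dependent, label) triples; the sentence and its parse are hand-written for illustration, the "-1" suffix is a stand-in marker for the inverse relation, and the paper's special handling of prepositions is omitted.

```python
def dependency_contexts(tokens, arcs):
    """Return (word, context) pairs derived from dependency arcs.

    tokens: list of words in the sentence.
    arcs:   (head_index, dependent_index, label) triples from a parser.
    The head gets the context "dependent/label"; the dependent gets
    "head/label-1", where the "-1" suffix marks the inverse relation.
    """
    pairs = []
    for head, dep, label in arcs:
        pairs.append((tokens[head], f"{tokens[dep]}/{label}"))
        pairs.append((tokens[dep], f"{tokens[head]}/{label}-1"))
    return pairs


# Hand-written parse of "australian scientist discovers star" (illustrative only).
tokens = ["australian", "scientist", "discovers", "star"]
arcs = [(2, 1, "nsubj"), (1, 0, "amod"), (2, 3, "dobj")]

for word, context in dependency_contexts(tokens, arcs):
    print(word, context)
```

With a linear window of two words on each side, "discovers" would also receive "australian" as a context; the dependency-based contexts keep only its syntactic arguments, "scientist/nsubj" and "star/dobj". The resulting pairs can then be fed to a skip-gram-style trainer in place of window contexts.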
References
Proceedings Article

An Information-Theoretic Definition of Similarity

Dekang Lin
TL;DR: This work presents an information-theoretic definition of similarity that is applicable as long as there is a probabilistic model, and demonstrates how this definition can be used to measure similarity in a number of different domains.
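Applied to an IS-A taxonomy such as WordNet, this definition becomes the ratio of the information two concepts share (that of their lowest common subsumer, lcs) to the information needed to describe each of them. A worked instance with made-up probabilities, using natural logs (the base cancels in the ratio):

```latex
\[
\mathrm{sim}_{\mathrm{Lin}}(c_1, c_2) = \frac{2 \log P(\mathrm{lcs}(c_1, c_2))}{\log P(c_1) + \log P(c_2)}
\]
\[
P(c_1) = 0.01,\; P(c_2) = 0.02,\; P(\mathrm{lcs}) = 0.1 \;\Rightarrow\;
\mathrm{sim}_{\mathrm{Lin}} = \frac{2(-2.303)}{(-4.605) + (-3.912)} = \frac{-4.605}{-8.517} \approx 0.541
\]
```

The score lies between 0 and 1 and reaches 1 when the two concepts coincide with their common subsumer.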
Posted Content

Using Information Content to Evaluate Semantic Similarity in a Taxonomy

TL;DR: In this article, a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content, is presented; experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, with an upper bound of r = 0.90 for human subjects performing the same task).
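The measure scores two concepts by the information content, IC(c) = -log P(c), of their most informative common subsumer, where P(c) is estimated from corpus frequencies propagated up the IS-A hierarchy. A minimal sketch under stated assumptions: the taxonomy and frequencies are hypothetical, and each concept is treated as a single sense (a word-level score would take the maximum over all sense pairs).

```python
import math

# Hypothetical toy IS-A taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "car": "vehicle",
    "bicycle": "vehicle",
    "food": "entity",
    "apple": "food",
}

# Made-up corpus frequencies of words mapped directly to each concept.
FREQ = {"entity": 0, "vehicle": 10, "car": 50, "bicycle": 20, "food": 5, "apple": 15}


def ancestors(c):
    """Return c followed by all of its ancestors up to the root."""
    chain = []
    while c is not None:
        chain.append(c)
        c = PARENT[c]
    return chain


def concept_counts():
    """Propagate each concept's frequency to every concept that subsumes it."""
    counts = {c: 0 for c in PARENT}
    for c, f in FREQ.items():
        for a in ancestors(c):
            counts[a] += f
    return counts


COUNTS = concept_counts()
N = COUNTS["entity"]  # the root subsumes every observation


def ic(c):
    """Information content: -log P(c)."""
    return -math.log(COUNTS[c] / N)


def resnik(c1, c2):
    """IC of the most informative concept subsuming both c1 and c2."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(ic(c) for c in common)


print(round(resnik("car", "bicycle"), 3))  # prints: 0.223  (= -ln 0.8, via "vehicle")
print(resnik("car", "apple"))              # only the root subsumes both, so IC is 0
```

Because the root subsumes everything, its probability is 1 and its information content is 0, so concepts related only through the root get the lowest possible score.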
Proceedings Article

Verb semantics and lexical selection

Abstract: This paper will focus on the semantic representation of verbs in computer systems and its impact on lexical selection problems in machine translation (MT). Two groups of English and Chinese verbs are examined to show that lexical selection must be based on interpretation of the sentences as well as selection restrictions placed on the verb arguments. A novel representation scheme is suggested, and is compared to representations with selection restrictions used in transfer-based MT. We see our approach as closely aligned with knowledge-based MT approaches (KBMT), and as a separate component that could be incorporated into existing systems. Examples and experimental results will show that, using this scheme, inexact matches can achieve correct lexical selection.

Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy

TL;DR: This paper presents a new approach for measuring semantic similarity/distance between words and concepts that combines a lexical taxonomy structure with corpus statistical information so that the semantic distance between nodes in the semantic space constructed by the taxonomy can be better quantified with the computational evidence derived from a distributional analysis of corpus data.
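This combination of taxonomy structure and corpus statistics is commonly stated as a distance built from information content: two concepts are close when little information separates each of them from their lowest common subsumer (lcs). A worked instance with made-up probabilities, using natural logs; identical concepts get distance 0, and a similarity score is commonly taken as the reciprocal of the distance:

```latex
\[
\mathrm{IC}(c) = -\log P(c), \qquad
\mathrm{dist}_{\mathrm{JC}}(c_1, c_2) = \mathrm{IC}(c_1) + \mathrm{IC}(c_2) - 2\,\mathrm{IC}(\mathrm{lcs}(c_1, c_2))
\]
\[
P(c_1) = 0.01,\; P(c_2) = 0.02,\; P(\mathrm{lcs}) = 0.1 \;\Rightarrow\;
\mathrm{dist}_{\mathrm{JC}} = 4.605 + 3.912 - 2(2.303) \approx 3.91
\]
```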
Proceedings Article

Computing semantic relatedness using Wikipedia-based explicit semantic analysis

TL;DR: This work proposes Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia; it yields substantial improvements in the correlation of computed relatedness scores with human judgments.
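In ESA, each word maps to a weighted vector of Wikipedia concepts, a text is interpreted as the weighted sum of its words' concept vectors, and relatedness is the cosine between two such vectors. The sketch below assumes a tiny hand-made word-to-concept index with invented weights; real ESA derives TF-IDF weights from the full text of Wikipedia articles.

```python
import math
from collections import Counter, defaultdict

# Hypothetical inverted index: word -> {Wikipedia concept: TF-IDF weight}.
INDEX = {
    "bank": {"Bank": 0.9, "River": 0.3, "Money": 0.6},
    "loan": {"Bank": 0.7, "Money": 0.8},
    "river": {"River": 0.9, "Water": 0.7},
    "water": {"Water": 0.9, "River": 0.5},
}


def esa_vector(text):
    """Interpret a text as the term-frequency-weighted sum of its words' concept vectors."""
    vec = defaultdict(float)
    for word, tf in Counter(text.lower().split()).items():
        for concept, weight in INDEX.get(word, {}).items():
            vec[concept] += tf * weight
    return dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def relatedness(text1, text2):
    """ESA relatedness: cosine between the two texts' concept vectors."""
    return cosine(esa_vector(text1), esa_vector(text2))


print(relatedness("bank", "loan"))   # high: both load on "Bank" and "Money"
print(relatedness("loan", "water"))  # 0.0: no shared concepts
```

Since the same function handles single words and longer texts, the representation covers both word-level relatedness and comparisons between whole passages.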