scispace - formally typeset
Search or ask a question
Topic

Semantic similarity

About: Semantic similarity is a research topic. Over the lifetime, 14605 publications have been published within this topic receiving 364659 citations. The topic is also known as: semantic relatedness.


Papers
More filters
Book
01 Jan 2007

190 citations

Proceedings Article
01 Aug 2013
TL;DR: This work presents a unified approach to semantic similarity that operates at multiple levels, all the way from comparing word senses to comparing text documents, and leverages a common probabilistic representation over word senses in order to compare different types of linguistic data.
Abstract: Semantic similarity is an essential component of many Natural Language Processing applications. However, prior methods for computing semantic similarity often operate at different levels, e.g., single words or entire documents, which requires adapting the method for each data type. We present a unified approach to semantic similarity that operates at multiple levels, all the way from comparing word senses to comparing text documents. Our method leverages a common probabilistic representation over word senses in order to compare different types of linguistic data. This unified representation shows state-ofthe-art performance on three tasks: semantic textual similarity, word similarity, and word sense coarsening.

189 citations

Patent
10 Aug 2001
TL;DR: In this paper, a method of order-ranking document clusters using entropy data and Bayesian self-organizing feature maps (SOM) is provided in which an accuracy of information retrieval is improved by adopting Bayesian SOM for performing a real-time document clustering for relevant documents.
Abstract: A method of order-ranking document clusters using entropy data and Bayesian self-organizing feature maps(SOM) is provided in which an accuracy of information retrieval is improved by adopting Bayesian SOM for performing a real-time document clustering for relevant documents in accordance with a degree of semantic similarity between entropy data extracted using entropy value and user profiles and query words given by a user, wherein the Bayesian SOM is a combination of Bayesian statistical technique and Kohonen network that is a type of an unsupervised learning.

188 citations

Journal ArticleDOI
TL;DR: A weight-based model and a learning procedure based on a novel median-based loss function designed to mitigate the negative effect of outliers are designed and found that the method outperforms the baseline approaches in the experiments, and that it generalizes well on different word embeddings without retraining.

188 citations

Proceedings ArticleDOI
02 Nov 2009
TL;DR: A novel similarity measure is proposed to leverage Wikipedia semantic knowledge for disambiguation, which surpasses other knowledge bases by the coverage of concepts, rich semantic information and up-to-date content and has been tested on the standard WePS data sets.
Abstract: Name ambiguity problem has raised an urgent demand for efficient, high-quality named entity disambiguation methods. The key problem of named entity disambiguation is to measure the similarity between occurrences of names. The traditional methods measure the similarity using the bag of words (BOW) model. The BOW, however, ignores all the semantic relations such as social relatedness between named entities, associative relatedness between concepts, polysemy and synonymy between key terms. So the BOW cannot reflect the actual similarity. Some research has investigated social networks as background knowledge for disambiguation. Social networks, however, can only capture the social relatedness between named entities, and often suffer the limited coverage problem. To overcome the previous methods' deficiencies, this paper proposes to use Wikipedia as the background knowledge for disambiguation, which surpasses other knowledge bases by the coverage of concepts, rich semantic information and up-to-date content. By leveraging Wikipedia's semantic knowledge like social relatedness between named entities and associative relatedness between concepts, we can measure the similarity between occurrences of names more accurately. In particular, we construct a large-scale semantic network from Wikipedia, in order that the semantic knowledge can be used efficiently and effectively. Based on the constructed semantic network, a novel similarity measure is proposed to leverage Wikipedia semantic knowledge for disambiguation. The proposed method has been tested on the standard WePS data sets. Empirical results show that the disambiguation performance of our method gets 10.7% improvement over the traditional BOW based methods and 16.7% improvement over the traditional social network based methods.

187 citations


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
84% related
Graph (abstract data type)
69.9K papers, 1.2M citations
84% related
Unsupervised learning
22.7K papers, 1M citations
83% related
Feature vector
48.8K papers, 954.4K citations
83% related
Web service
57.6K papers, 989K citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023202
2022522
2021641
2020837
2019866
2018787