Topic

Semantic similarity

About: Semantic similarity is a research topic. Over its lifetime, 14,605 publications have been published within this topic, receiving 364,659 citations. The topic is also known as: semantic relatedness.


Papers
Journal Article
TL;DR: This article surveys existing semantic similarity methods and discusses the strengths and weaknesses of each, giving new researchers a basis for experimenting and developing new ideas.
Abstract: Estimating the semantic similarity between text data is one of the challenging and open research problems in the field of Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods for determining semantic similarity measures. To address this issue, various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods beginning from traditional NLP techniques such as kernel-based methods to the most recent research work on transformer-based models, categorizing them based on their underlying principles as knowledge-based, corpus-based, deep neural network–based methods, and hybrid methods. Discussing the strengths and weaknesses of each method, this survey provides a comprehensive view of existing systems in place for new researchers to experiment and develop innovative ideas to address the issue of semantic similarity.

74 citations

Patent
Yael Karov, Micha Yochanan Breakstone, Reshef Shilon, Orgad Keller, Eric Shellef
25 Jul 2014
TL;DR: A computing device uses a semantic compiler to generate a semantic model from a corpus of sample requests, either by extracting contextual semantic features or by processing ontologies.
Abstract: Technologies for natural language request processing include a computing device having a semantic compiler to generate a semantic model based on a corpus of sample requests. The semantic compiler may generate the semantic model by extracting contextual semantic features or processing ontologies. The computing device generates a semantic representation of a natural language request by generating a lattice of candidate alternative representations, assigning a composite weight to each candidate, and finding the best route through the lattice. The composite weight may include semantic weights, phonetic weights, and/or linguistic weights. The semantic representation identifies a user intent and slots associated with the natural language request. The computing device may perform one or more dialog interactions based on the semantic request, including generating a request for additional information or suggesting additional user intents. The computing device may support automated analysis and tuning to improve request processing. Other embodiments are described and claimed.

74 citations
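
The lattice search described in the patent abstract above (candidate representations, a composite weight per candidate, and a best route through the lattice) can be illustrated with a small sketch. This is not the patented implementation: the `Edge` type, the mixing coefficients in `composite_weight`, and the choice to score a route by summing its edge weights are all assumptions made for illustration. Lattice positions are assumed to be integers, with every edge running from a lower position to a higher one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One candidate interpretation spanning lattice positions src -> dst (hypothetical type)."""
    src: int
    dst: int
    label: str
    semantic: float
    phonetic: float
    linguistic: float


def composite_weight(edge, mix=(0.5, 0.25, 0.25)):
    """Combine the three weights linearly; the mix values are illustrative assumptions."""
    return mix[0] * edge.semantic + mix[1] * edge.phonetic + mix[2] * edge.linguistic


def best_route(edges, start, end):
    """Return (score, labels) of the highest-scoring route from start to end, or None.

    Relies on src < dst for every edge, so visiting edges in order of src
    guarantees that a position's best score is final before it is extended.
    """
    best = {start: (0.0, [])}
    for edge in sorted(edges, key=lambda e: e.src):
        if edge.src not in best:
            continue  # position is unreachable from start
        score = best[edge.src][0] + composite_weight(edge)
        if edge.dst not in best or score > best[edge.dst][0]:
            best[edge.dst] = (score, best[edge.src][1] + [edge.label])
    return best.get(end)


# Toy lattice for a two-word request with competing candidates (made-up weights).
lattice = [
    Edge(0, 1, "play", 0.9, 0.8, 0.9),
    Edge(0, 1, "pray", 0.2, 0.7, 0.6),
    Edge(1, 2, "jazz", 0.8, 0.9, 0.8),
    Edge(1, 2, "jars", 0.1, 0.8, 0.5),
    Edge(0, 2, "playground", 0.3, 0.4, 0.7),
]
score, labels = best_route(lattice, 0, 2)
print(labels, round(score, 3))
```

Summing edge weights favours routes with more edges; a real system would likely normalise or use log-probabilities, which the abstract does not specify.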

Journal Article
TL;DR: Two studies investigating the computer-based representation of the semantic information content of databases using object location in two- and three-dimensional virtual space supported the conclusion that, for the purpose of information search, the amount of additional semantic information that can be conveyed by a three-dimensional solution does not outweigh the associated additional cognitive demands.
Abstract: This paper reports two studies investigating the computer-based representation of the semantic information content of databases using object location in two- and three-dimensional virtual space. In the first study, the cognitive demands associated with performing an information search task were examined under conditions where the “goodness of fit” of the spatial-semantic “mapping” was manipulated. The effects of individual differences in spatial ability and associative memory ability also were considered. Results indicated that performance equivalence, between two- and three-dimensional interfaces, could be achieved when the two-dimensional interface accounted for between 50 and 70% of the semantic variance accounted for by the three-dimensional solution. A second study, in which automatic text analysis was used to generate two- and three-dimensional solutions for document sets of varying sizes and types, supported the conclusion that, for the purpose of information search, the amount of additional semantic information that can be conveyed by a three-dimensional solution does not outweigh the associated additional cognitive demands.

74 citations

Book Chapter
11 Oct 2015
TL;DR: Klink-2 is presented, a novel approach which improves on earlier work on automatic generation of semantic topic networks and addresses the aforementioned limitations by taking advantage of a variety of knowledge sources available on the web.
Abstract: The amount of scholarly data available on the web is steadily increasing, enabling different types of analytics which can provide important insights into the research activity. In order to make sense of and explore this large-scale body of knowledge we need an accurate, comprehensive and up-to-date ontology of research topics. Unfortunately, human-crafted classifications do not satisfy these criteria, as they evolve too slowly and tend to be too coarse-grained. Current automated methods for generating ontologies of research areas also present a number of limitations, such as: (i) they do not consider the rich amount of indirect statistical and semantic relationships, which can help to understand the relation between two topics (e.g., the fact that two research areas are associated with a similar set of venues or technologies); (ii) they do not distinguish between different kinds of hierarchical relationships; and (iii) they are not able to handle effectively ambiguous topics characterized by a noisy set of relationships. In this paper we present Klink-2, a novel approach which improves on our earlier work on automatic generation of semantic topic networks and addresses the aforementioned limitations by taking advantage of a variety of knowledge sources available on the web. In particular, Klink-2 analyses networks of research entities (including papers, authors, venues, and technologies) to infer three kinds of semantic relationships between topics. It also identifies ambiguous keywords (e.g., "ontology") and separates them into the appropriate distinct topics (e.g., "ontology/philosophy" vs. "ontology/semantic web"). Our experimental evaluation shows that the ability of Klink-2 to integrate a high number of data sources and to generate topics with accurate contextual meaning yields significant improvements over other algorithms in terms of both precision and recall.

74 citations

Proceedings Article
01 Jun 2008
TL;DR: A sentence quotation graph is built that captures the conversation structure among emails and three cohesion measures are adopted: clue words, semantic similarity and cosine similarity as the weight of the edges.
Abstract: In this paper, we study the problem of summarizing email conversations. We first build a sentence quotation graph that captures the conversation structure among emails. We adopt three cohesion measures: clue words, semantic similarity and cosine similarity as the weight of the edges. Second, we use two graph-based summarization approaches, Generalized ClueWordSummarizer and PageRank, to extract sentences as summaries. Third, we propose a summarization approach based on subjective opinions and integrate it with the graph-based ones. The empirical evaluation shows that the basic clue words have the highest accuracy among the three cohesion measures. Moreover, subjective words can significantly improve accuracy.

74 citations
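
The graph-based step in the abstract above (sentences as nodes, a cohesion measure such as cosine similarity as edge weight, PageRank to pick summary sentences) can be sketched roughly as follows. This is a simplified illustration, not the authors' method: it links every pair of sentences rather than building a sentence quotation graph, uses raw term counts for cosine similarity, and the function names and the damping factor of 0.85 are assumptions.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sentences using raw term counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def weighted_pagerank(weights, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dense matrix of non-negative edge weights."""
    n = len(weights)
    if n == 0:
        return []
    rank = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                if out_sum[i] == 0:
                    # Node with no outgoing weight: spread its rank evenly.
                    new[j] += damping * rank[i] / n
                else:
                    new[j] += damping * rank[i] * weights[i][j] / out_sum[i]
        rank = new
    return rank


def summarize(sentences, k=1):
    """Return the k highest-ranked sentences, kept in their original order."""
    n = len(sentences)
    weights = [[cosine_similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    rank = weighted_pagerank(weights)
    top = sorted(range(n), key=lambda i: rank[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Toy email thread (made-up sentences).
thread = [
    "the meeting is on monday",
    "can we move the meeting to tuesday",
    "lunch was great",
]
print(summarize(thread, k=2))
```

The clue-word and subjective-opinion measures from the abstract are not modelled here; they would enter as alternative or additional edge weights.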


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 84% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 84% related
Unsupervised learning: 22.7K papers, 1M citations, 83% related
Feature vector: 48.8K papers, 954.4K citations, 83% related
Web service: 57.6K papers, 989K citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    202
2022    522
2021    641
2020    837
2019    866
2018    787