scispace - formally typeset
Topic

Semantic similarity

About: Semantic similarity is a research topic. Over its lifetime, 14,605 publications have been published within this topic, receiving 364,659 citations. The topic is also known as semantic relatedness.


Papers
Journal Article
TL;DR: OWLS-MX is a hybrid Semantic Web service matchmaker for OWL-S services that complements logic-based semantic matching with token-based syntactic similarity measurements when the former fails.

302 citations
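The fallback idea in the OWLS-MX summary can be illustrated with a minimal two-stage sketch. Everything concrete here is an assumption for illustration: the toy `TAXONOMY`, the degree labels returned by the hypothetical `logic_match`, the word tokenizer, and the 0.3 `threshold` are not OWLS-MX's actual matching filters or parameters. The sketch only shows the control flow: try a logic-based subsumption check first, and use token-based cosine similarity only when that check fails.

```python
import math
import re
from collections import Counter

# Hypothetical toy taxonomy (child concept -> parent concept), standing in for an OWL ontology.
TAXONOMY = {"Sedan": "Car", "Car": "Vehicle", "Truck": "Vehicle"}


def ancestors(concept):
    """Return all superconcepts of `concept` in the toy taxonomy."""
    result = []
    while concept in TAXONOMY:
        concept = TAXONOMY[concept]
        result.append(concept)
    return result


def logic_match(requested, offered):
    """Hypothetical logic-based stage: a degree label, or None when logic matching fails."""
    if requested == offered:
        return "equivalent"
    if offered in ancestors(requested):
        return "offered-more-general"
    if requested in ancestors(offered):
        return "offered-more-specific"
    return None


def token_cosine(text_a, text_b):
    """Token-based syntactic similarity: cosine over word-count vectors."""
    a = Counter(re.findall(r"[a-z]+", text_a.lower()))
    b = Counter(re.findall(r"[a-z]+", text_b.lower()))
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_match(req_concept, req_text, off_concept, off_text, threshold=0.3):
    """Logic first; fall back to syntactic similarity only if the logic stage fails."""
    degree = logic_match(req_concept, off_concept)
    if degree is not None:
        return ("logic", degree)
    score = token_cosine(req_text, off_text)
    if score >= threshold:
        return ("syntactic", score)
    return ("fail", score)


print(hybrid_match("Sedan", "sedan price", "Car", "car price"))  # ('logic', 'offered-more-general')
print(hybrid_match("Automobile", "automobile price", "Car", "car price"))  # ('syntactic', 0.5)
print(hybrid_match("Banana", "banana weight", "Car", "car price"))  # ('fail', 0.0)
```

In the second call "Automobile" is unknown to the toy taxonomy, so the logic stage fails and the shared token "price" rescues the match; this is the kind of case a purely logic-based matcher would miss.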

Proceedings Article
23 Jun 2011
TL;DR: A novel discriminative training method projects raw term vectors into a common, low-dimensional vector space; it outperforms existing state-of-the-art approaches and achieves high accuracy at low dimensions, which makes it more efficient.
Abstract: Traditional text similarity measures consider each term similar only to itself and do not model semantic relatedness of terms. We propose a novel discriminative training method that projects the raw term vectors into a common, low-dimensional vector space. Our approach operates by finding the optimal matrix to minimize the loss of the pre-selected similarity function (e.g., cosine) of the projected vectors, and is able to efficiently handle a large number of training examples in the high-dimensional space. Evaluated on two very different tasks, cross-lingual document retrieval and ad relevance measure, our method not only outperforms existing state-of-the-art approaches, but also achieves high accuracy at low dimensions and is thus more efficient.

298 citations
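The training objective described in this abstract (find a projection matrix so that a pre-selected similarity, here cosine, of the projected vectors fits labeled pairs) can be sketched in plain Python. Assumptions: the squared-error loss against 0/1 labels, the toy vectors, and the finite-difference gradient descent in the hypothetical `train` helper are chosen for brevity; they are not the paper's loss function or optimizer, and a practical version would use analytic gradients over a large sparse matrix.

```python
import math


def project(A, x):
    """Map a raw term vector x into the low-dimensional space: A (k rows, n columns) times x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def loss(A, pairs):
    """Assumed loss: squared error between projected cosine and a 0/1 relevance label."""
    return sum((cosine(project(A, x), project(A, y)) - label) ** 2 for x, y, label in pairs)


def train(A, pairs, lr=0.1, steps=50, eps=1e-6):
    """Gradient descent on the entries of A, with gradients from central finite differences."""
    A = [row[:] for row in A]  # work on a copy of the initial matrix
    for _ in range(steps):
        grad = [[0.0] * len(row) for row in A]
        for i, row in enumerate(A):
            for j in range(len(row)):
                saved = row[j]
                row[j] = saved + eps
                up = loss(A, pairs)
                row[j] = saved - eps
                down = loss(A, pairs)
                row[j] = saved
                grad[i][j] = (up - down) / (2 * eps)
        for i, row in enumerate(A):
            for j in range(len(row)):
                row[j] -= lr * grad[i][j]
    return A


# Hypothetical toy data: 3 raw terms projected to 2 dimensions; label 1.0 = relevant pair.
pairs = [([1, 0, 0], [0, 1, 0], 1.0), ([1, 0, 0], [0, 0, 1], 0.0)]
A0 = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
before = loss(A0, pairs)
after = loss(train(A0, pairs), pairs)
print(f"loss before: {before:.3f}, after: {after:.3f}")
```

The first pair has raw cosine 0 because its vectors share no terms, which is exactly the limitation the abstract describes; training moves the projected columns so that this pair becomes similar while the second pair is pushed toward orthogonality.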

Journal Article
TL;DR: The proposed measure, soft similarity, generalizes the well-known cosine similarity measure in the VSM through what the authors call the “soft cosine measure”, and various formulas for exact or approximate calculation of the soft cosine measure are proposed.
Abstract: We show how to consider similarity between features for calculation of similarity of objects in the Vector Space Model (VSM) for machine learning algorithms and other classes of methods that involve similarity between objects. Unlike LSA, we assume that similarity between features is known (say, from a synonym dictionary) and does not need to be learned from the data. We call the proposed similarity measure soft similarity. Similarity between features is common, for example, in natural language processing: words, n-grams, or syntactic n-grams can be somewhat different (which makes them different features) but still have much in common: for example, words “play” and “game” are different but related. When there is no similarity between features, our soft similarity measure is equal to the standard similarity. For this, we generalize the well-known cosine similarity measure in VSM by introducing what we call “soft cosine measure”. We propose various formulas for exact or approximate calculation of the soft cosine measure. For example, in one of them we consider for VSM a new feature space consisting of pairs of the original features weighted by their similarity. Again, for features that bear no similarity to each other, our formulas reduce to the standard cosine measure. Our experiments show that our soft cosine measure provides better performance in our case study: entrance exams question answering task at CLEF. In these experiments, we use syntactic n-grams as features and Levenshtein distance as the similarity between n-grams, measured either in characters or in elements of n-grams.

297 citations
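One common way to write the soft cosine measure replaces the plain dot product with a similarity-weighted one, soft_dot(a, b) = sum over i, j of s_ij * a_i * b_j, and normalizes by the soft norms sqrt(soft_dot(a, a)) and sqrt(soft_dot(b, b)); with s_ij = 1 for i = j and 0 otherwise it reduces to the standard cosine, as the abstract states. The sketch assumes a symmetric feature-similarity matrix S. Turning Levenshtein distance into a similarity via 1 - d / max(len) in the hypothetical `lev_similarity` helper is an assumption, not necessarily the paper's normalization. Because `levenshtein` works on any sequences, it can compare n-grams either by characters (strings) or by elements (tuples).

```python
import math


def soft_dot(S, a, b):
    """Similarity-weighted dot product: sum over i, j of S[i][j] * a[i] * b[j]."""
    return sum(S[i][j] * a[i] * b[j] for i in range(len(a)) for j in range(len(b)))


def soft_cosine(S, a, b):
    """Soft cosine of feature vectors a and b under the feature-similarity matrix S."""
    aa, bb = soft_dot(S, a, a), soft_dot(S, b, b)
    if aa <= 0 or bb <= 0:  # S is assumed (not guaranteed) to keep soft norms positive
        return 0.0
    return soft_dot(S, a, b) / (math.sqrt(aa) * math.sqrt(bb))


def levenshtein(s, t):
    """Edit distance between two sequences (strings or tuples of n-gram elements)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]


def lev_similarity(s, t):
    """Assumed mapping of edit distance to a similarity in [0, 1]."""
    longest = max(len(s), len(t))
    return 1.0 - levenshtein(s, t) / longest if longest else 1.0


features = ["play", "plays", "game"]
S = [[lev_similarity(f, g) for g in features] for f in features]
a = [1, 0, 0]  # a document containing only "play"
b = [0, 1, 0]  # a document containing only "plays"
print(soft_cosine(S, a, b))  # the standard cosine of a and b would be 0.0
```

Levenshtein distance only captures surface overlap, so it relates "play" and "plays" but not "play" and "game"; the dictionary-based similarity mentioned in the abstract would be needed for the latter.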

Proceedings Article
26 Nov 2001
TL;DR: The approach aims to enhance and augment existing clone detection methods based on structural analysis, thereby improving the quality of clone detection.
Abstract: Source code duplication occurs frequently within large software systems. Pieces of source code, functions, and data types are often duplicated in part or in whole, for a variety of reasons. Programmers may simply be reusing a piece of code via copy and paste or they may be "re-inventing the wheel". Previous research on the detection of clones is mainly focused on identifying pieces of code with similar (or nearly similar) structure. Our approach is to examine the source code text (comments and identifiers) and identify implementations of similar high-level concepts (e.g., abstract data types). The approach uses an information retrieval technique (i.e., latent semantic indexing) to statically analyze the software system and determine semantic similarities between source code documents (i.e., functions, files, or code segments). These similarity measures are used to drive the clone detection process. The intention of our approach is to enhance and augment existing clone detection methods that are based on structural analysis. This synergistic use of methods will improve the quality of clone detection. A set of experiments is presented that demonstrate the usage of semantic similarity measure to identify clones within a version of NCSA Mosaic.

295 citations
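The LSI step described in this abstract can be sketched as follows: split identifiers and comments into terms, build a term-by-document count matrix X, keep the top k singular directions, and compare documents by cosine in that reduced space. Assumptions: raw counts with no term weighting, k = 2, and a small power-iteration eigensolver on X^T X (whose eigenvectors are the right singular vectors of X) so that the sketch needs only the standard library. The helpers `tokenize`, `top_eigenpairs`, and `lsi_vectors` are hypothetical; the paper's preprocessing, weighting, dimensionality, and clone threshold are not reproduced, and a real system would use a proper SVD routine.

```python
import math
import re
from collections import Counter


def tokenize(source_text):
    """Split identifiers and comments into lowercase terms (camelCase and snake_case aware)."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", source_text)
    return re.findall(r"[a-z]+", spaced.lower())


def top_eigenpairs(G, k, iters=500):
    """Top-k eigenpairs of a symmetric positive semidefinite matrix (power iteration + deflation)."""
    n = len(G)
    G = [row[:] for row in G]
    result = []
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]  # fixed start vector, assumed not orthogonal to the target
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * G[i][j] * v[j] for i in range(n) for j in range(n))
        result.append((lam, v))
        # Deflation: subtract the found component so the next pass converges to the next eigenpair.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return result


def lsi_vectors(docs, k=2):
    """Return k-dimensional LSI coordinates (rows of V_k * Sigma_k) for each document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    X = [[c[t] for c in counts] for t in vocab]  # term-by-document count matrix
    n = len(docs)
    G = [[sum(row[a] * row[b] for row in X) for b in range(n)] for a in range(n)]  # X^T X
    pairs = top_eigenpairs(G, k)
    return [[math.sqrt(max(lam, 0.0)) * v[d] for lam, v in pairs] for d in range(n)]


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv) if nu and nv else 0.0


# Hypothetical "source documents": identifier text from four small functions.
docs = [
    "stackPush pop item",
    "stackPush pop element top",
    "queueEnqueue dequeue item",
    "queueEnqueue dequeue element front",
]
vecs = lsi_vectors(docs, k=2)
print(round(cosine(vecs[0], vecs[1]), 3), round(cosine(vecs[0], vecs[2]), 3))
```

In this toy run the two stack documents come out almost parallel in the reduced space, while a stack document and a queue document that share only the term "item" score much lower; pairs above some chosen threshold would be reported as candidate clones.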

Journal Article
01 Jan 2003, Language
TL;DR: The authors compare the semantics of spatial adpositions in nine unrelated languages using a standard elicitation procedure, producing a preliminary semantic typology of spatial adpositional systems.
Abstract: Most approaches to spatial language have assumed that the simplest spatial notions are (after Piaget) topological and universal (containment, contiguity, proximity, support, represented as semantic primitives such as IN, ON, UNDER, etc.). These concepts would be coded directly in language, above all in small closed classes such as adpositions, thus providing a striking example of semantic categories as language-specific projections of universal conceptual notions. This idea, if correct, should have as a consequence that the semantic categories instantiated in spatial adpositions should be essentially uniform crosslinguistically. This article attempts to verify this possibility by comparing the semantics of spatial adpositions in nine unrelated languages, with the help of a standard elicitation procedure, thus producing a preliminary semantic typology of spatial adpositional systems. The differences between the languages turn out to be so significant as to be incompatible with stronger versions of the UNIVERSAL CONCEPTUAL CATEGORIES hypothesis. Rather, the language-specific spatial adposition meanings seem to emerge as compact subsets of an underlying semantic space, with certain areas being statistical ATTRACTORS or FOCI. Moreover, a comparison of systems with different degrees of complexity suggests the possibility of positing implicational hierarchies for spatial adpositions. But such hierarchies need to be treated as successive divisions of semantic space, as in recent treatments of basic color terms. This type of analysis appears to be a promising approach for future work in semantic typology.

295 citations


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 84% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 84% related
Unsupervised learning: 22.7K papers, 1M citations, 83% related
Feature vector: 48.8K papers, 954.4K citations, 83% related
Web service: 57.6K papers, 989K citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    202
2022    522
2021    641
2020    837
2019    866
2018    787