Topic

Semantic similarity

About: Semantic similarity is a research topic. Over its lifetime, 14,605 publications have been published on this topic, receiving 364,659 citations. The topic is also known as semantic relatedness.



Papers


Journal Article • DOI

OWLS-MX: A hybrid Semantic Web service matchmaker for OWL-S services

Matthias Klusch¹, Benedikt Fries, Katia Sycara²

German Research Centre for Artificial Intelligence¹, Carnegie Mellon University²

01 Apr 2009-Journal of Web Semantics

TL;DR: OWLS-MX is a hybrid Semantic Web service matchmaker for OWL-S services that complements logic-based semantic matching with token-based syntactic similarity measurements when the former fails.


302 citations
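The hybrid strategy summarized in the TL;DR (try logic-based matching first, fall back to token-based syntactic similarity when it fails) can be sketched as follows. This is a minimal illustration, not the OWLS-MX implementation: the set-containment check is a hypothetical stand-in for OWL concept subsumption reasoning, and the names `hybrid_match` and `cosine_tokens` and the 0.5 threshold are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine_tokens(a, b):
    """Cosine similarity between bag-of-token count vectors of two strings."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(count * cb[tok] for tok, count in ca.items())
    na = math.sqrt(sum(c * c for c in ca.values()))
    nb = math.sqrt(sum(c * c for c in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_match(request_concepts, offer_concepts, request_text, offer_text,
                 threshold=0.5):
    """Hypothetical hybrid matcher: logic-based check first, syntactic fallback.

    Set containment stands in for OWL subsumption reasoning (an assumption).
    """
    if set(request_concepts) <= set(offer_concepts):
        return ("logic", 1.0)
    score = cosine_tokens(request_text, offer_text)
    if score >= threshold:
        return ("syntactic", score)
    return ("fail", score)


print(hybrid_match({"Car"}, {"Car", "Price"}, "", ""))  # ('logic', 1.0)
print(hybrid_match({"Car"}, {"Vehicle"}, "car price", "price of a car"))
# ('syntactic', ~0.707): the logic check fails, the token overlap rescues it
```

The fallback only runs when the logic-based check fails, mirroring the "in case the former fails" ordering in the TL;DR; a real matcher would rank several degrees of logical match rather than return a single label.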

Proceedings Article

Learning Discriminative Projections for Text Similarity Measures

Wen-tau Yih¹, Kristina Toutanova¹, John Platt¹, Christopher Meek¹

Microsoft¹

23 Jun 2011

TL;DR: A novel discriminative training method that projects raw term vectors into a common, low-dimensional vector space; it outperforms existing state-of-the-art approaches and achieves high accuracy at low dimensions, which also makes it more efficient.


Abstract: Traditional text similarity measures consider each term similar only to itself and do not model semantic relatedness of terms. We propose a novel discriminative training method that projects the raw term vectors into a common, low-dimensional vector space. Our approach operates by finding the optimal matrix to minimize the loss of the pre-selected similarity function (e.g., cosine) of the projected vectors, and is able to efficiently handle a large number of training examples in the high-dimensional space. Evaluated on two very different tasks, cross-lingual document retrieval and ad relevance measure, our method not only outperforms existing state-of-the-art approaches, but also achieves high accuracy at low dimensions and is thus more efficient.


298 citations
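The core idea in the abstract, applying a pre-selected similarity function (here cosine) to term vectors after projecting them with a matrix, can be illustrated as below. This is a sketch under stated assumptions: the projection matrix `A` is hand-picked rather than learned, and the three-word vocabulary and helper names are hypothetical. In the paper, the matrix is found by minimizing a training loss over the similarity of the projected vectors.

```python
import math


def project(A, x):
    """Multiply a k x n projection matrix A by an n-dimensional term vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def cosine(u, v):
    """Standard cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(u, v))
    nu = math.sqrt(sum(p * p for p in u))
    nv = math.sqrt(sum(q * q for q in v))
    return dot / (nu * nv) if nu and nv else 0.0


def projected_similarity(A, x, y):
    """Cosine similarity computed in the projected k-dimensional space."""
    return cosine(project(A, x), project(A, y))


# Hypothetical vocabulary: ["car", "automobile", "banana"]
x = [1, 0, 0]  # document mentioning only "car"
y = [0, 1, 0]  # document mentioning only "automobile"
z = [0, 0, 1]  # document mentioning only "banana"

# Hand-picked 2 x 3 matrix (NOT learned): maps "car" and "automobile"
# to the same latent dimension and "banana" to another.
A = [[1, 1, 0],
     [0, 0, 1]]

print(cosine(x, y))                   # 0.0: no shared terms in the raw space
print(projected_similarity(A, x, y))  # 1.0
print(projected_similarity(A, x, z))  # 0.0
```

The raw cosine treats each term as similar only to itself, which is exactly the limitation the abstract describes; the projection lets two different terms contribute to the same latent dimension.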

Journal Article • DOI

Grigori Sidorov, Alexander Gelbukh, Helena Gómez-Adorno¹, David Pinto²

Instituto Politécnico Nacional¹, Benemérita Universidad Autónoma de Puebla²

29 Sep 2014-Computación Y Sistemas

TL;DR: The proposed measure, soft similarity, generalizes the well-known cosine similarity measure in the VSM through what the authors call the “soft cosine measure”, and various formulas for exact or approximate calculation of the soft cosine measure are proposed.


Abstract: We show how to consider similarity between features for calculation of similarity of objects in the Vector Space Model (VSM) for machine learning algorithms and other classes of methods that involve similarity between objects. Unlike LSA, we assume that similarity between features is known (say, from a synonym dictionary) and does not need to be learned from the data. We call the proposed similarity measure soft similarity. Similarity between features is common, for example, in natural language processing: words, n-grams, or syntactic n-grams can be somewhat different (which makes them different features) but still have much in common: for example, words “play” and “game” are different but related. When there is no similarity between features then our soft similarity measure is equal to the standard similarity. For this, we generalize the well-known cosine similarity measure in VSM by introducing what we call “soft cosine measure”. We propose various formulas for exact or approximate calculation of the soft cosine measure. For example, in one of them we consider for VSM a new feature space consisting of pairs of the original features weighted by their similarity. Again, for features that bear no similarity to each other, our formulas reduce to the standard cosine measure. Our experiments show that our soft cosine measure provides better performance in our case study: entrance exams question answering task at CLEF. In these experiments, we use syntactic n-grams as features and Levenshtein distance as the similarity between n-grams, measured either in characters or in elements of n-grams.


297 citations
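A minimal sketch of the soft cosine measure, computed directly as the sum over i, j of s_ij * a_i * b_j, normalized by the analogous self-similarities of a and b, where s_ij is the similarity between features i and j. Assumptions: feature similarity is a hypothetical normalized character-level Levenshtein similarity between single words (the paper applies Levenshtein distance to syntactic n-grams), and the function names are illustrative. With the identity matrix as S, the result reduces to the ordinary cosine, as the abstract states.

```python
import math


def levenshtein(s, t):
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]


def feature_similarity(f, g):
    """Hypothetical normalization of edit distance into [0, 1]."""
    longest = max(len(f), len(g))
    return 1.0 if longest == 0 else 1.0 - levenshtein(f, g) / longest


def soft_cosine(a, b, S):
    """Soft cosine of vectors a, b given a feature-similarity matrix S."""
    def inner(u, v):
        return sum(S[i][j] * u[i] * v[j]
                   for i in range(len(u)) for j in range(len(v)))
    na, nb = inner(a, a), inner(b, b)
    if na <= 0 or nb <= 0:  # guard against degenerate similarity matrices
        return 0.0
    return inner(a, b) / math.sqrt(na * nb)


features = ["play", "plays", "game"]
S = [[feature_similarity(f, g) for g in features] for f in features]

a = [1, 0, 0]  # object containing only "play"
b = [0, 1, 0]  # object containing only "plays"
print(soft_cosine(a, b, S))  # 0.8, whereas the ordinary cosine is 0.0

identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
print(soft_cosine([1, 2, 0], [2, 1, 0], identity))  # 0.8, the ordinary cosine
```

Edit distance only captures surface similarity: under this choice "play" and "game" get a feature similarity of 0.0, so relating that pair would need a similarity source such as the synonym dictionary the abstract mentions.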

Proceedings Article • DOI

Identification of high-level concept clones in source code

Andrian Marcus¹, Jonathan I. Maletic¹

Kent State University¹

26 Nov 2001

TL;DR: The approach is intended to enhance and augment existing clone detection methods based on structural analysis, thereby improving the quality of clone detection.


Abstract: Source code duplication occurs frequently within large software systems. Pieces of source code, functions, and data types are often duplicated in part or in whole, for a variety of reasons. Programmers may simply be reusing a piece of code via copy and paste or they may be "re-inventing the wheel". Previous research on the detection of clones is mainly focused on identifying pieces of code with similar (or nearly similar) structure. Our approach is to examine the source code text (comments and identifiers) and identify implementations of similar high-level concepts (e.g., abstract data types). The approach uses an information retrieval technique (i.e., latent semantic indexing) to statically analyze the software system and determine semantic similarities between source code documents (i.e., functions, files, or code segments). These similarity measures are used to drive the clone detection process. The intention of our approach is to enhance and augment existing clone detection methods that are based on structural analysis. This synergistic use of methods will improve the quality of clone detection. A set of experiments is presented that demonstrates the use of the semantic similarity measure to identify clones within a version of NCSA Mosaic.


295 citations
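The latent semantic indexing step described in the abstract (reduce a term-document matrix with a truncated SVD, then compare source code documents by cosine similarity in the reduced space) can be sketched with NumPy as below. Assumptions: documents are represented by raw term counts with no weighting scheme, tokenization is plain whitespace splitting, the number of latent dimensions `k` and the 0.9 clone threshold are illustrative values rather than the paper's settings, and the helper names are hypothetical.

```python
import numpy as np


def term_document_matrix(docs):
    """Raw term counts (terms x documents) from whitespace-tokenized text."""
    vocab = sorted({t for d in docs for t in d.split()})
    index = {t: i for i, t in enumerate(vocab)}
    M = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in d.split():
            M[index[t], j] += 1
    return M


def lsi_document_vectors(M, k):
    """Document coordinates in a k-dimensional latent space (rows of V_k S_k)."""
    _, s, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[:k].T * s[:k]


def cosine_matrix(X):
    """Pairwise cosine similarities between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty rows
    Y = X / norms
    return Y @ Y.T


def candidate_clones(docs, k=2, threshold=0.9):
    """Pairs of documents whose LSI cosine similarity reaches the threshold."""
    sims = cosine_matrix(lsi_document_vectors(term_document_matrix(docs), k))
    n = len(docs)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if sims[i, j] >= threshold]


# Hypothetical identifier/comment text extracted from three code fragments
docs = [
    "stack push pop top",
    "stack push pop size",
    "socket connect send",
]
print(candidate_clones(docs))  # [(0, 1)]
```

The two stack-like fragments share most of their vocabulary and collapse onto the same latent direction, while the networking fragment stays orthogonal; in the approach described above, such similarity scores would feed a clone detector alongside structural analysis rather than replace it.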

Journal Article • DOI

'Natural concepts' in the spatial topological domain - adpositional meanings in crosslinguistic perspective: An exercise in semantic typology

Stephen C. Levinson¹, Sérgio Meira¹

Max Planck Society¹

01 Jan 2003-Language

TL;DR: The authors compare the semantics of spatial adpositions in nine unrelated languages, with the help of a standard elicitation procedure, thus producing a preliminary semantic typology of spatial adpositional systems.


Abstract: Most approaches to spatial language have assumed that the simplest spatial notions are (after Piaget) topological and universal (containment, contiguity, proximity, support, represented as semantic primitives such as IN, ON, UNDER, etc.). These concepts would be coded directly in language, above all in small closed classes such as adpositions, thus providing a striking example of semantic categories as language-specific projections of universal conceptual notions. This idea, if correct, should have as a consequence that the semantic categories instantiated in spatial adpositions should be essentially uniform crosslinguistically. This article attempts to verify this possibility by comparing the semantics of spatial adpositions in nine unrelated languages, with the help of a standard elicitation procedure, thus producing a preliminary semantic typology of spatial adpositional systems. The differences between the languages turn out to be so significant as to be incompatible with stronger versions of the UNIVERSAL CONCEPTUAL CATEGORIES hypothesis. Rather, the language-specific spatial adposition meanings seem to emerge as compact subsets of an underlying semantic space, with certain areas being statistical ATTRACTORS or FOCI. Moreover, a comparison of systems with different degrees of complexity suggests the possibility of positing implicational hierarchies for spatial adpositions. But such hierarchies need to be treated as successive divisions of semantic space, as in recent treatments of basic color terms. This type of analysis appears to be a promising approach for future work in semantic typology.


295 citations


Metrics

15,319 papers

407,958 citations

Number of papers on the topic in previous years
Year	Papers
2023	202
2022	522
2021	641
2020	837
2019	866
2018	787
