Topic

Semantic similarity

About: Semantic similarity is a research topic. Over the lifetime of the topic, 14,605 publications have been published, receiving 364,659 citations. The topic is also known as: semantic relatedness.


Papers
Journal ArticleDOI
TL;DR: AquaLog is a portable question-answering system that takes queries expressed in natural language and an ontology as input, and returns answers drawn from one or more knowledge bases (KBs); it is portable in the sense that the configuration time required to customize the system for a particular ontology is negligible.

224 citations

Proceedings ArticleDOI
29 Oct 2012
TL;DR: A novel notion of semantic relatedness between two entities, each represented as a set of weighted (multi-word) keyphrases with consideration of partially overlapping phrases, is developed; it improves the quality of prior link-based models and eliminates the need for explicit interlinkage between entities.
Abstract: Measuring the semantic relatedness between two entities is the basis for numerous tasks in IR, NLP, and Web-based knowledge extraction. This paper focuses on disambiguating names in a Web or text document by jointly mapping all names onto semantically related entities registered in a knowledge base. To this end, we have developed a novel notion of semantic relatedness between two entities represented as sets of weighted (multi-word) keyphrases, with consideration of partially overlapping phrases. This measure improves the quality of prior link-based models, and also eliminates the need for (usually Wikipedia-centric) explicit interlinkage between entities. Thus, our method is more versatile and can cope with long-tail and newly emerging entities that have few or no links associated with them. For efficiency, we have developed approximation techniques based on min-hash sketches and locality-sensitive hashing. Our experiments on semantic relatedness and on named entity disambiguation demonstrate the superiority of our method compared to state-of-the-art baselines.
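For intuition, the sketch below shows how min-hash signatures can approximate the overlap between two entities' keyphrase sets, in the spirit of the approximation techniques mentioned in the abstract. It is not the authors' exact formulation: the entity names and keyphrases are illustrative, and the keyphrase weighting and partial-phrase overlap of the paper's measure are omitted.

```python
# Illustrative sketch (not the paper's exact measure): estimating the overlap
# between two entities' keyphrase sets with min-hash signatures.
import hashlib


def minhash_signature(phrases, num_hashes=64):
    """Build a min-hash signature for a set of keyphrases.

    Each of the num_hashes rows keeps the smallest salted hash value seen,
    so two signatures agree on a row with probability roughly equal to the
    Jaccard similarity of the underlying sets.
    """
    signature = []
    for i in range(num_hashes):
        best = min(
            int(hashlib.md5(f"{i}:{p}".encode("utf-8")).hexdigest(), 16)
            for p in phrases
        )
        signature.append(best)
    return signature


def estimated_relatedness(sig_a, sig_b):
    """Fraction of matching rows approximates the Jaccard overlap."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


if __name__ == "__main__":
    # Hypothetical keyphrase sets for two entities.
    entity_a = {"consumer electronics", "iphone", "steve jobs", "cupertino"}
    entity_b = {"consumer electronics", "windows", "bill gates", "redmond"}
    sig_a = minhash_signature(entity_a)
    sig_b = minhash_signature(entity_b)
    print(f"estimated relatedness: {estimated_relatedness(sig_a, sig_b):.2f}")
```

Comparing fixed-length signatures instead of full keyphrase sets is what makes relatedness computations scale to large entity collections, which is the role locality-sensitive hashing plays in the paper.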

224 citations

Proceedings ArticleDOI
10 May 2005
TL;DR: An information-theoretic measure of semantic similarity that exploits both the hierarchical and non-hierarchical structure of an ontology is defined, and an experimental study shows that this measure improves significantly on the traditional taxonomy-based approach.
Abstract: Automatic extraction of semantic information from text and links in Web pages is key to improving the quality of search results. However, the assessment of automatic semantic measures is limited by the coverage of user studies, which do not scale with the size, heterogeneity, and growth of the Web. Here we propose to leverage human-generated metadata --- namely topical directories --- to measure semantic relationships among massive numbers of pairs of Web pages or topics. The Open Directory Project classifies millions of URLs in a topical ontology, providing a rich source from which semantic relationships between Web pages can be derived. While semantic similarity measures based on taxonomies (trees) are well studied, the design of well-founded similarity measures for objects stored in the nodes of arbitrary ontologies (graphs) is an open problem. This paper defines an information-theoretic measure of semantic similarity that exploits both the hierarchical and non-hierarchical structure of an ontology. An experimental study shows that this measure improves significantly on the traditional taxonomy-based approach. This novel measure allows us to address the general question of how text and link analyses can be combined to derive measures of relevance that are in good agreement with semantic similarity. Surprisingly, the traditional use of text similarity turns out to be ineffective for relevance ranking.
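For reference, here is a minimal sketch of the traditional taxonomy-based, information-content approach (in the spirit of Lin's measure) that the paper's graph-based measure generalizes and improves on. The toy topic tree and topic probabilities are assumptions for illustration only, and the non-hierarchical ontology links the paper exploits are not modeled.

```python
# Minimal sketch of the taxonomy-based information-content baseline that
# graph-based measures generalize. The toy taxonomy and probabilities are
# illustrative only.
import math

# child -> parent links of a tiny topic tree rooted at "Top".
parent = {
    "Science": "Top",
    "Biology": "Science",
    "Physics": "Science",
    "Genetics": "Biology",
    "Zoology": "Biology",
}

# p(t): probability that a page falls under topic t (or any of its subtopics).
prob = {
    "Top": 1.0,
    "Science": 0.5,
    "Biology": 0.25,
    "Physics": 0.25,
    "Genetics": 0.1,
    "Zoology": 0.15,
}


def ancestors(topic):
    """Return the topic together with all of its ancestors up to the root."""
    chain = [topic]
    while topic in parent:
        topic = parent[topic]
        chain.append(topic)
    return chain


def taxonomy_similarity(a, b):
    """sim(a, b) = 2 * IC(lowest common ancestor) / (IC(a) + IC(b)),
    where IC(t) = -log p(t)."""
    anc_a = ancestors(a)
    anc_b = set(ancestors(b))
    lca = next(t for t in anc_a if t in anc_b)  # deepest shared ancestor
    ic = lambda t: -math.log(prob[t])
    denom = ic(a) + ic(b)
    return 2 * ic(lca) / denom if denom > 0 else 1.0


if __name__ == "__main__":
    print(taxonomy_similarity("Genetics", "Zoology"))  # siblings under Biology
    print(taxonomy_similarity("Genetics", "Physics"))  # share only Science
```

Topics that share a deep, low-probability ancestor come out as more similar than topics that meet only near the root, which is the core intuition the paper extends from trees to arbitrary ontology graphs.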

223 citations

Journal ArticleDOI
TL;DR: In simulations, DevLex develops topographically organized representations for linguistic categories over time, models lexical confusion as a function of word density and semantic similarity, and shows age-of-acquisition effects in the course of learning a growing lexicon.

223 citations

Book ChapterDOI
01 Jan 2002
TL;DR: This paper illustrates that the large-scale structure of this representation has statistical properties that correspond well with those of semantic networks produced by humans, and traces this to the fidelity with which it reproduces the natural statistics of language.
Abstract: A probabilistic approach to semantic representation. Thomas L. Griffiths & Mark Steyvers {gruffydd,msteyver}@psych.stanford.edu, Department of Psychology, Stanford University, Stanford, CA 94305-2130 USA.

Semantic networks produced from human data have statistical properties that cannot be easily captured by spatial representations. We explore a probabilistic approach to semantic representation that explicitly models the probability with which words occur in different contexts, and hence captures the probabilistic relationships between words. We show that this representation has statistical properties consistent with the large-scale structure of semantic networks constructed by humans, and trace the origins of these properties.

Contemporary accounts of semantic representation suggest that we should consider words to be either points in a high-dimensional space (e.g. Landauer & Dumais, 1997) or interconnected nodes in a semantic network (e.g. Collins & Loftus, 1975). Both of these ways of representing semantic information provide important insights, but also have shortcomings. Spatial approaches illustrate the importance of dimensionality reduction and employ simple algorithms, but are limited by Euclidean geometry. Semantic networks are less constrained, but their graphical structure lacks a clear interpretation. In this paper, we view the function of associative semantic memory to be efficient prediction of the concepts likely to occur in a given context. We take a probabilistic approach to this problem, modeling documents as expressing information related to a small number of topics (cf. Blei, Ng, & Jordan, 2002). The topics of a language can then be learned from the words that occur in different documents. We illustrate that the large-scale structure of this representation has statistical properties that correspond well with those of semantic networks produced by humans, and trace this to the fidelity with which it reproduces the natural statistics of language.

Approaches to semantic representation. Spatial approaches: Latent Semantic Analysis (LSA; Landauer & Dumais, 1997) is a procedure for finding a high-dimensional spatial representation for words. LSA uses singular value decomposition to factorize a word-document co-occurrence matrix. An approximation to the original matrix can be obtained by using fewer singular values than its rank. One component of this approximation is a matrix that gives each word a location in a high-dimensional space. Distances in this space are predictive in many tasks that require the use of semantic information. Performance is best for approximations that use fewer singular values than the rank of the matrix, illustrating that reducing the dimensionality of the representation can reduce the effects of statistical noise and increase efficiency. While the methods behind LSA were novel in scale and subject, the suggestion that similarity relates to distance in psychological space has a long history (Shepard, 1957). Critics have argued that human similarity judgments do not satisfy the properties of Euclidean distances, such as symmetry or the triangle inequality. Tversky and Hutchinson (1986) pointed out that Euclidean geometry places strong constraints on the number of points to which a particular point can be the nearest neighbor, and that many sets of stimuli violate these constraints. The number of nearest neighbors in similarity judgments has an analogue in semantic representation. Nelson, McEvoy and Schreiber (1999) had people perform a word association task in which they named an associated word in response to a set of target words. Steyvers and Tenenbaum (submitted) noted that the number of unique words produced for each target follows a power law distribution: if k is the number of words, P(k) ∝ k^(-γ). For reasons similar to those of Tversky and Hutchinson, it is difficult to produce a power law distribution by thresholding cosine or distance in Euclidean space. This is shown in Figure 1. Power law distributions appear linear in log-log coordinates. LSA produces curved log-log plots, more consistent with an exponential distribution.

Semantic networks: Semantic networks were proposed by Collins and Quillian (1969) as a means of storing semantic knowledge. The original networks were inheritance hierarchies, but Collins and Loftus (1975) generalized the notion to cover arbitrary graphical structures. The interpretation of this graphical structure is vague, being based on connecting nodes that "activate" one another. Steyvers and Tenenbaum (submitted) constructed a semantic network from the word association norms of Nelson et al.
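As a concrete illustration of the LSA procedure described in the abstract, the sketch below factorizes a tiny word-document count matrix with SVD and keeps fewer singular values than its rank to obtain low-dimensional word vectors. The toy corpus and the choice of k are assumptions for illustration only.

```python
# Minimal LSA sketch: SVD of a small word-document count matrix, truncated
# to k singular values, with cosine similarity between the word vectors.
import numpy as np

documents = [
    "dog barks at the cat",
    "cat chases the dog",
    "stocks fell on the market",
    "market rally lifts stocks",
]

# Build the word-document count matrix.
vocab = sorted({w for d in documents for w in d.split()})
counts = np.zeros((len(vocab), len(documents)))
for j, doc in enumerate(documents):
    for word in doc.split():
        counts[vocab.index(word), j] += 1

# Truncated SVD: keep k singular values (k < rank) as the reduced space.
k = 2
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
word_vectors = U[:, :k] * s[:k]  # each row locates a word in k-dim space


def cosine(u, v):
    """Cosine similarity between two word vectors, the distance LSA relies on."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))


i, j, m = vocab.index("dog"), vocab.index("cat"), vocab.index("market")
print("dog ~ cat   :", round(cosine(word_vectors[i], word_vectors[j]), 2))
print("dog ~ market:", round(cosine(word_vectors[i], word_vectors[m]), 2))
```

Words that co-occur in the same documents ("dog" and "cat") land close together in the reduced space, while words from unrelated documents ("dog" and "market") do not; it is this kind of spatial representation whose limits the abstract contrasts with the probabilistic topic approach.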

222 citations


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 84% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 84% related
Unsupervised learning: 22.7K papers, 1M citations, 83% related
Feature vector: 48.8K papers, 954.4K citations, 83% related
Web service: 57.6K papers, 989K citations, 82% related
Performance Metrics
No. of papers in the topic in previous years:
2023: 202
2022: 522
2021: 641
2020: 837
2019: 866
2018: 787