Topic

Semantic similarity

About: Semantic similarity is a research topic. Over its lifetime, 14,605 publications have been published on the topic, receiving 364,659 citations. The topic is also known as semantic relatedness.


Papers
Journal Article · DOI
TL;DR: This article examined the influence of cross-language priming on lexical decisions in Spanish-English bilinguals at 300-ms and 100-ms stimulus onset asynchronies, respectively.

147 citations

Journal Article · DOI
TL;DR: It is suggested that the use of a high dimensional dynamic viewer with an effective projection pursuit routine and user control, coupled with the exquisite abilities of the human visual system, can often succeed in discovering multiple revealing views that are missed by current computational algorithms.
Abstract: Most techniques for relating textual information rely on intellectually created links such as author-chosen keywords and titles, authority indexing terms, or bibliographic citations. Similarity of the semantic content of whole documents, rather than just titles, abstracts, or overlap of keywords, offers an attractive alternative. Latent semantic analysis provides an effective dimension reduction method for the purpose that reflects synonymy and the sense of arbitrary word combinations. However, latent semantic analysis correlations with human text-to-text similarity judgments are often empirically highest at ≈300 dimensions. Thus, two- or three-dimensional visualizations are severely limited in what they can show, and the first and/or second automatically discovered principal component, or any three such for that matter, rarely capture all of the relations that might be of interest. It is our conjecture that linguistic meaning is intrinsically and irreducibly very high dimensional. Thus, some method to explore a high dimensional similarity space is needed. But the 2.7 × 10⁷ projections and infinite rotations of, for example, a 300-dimensional pattern are impossible to examine. We suggest, however, that the use of a high dimensional dynamic viewer with an effective projection pursuit routine and user control, coupled with the exquisite abilities of the human visual system to extract information about objects and from moving patterns, can often succeed in discovering multiple revealing views that are missed by current computational algorithms. We show some examples of the use of latent semantic analysis to support such visualizations and offer views on future needs.
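The pipeline this abstract builds on, reducing a term-document matrix to a few hundred latent dimensions and comparing documents there, can be sketched with standard tools. Below is a minimal illustration using scikit-learn rather than the authors' own software; the toy corpus, tf-idf weighting, and tiny component count are assumptions made to keep it self-contained (the abstract's own figure is ≈300 dimensions).

```python
# Hedged sketch of an LSA pipeline: term-document matrix -> truncated SVD
# -> document similarity in the reduced space. Illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "latent semantic analysis reduces dimensionality",
    "semantic similarity between whole documents",
    "visualizing a high dimensional similarity space",
]

# Term-document matrix, tf-idf weighted (one common choice for LSA).
X = TfidfVectorizer().fit_transform(docs)

# LSA is a truncated SVD of this matrix. Real corpora support the ~300
# dimensions the abstract mentions; a toy corpus only supports a couple.
doc_vectors = TruncatedSVD(n_components=2).fit_transform(X)

# Document-to-document similarities in the latent space.
print(cosine_similarity(doc_vectors))
```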

146 citations

01 Jan 2001
TL;DR: A theoretical framework for semantic space models is developed by synthesizing theoretical analyses from vector space information retrieval and categorical data analysis with new basic research.
Abstract: Towards a Theory of Semantic Space. Will Lowe (wlowe02@tufts.edu), Center for Cognitive Studies, Tufts University, MA 02155 USA.

This paper adds some theory to the growing literature of semantic space models. We motivate semantic space models from the perspective of distributional linguistics and show how an explicit mathematical formulation can provide a better understanding of existing models and suggest changes and improvements. In addition to providing a theoretical framework for current models, we consider the implications of statistical aspects of language data that have not been addressed in the psychological modeling literature. Statistical approaches to language must deal principally with count data, and this data will typically have a highly skewed frequency distribution due to Zipf's law. We consider the consequences of these facts for the construction of semantic space models, and present methods for removing frequency biases from semantic space models.

Introduction. There is a growing literature on the empirical adequacy of semantic space models across a wide range of subject domains (Burgess et al., 1998; Landauer et al., 1998; Foltz et al., 1998; McDonald and Lowe, 1998; Lowe and McDonald, 2000). However, semantic space models are typically structured and parameterized differently by each researcher. Levy and Bullinaria (2000) have explored the implications of parameter changes empirically by running multiple simulations, but there has up until now been no work that places semantic space models in an overarching theoretical framework; consequently there are few statements of how semantic spaces ought to be structured in the light of their intended purpose. In this paper we attempt to develop a theoretical framework for semantic space models by synthesizing theoretical analyses from vector space information retrieval and categorical data analysis with new basic research. The structure of the paper is as follows. The next section briefly motivates semantic space models using ideas from distributional linguistics. We then review Zipf's law and its consequences for the distributional character of linguistic data. The final section presents a formal definition of semantic space models and considers what effects different choices of component have on the resulting models.

Motivating Semantic Space. Firth (1968) observed that "you shall know a word by the company it keeps". If we interpret company as lexical company, the words that occur near to it in text or speech, then two related claims are possible. The first is unexceptional: we come to know about the syntactic character of a word by examining the other words that may and may not occur around it in text. Syntactic theory then postulates latent variables, e.g. parts of speech and branching structure, that control the distributional properties of words and restrictions on their contexts of occurrence. The second claim is that we come to know about the semantic character of a word by examining the other words that may and may not occur around it in text. The intuition for this distributional characterization of semantics is that whatever makes words similar or dissimilar in meaning must show up distributionally, in the lexical company of the word. Otherwise the supposedly semantic difference is not available to hearers, and it is not easy to see how it may be learned.
If words are similar to the extent that they occur in similar contexts, then we may define a statistical replacement test (Finch, 1993) which tests the meaningfulness of the result of switching one word for another in a sentence. When a corpus of meaningful sentences is available the test may be reversed (Lowe, 2000a), and under a suitable representation of lexical context, we may hold each word constant and estimate its typical surrounding context. A semantic space model is a way of representing similarity of typical context in a Euclidean space with axes determined by local word co-occurrence counts. Counting the co-occurrence of a target word with a fixed set of D other words makes it possible to position the target in a space of dimension D. A target's position with respect to other words then expresses similarity of lexical context. Since the basic notion from distributional linguistics is 'intersubstitutability in context', a semantic space model is effective to the extent it realizes this idea accurately.

Zipf's Law. The frequency of a word is (approximately) proportional to the reciprocal of its rank in a frequency list (Zipf, 1949; Mandelbrot, 1954). This is Zipf's law. Zipf's law ensures dramatically skewed distributions for almost …
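The construction just described, counting a target word's co-occurrences with a fixed set of D context words and reading off similarity of lexical context from the resulting vectors, is easy to sketch. The snippet below is an illustration under stated assumptions, not Lowe's formulation: a toy corpus, a symmetric two-word window, and PPMI reweighting as one standard way of removing the Zipfian frequency bias the paper discusses.

```python
# Hedged sketch of a co-occurrence semantic space with PPMI reweighting.
import numpy as np

corpus = "you shall know a word by the company it keeps".split()
window = 2
vocab = sorted(set(corpus))
index = {w: i for i, w in enumerate(vocab)}

# Raw co-occurrence counts: rows are target words, columns context words.
C = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            C[index[w], index[corpus[j]]] += 1

# Positive PMI: log(observed / expected) co-occurrence, clipped at zero,
# which damps the dominance of very frequent words (the Zipf problem).
total = C.sum()
row = C.sum(axis=1, keepdims=True)
col = C.sum(axis=0, keepdims=True)
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(C * total / (row * col))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# Each row positions a word in a D-dimensional space; cosine between rows
# expresses similarity of lexical context.
def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

print(cosine(ppmi[index["word"]], ppmi[index["company"]]))
```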

146 citations

Proceedings Article · DOI
07 Oct 2001
TL;DR: The method uses multidimensional scaling and hierarchical cluster analysis to model the semantic categories into which human observers organize images, devises an image similarity metric that embodies the results, and develops a prototype system.
Abstract: We propose a method for semantic categorization and retrieval of photographic images based on low-level image descriptors. In this method, we first use multidimensional scaling (MDS) and hierarchical cluster analysis (HCA) to model the semantic categories into which human observers organize images. Through a series of psychophysical experiments and analyses, we refine our definition of these semantic categories, and use these results to discover a set of low-level image features to describe each category. We then devise an image similarity metric that embodies our results, and develop a prototype system, which identifies the semantic category of an image and retrieves the most similar images from the database. We tested the metric on a new set of images, and compared the categorization results with those of human observers. Our results provide a good match to human performance, thus validating the use of human judgments to develop semantic descriptors.
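As a rough sketch of the two modeling tools the abstract names, the snippet below embeds pairwise image dissimilarities with MDS and groups them with hierarchical cluster analysis via scikit-learn. The random feature matrix stands in for real low-level descriptors and the human-judgment data the paper collects; the cluster count is an arbitrary assumption.

```python
# Hedged sketch: MDS embedding + hierarchical clustering of image
# dissimilarities. Illustrative stand-in data, not the paper's.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
features = rng.random((20, 8))   # 20 images x 8 low-level descriptors

# Pairwise dissimilarities between images.
D = squareform(pdist(features, metric="euclidean"))

# MDS: place images in a low-dimensional space that preserves D.
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)

# HCA: recover candidate semantic categories from the same distances.
# (scikit-learn >= 1.2 uses `metric=`; older releases used `affinity=`.)
labels = AgglomerativeClustering(n_clusters=4, metric="precomputed",
                                 linkage="average").fit_predict(D)
print(coords.shape, labels)
```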

146 citations

Journal Article · DOI
TL;DR: This paper proposes a novel strategy to exploit the semantic similarity of the training data, designs an efficient generative adversarial framework to learn binary hash codes in an unsupervised manner, and achieves performance comparable with popular supervised hashing methods.
Abstract: Hashing plays a pivotal role in nearest-neighbor searching for large-scale image retrieval. Recently, deep learning-based hashing methods have achieved promising performance. However, most of these deep methods involve discriminative models, which require large-scale, labeled training datasets, thus hindering their real-world applications. In this paper, we propose a novel strategy to exploit the semantic similarity of the training data and design an efficient generative adversarial framework to learn binary hash codes in an unsupervised manner. Specifically, our model consists of three different neural networks: an encoder network to learn hash codes from images, a generative network to generate images from hash codes, and a discriminative network to distinguish between pairs of hash codes and images. By adversarially training these networks, we successfully learn mutually coherent encoder and generative networks, and can output efficient hash codes from the encoder network. We also propose a novel strategy, which utilizes both feature and neighbor similarities, to construct a semantic similarity matrix, and then use this matrix to guide the hash code learning process. Integrating the supervision of this semantic similarity matrix into the adversarial learning framework can efficiently preserve the semantic information of the training data in Hamming space. The experimental results on three widely used benchmarks show that our method not only significantly outperforms several state-of-the-art unsupervised hashing methods, but also achieves performance comparable with popular supervised hashing methods.
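The one component this abstract describes concretely enough to sketch is the semantic similarity matrix built from both feature and neighbor similarities. The NumPy snippet below is a hedged illustration, not the paper's exact recipe: the 50/50 fusion of cosine and shared-neighbor scores, the choice of k, and the mean-value binarization threshold are all assumptions.

```python
# Hedged sketch: build a {+1, -1} semantic similarity matrix S from
# feature (cosine) similarity plus shared-nearest-neighbor overlap.
import numpy as np

rng = np.random.default_rng(0)
F = rng.random((50, 128))        # 50 images x 128-d deep features

# Cosine similarity between all image pairs.
Fn = F / np.linalg.norm(F, axis=1, keepdims=True)
cos = Fn @ Fn.T

# Neighbor similarity: fraction of shared k-nearest neighbors.
k = 5
knn = np.argsort(-cos, axis=1)[:, 1:k + 1]   # column 0 is the image itself
nbr = np.zeros_like(cos)
for i in range(len(F)):
    for j in range(len(F)):
        nbr[i, j] = len(set(knn[i]) & set(knn[j])) / k

# Fuse the two signals and binarize: S[i, j] = +1 marks a semantically
# similar pair, -1 a dissimilar one. S can then act as pseudo-supervision
# for keeping similar pairs close in Hamming space during hash learning.
fused = 0.5 * cos + 0.5 * nbr
S = np.where(fused > fused.mean(), 1.0, -1.0)
print(S.shape, (S == 1).mean())
```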

146 citations


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
84% related
Graph (abstract data type)
69.9K papers, 1.2M citations
84% related
Unsupervised learning
22.7K papers, 1M citations
83% related
Feature vector
48.8K papers, 954.4K citations
83% related
Web service
57.6K papers, 989K citations
82% related
Performance
Metrics: number of papers in the topic in previous years

Year    Papers
2023    202
2022    522
2021    641
2020    837
2019    866
2018    787