scispace - formally typeset
Topic

Semantic similarity

About: Semantic similarity is a research topic. Over its lifetime, 14,605 publications have been published within this topic, receiving 364,659 citations. The topic is also known as: semantic relatedness.


Papers
Journal ArticleDOI
TL;DR: In this paper, the effects of semantic priming on picture and word processing were assessed under conditions in which subjects were required simply to identify stimuli (label pictures or read words) as rapidly as possible.
Abstract: The effects of semantic priming on picture and word processing were assessed under conditions in which subjects were required simply to identify stimuli (label pictures or read words) as rapidly as possible. Stimuli were presented in pairs (a prime followed by a target), with half of the pairs containing members of the same semantic category and half containing unrelated concepts. Semantic relatedness was found to facilitate the identification of both pictures (Experiment 1) and words (Experiment 2), and obtained interactions of semantic relatedness and stimulus quality in both experiments suggested that semantic priming affects the initial encoding of both types of stimuli. In Experiment 3, subjects received pairs of pictures, pairs of words, and mixed pairs composed of a picture and a word or of a word and a picture. Significant priming effects were obtained on mixed as well as unmixed pairs, supporting the assumption that pictures and words access semantic information from a common semantic store. Of primary interest was the significantly greater priming obtained in picture-picture pairs than in word-word or mixed pairs. This suggests that, in addition to priming that is mediated by the semantic system, priming may occur in picture-picture pairs that results from the overlap in visual features common to the pictorial representations of objects from the same semantic category.

227 citations

01 Jan 2005
TL;DR: An attempt to establish a ‘psychological ground truth’ for evaluating models of the ability of word-based, n-gram and Latent Semantic Analysis approaches to model human judgments of text document similarity is reported.
Abstract: An Empirical Evaluation of Models of Text Document Similarity. Michael D. Lee (michael.lee@adelaide.edu.au), Department of Psychology, University of Adelaide, South Australia, 5005, AUSTRALIA; Brandon Pincombe (brandon.pincombe@dsto.defence.gov.au), Intelligence Surveillance and Reconnaissance Division, Defence Science and Technology Organisation, PO Box 1500, Edinburgh SA 5111, AUSTRALIA; Matthew Welsh (matthew.welsh@adelaide.edu.au), Australian School of Petroleum Engineering, University of Adelaide, South Australia, 5005, AUSTRALIA.

Modeling the semantic similarity between text documents presents a significant theoretical challenge for cognitive science, with ready-made applications in information handling and decision support systems dealing with text. While a number of candidate models exist, they have generally not been assessed in terms of their ability to emulate human judgments of similarity. To address this problem, we conducted an experiment that collected repeated similarity measures for each pair of documents in a small corpus of short news documents. An analysis of human performance showed inter-rater correlations of about 0.6. We then considered the ability of existing models, using word-based, n-gram and Latent Semantic Analysis (LSA) approaches, to model these human judgments. The best-performing LSA model produced correlations of about 0.6, consistent with human performance, while the best-performing word-based and n-gram models achieved correlations closer to 0.5. Many of the remaining models showed almost no correlation with human performance. Based on our results, we provide some discussion of the key strengths and weaknesses of the models we examined.

Introduction: Modeling the semantic similarity between text documents is an interesting problem for cognitive science, for both theoretical and practical reasons. Theoretically, it involves the study of a basic cognitive process with richly structured natural stimuli. Practically, search engines, text corpus visualizations, and a variety of other applications for filtering, sorting, retrieving, and generally handling text rely fundamentally on similarity measures. For this reason, the ability to assess semantic similarity in an accurate, automated, and scalable way is a key determinant of the effectiveness of most information handling and decision support software that deals with text. A variety of approaches have been developed for modeling text document similarity. These include simple word-based, keyword-based and n-gram measures (e.g., Salton, 1989; Damashek, 1995), and more complicated approaches such as Latent Semantic Analysis (LSA: Deerwester et al., 1990; Landauer and Dumais, 1997). While all of these approaches have achieved some level of practical success, they have generally not been assessed in terms of their ability to model human judgments of text document similarity. The most likely reason for this gap is that no suitable empirical data exist, and considerable effort is involved in collecting pairwise ratings of text document similarity for even a moderate number of documents. This paper reports the collection of data that give ten independent ratings of the similarity of every pair of 50 short text documents, and so represents an attempt to establish a 'psychological ground truth' for evaluating models. Using the new data, we report a first evaluation of the ability of word-based, n-gram and LSA approaches to model human judgments.

Experiment. Materials: The text corpus evaluated by human judges contained 50 documents selected from the Australian Broadcasting Corporation's news mail service, which provides text e-mails of headline stories. The documents varied in length from 51 to 126 words and covered a number of broad topics. A further 314 documents from the same source were collected to act as a larger 'backgrounding' corpus for LSA. Both document sets were assessed against a standard corpus of five English texts using four models of language: the log-normal, generalized inverse Gauss-Poisson (with γ = −0.5), Yule-Simon and Zipfian models (Baayen, 2001). Both document sets were within the normal range of English text for word frequency spectrum and vocabulary growth, and were therefore regarded as representative of normal English texts.

Subjects: The subjects were 83 University of Adelaide students (29 males and 54 females), with a mean age of 19.7 years. They were each paid a ten (Australian) dollar gift voucher for every 100 document pair ratings made.
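The LSA approach evaluated above can be illustrated with a minimal sketch: build a term-document count matrix, keep the top-k components of its singular value decomposition, and compare documents by cosine similarity in the reduced space. The toy matrix and vocabulary below are invented for illustration; the paper itself used 50 ABC news stories plus a 314-document backgrounding corpus.

```python
import numpy as np

# Toy term-document count matrix: rows are terms, columns are 4 short
# documents. Documents 0 and 2 share finance terms; 1 and 3 share
# weather terms. (Hypothetical data, not from the paper.)
X = np.array([
    [2, 0, 1, 0],   # "market"
    [1, 0, 2, 0],   # "shares"
    [0, 3, 0, 1],   # "rain"
    [0, 1, 0, 2],   # "storm"
    [1, 1, 1, 1],   # "today"
], dtype=float)

# LSA: truncated SVD keeps the k largest singular components, giving
# each document a k-dimensional latent vector.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T      # one row per document

def cosine(a, b):
    """Cosine similarity between two latent document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Topically related documents should score higher than unrelated ones.
print(cosine(doc_vecs[0], doc_vecs[2]))
print(cosine(doc_vecs[0], doc_vecs[1]))
```

In the paper's evaluation, document vectors like these (derived from a much larger corpus) are correlated against the averaged human pairwise ratings.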

227 citations

Proceedings Article
07 Jun 2012
TL;DR: This work uses a simple log-linear regression model to combine multiple text similarity measures of varying complexity, ranging from simple character and word n-grams and common subsequences to complex features such as Explicit Semantic Analysis vector comparisons and aggregated word similarity based on lexical-semantic resources.
Abstract: We present the UKP system which performed best in the Semantic Textual Similarity (STS) task at SemEval-2012 in two out of three metrics. It uses a simple log-linear regression model, trained on the training data, to combine multiple text similarity measures of varying complexity. These range from simple character and word n-grams and common subsequences to complex features such as Explicit Semantic Analysis vector comparisons and aggregation of word similarity based on lexical-semantic resources. Further, we employ a lexical substitution system and statistical machine translation to add additional lexemes, which alleviates lexical gaps. Our final models, one per dataset, consist of a log-linear combination of about 20 features, out of the possible 300+ features implemented.
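The core idea of the UKP system, combining several cheap surface-similarity features through a trained regression model, can be sketched in a few lines. This is a heavily simplified illustration: the feature set and weights below are invented, whereas the real system selected about 20 features out of 300+ and learned weights from the STS training data.

```python
import re
from difflib import SequenceMatcher

def char_ngrams(text, n):
    """Set of character n-grams after whitespace normalization."""
    text = re.sub(r"\s+", " ", text.lower()).strip()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a, b):
    """Set-overlap similarity; 0 for two empty sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def features(s1, s2):
    # A tiny, invented feature vector: character 2-gram and 3-gram
    # overlap, plus difflib's common-subsequence-based ratio.
    return [
        jaccard(char_ngrams(s1, 2), char_ngrams(s2, 2)),
        jaccard(char_ngrams(s1, 3), char_ngrams(s2, 3)),
        SequenceMatcher(None, s1, s2).ratio(),
    ]

# In the real system these weights come from a regression model fit on
# gold similarity scores; here they are made up for illustration.
weights = [0.3, 0.3, 0.4]

def similarity(s1, s2):
    return sum(w * f for w, f in zip(weights, features(s1, s2)))

print(similarity("a man is playing a guitar", "a man plays the guitar"))
print(similarity("a man is playing a guitar", "the stock market fell today"))
```

A paraphrase pair scores well above an unrelated pair even with these crude features, which is why combining many such measures with learned weights works well in practice.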

226 citations

Journal ArticleDOI
TL;DR: This article focuses on the similarity measuring methods for CBR and reviews the existing methods for measuring similarity in the literature based on more than 100 CBR project studies and some general similarity measures seen in other applications.
Abstract: Case-based reasoning (CBR) is one of the emerging paradigms for designing intelligent systems. Retrieval of similar cases is a primary step in CBR, and the similarity measure plays a very important role in case retrieval. Sometimes CBR systems are called similarity searching systems, the most important characteristic of which is the effectiveness of the similarity measure used to quantify the degree of resemblance between a pair of cases. This article focuses on the similarity measuring methods for CBR and comprises two parts. The first part reviews the existing methods for measuring similarity in the literature, based on more than 100 CBR project studies and some general similarity measures seen in other applications. In the second part, a hybrid similarity measure is proposed for comparing cases with a mixture of crisp and fuzzy features. Its application to the domain of failure analysis is illustrated.
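A hybrid measure of the kind the article proposes can be sketched as a weighted per-feature similarity over mixed crisp, numeric, and fuzzy attributes. The feature names, weights, and the particular fuzzy comparison below are invented for illustration, not taken from the article.

```python
def crisp_sim(a, b):
    """Exact-match similarity for symbolic (crisp) features."""
    return 1.0 if a == b else 0.0

def numeric_sim(a, b, lo, hi):
    """Range-normalized similarity for numeric features."""
    return 1.0 - abs(a - b) / (hi - lo)

def fuzzy_sim(a, b):
    """Similarity of two triangular fuzzy numbers (left, peak, right),
    approximated here by closeness of the peaks within the joint support."""
    support = max(a[2], b[2]) - min(a[0], b[0])
    return 1.0 - abs(a[1] - b[1]) / support if support else 1.0

def case_similarity(case_a, case_b, weights):
    """Weighted aggregate over mixed feature types; weights are normalized
    so identical cases always score 1.0."""
    total = sum(weights.values())
    score = weights["material"] * crisp_sim(case_a["material"], case_b["material"])
    score += weights["temp"] * numeric_sim(case_a["temp"], case_b["temp"], 0, 1000)
    score += weights["load"] * fuzzy_sim(case_a["load"], case_b["load"])
    return score / total

# Hypothetical failure-analysis cases: a crisp material label, a numeric
# operating temperature, and a fuzzy load estimate.
query = {"material": "steel", "temp": 450, "load": (10, 20, 30)}
stored = {"material": "steel", "temp": 480, "load": (15, 25, 35)}
print(case_similarity(query, stored, {"material": 0.5, "temp": 0.3, "load": 0.2}))
```

In a CBR retrieval step, this score would be computed between the query case and every stored case, and the highest-scoring cases retrieved for adaptation.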

226 citations

Journal ArticleDOI
TL;DR: In this article, the effects of temporal and semantic proximity on output order in delayed and continuous-distractor free recall of random word lists were investigated using Latent Semantic Analysis (LSA).

226 citations


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
84% related
Graph (abstract data type)
69.9K papers, 1.2M citations
84% related
Unsupervised learning
22.7K papers, 1M citations
83% related
Feature vector
48.8K papers, 954.4K citations
83% related
Web service
57.6K papers, 989K citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    202
2022    522
2021    641
2020    837
2019    866
2018    787