Topic

Semantic similarity

About: Semantic similarity is a research topic. Over its lifetime, 14,605 publications have been published on this topic, receiving 364,659 citations. The topic is also known as: semantic relatedness.


Papers
Proceedings ArticleDOI
01 Jun 2016
TL;DR: This paper proposes a model to learn visually grounded word embeddings (vis-w2v) that capture visual notions of semantic relatedness, and finds that the visual grounding of words depends on semantics, not on the literal pixels.
Abstract: We propose a model to learn visually grounded word embeddings (vis-w2v) to capture visual notions of semantic relatedness. While word embeddings trained using text have been extremely successful, they cannot uncover notions of semantic relatedness implicit in our visual world. For instance, although "eats" and "stares at" seem unrelated in text, they share semantics visually. When people are eating something, they also tend to stare at the food. Grounding diverse relations like "eats" and "stares at" into vision remains challenging, despite recent progress in vision. We note that the visual grounding of words depends on semantics, and not the literal pixels. We thus use abstract scenes created from clipart to provide the visual grounding. We find that the embeddings we learn capture fine-grained, visually grounded notions of semantic relatedness. We show improvements over text-only word embeddings (word2vec) on three tasks: common-sense assertion classification, visual paraphrasing and text-based image retrieval. Our code and datasets are available online.
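As a rough illustration of the relatedness scoring that embedding-based approaches rely on, the sketch below computes cosine similarity between averaged word vectors. It is a minimal sketch, not the authors' code: the `vectors` table holds hypothetical 300-dimensional placeholders, whereas in vis-w2v the text-trained vectors are additionally refined with visual (clipart) grounding so that visually related phrases such as "eats" and "stares at" score higher.

```python
# Minimal sketch (not the vis-w2v code): scoring the semantic relatedness of
# two phrases by cosine similarity of averaged word vectors.
import numpy as np

vectors = {
    "eats":   np.random.randn(300),   # hypothetical placeholder vectors
    "stares": np.random.randn(300),
    "at":     np.random.randn(300),
    "food":   np.random.randn(300),
}

def phrase_vector(phrase):
    """Average the word vectors of a whitespace-tokenized phrase."""
    words = [w for w in phrase.lower().split() if w in vectors]
    return np.mean([vectors[w] for w in words], axis=0)

def relatedness(a, b):
    """Cosine similarity between two phrase vectors (higher = more related)."""
    va, vb = phrase_vector(a), phrase_vector(b)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(relatedness("eats", "stares at"))  # arbitrary here, since the vectors are random placeholders
```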

90 citations

Proceedings ArticleDOI
06 Aug 2009
TL;DR: A novel statistical language model is proposed to capture long-range semantic dependencies by applying the concept of semantic composition to the problem of constructing predictive history representations for upcoming words.
Abstract: In this paper we propose a novel statistical language model to capture long-range semantic dependencies. Specifically, we apply the concept of semantic composition to the problem of constructing predictive history representations for upcoming words. We also examine the influence of the underlying semantic space on the composition task by comparing spatial semantic representations against topic-based ones. The composition models yield reductions in perplexity when combined with a standard n-gram language model over the n-gram model alone. We also obtain perplexity reductions when integrating our models with a structured language model.
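A hedged sketch of the kind of model combination and perplexity evaluation the abstract refers to (not the paper's implementation): `p_ngram` and `p_semantic` are hypothetical per-word probability functions, and `lam` is an assumed interpolation weight.

```python
# Minimal sketch: linearly interpolating a composition-based (semantic) model
# with an n-gram model and measuring perplexity over a word sequence.
# p_ngram and p_semantic are hypothetical callables returning P(word | history).
import math

def perplexity(words, p_ngram, p_semantic, lam=0.5):
    """Perplexity of the interpolated model; lower is better."""
    log_prob = 0.0
    for i, word in enumerate(words):
        history = words[:i]
        p = lam * p_ngram(word, history) + (1 - lam) * p_semantic(word, history)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(words))

# Toy usage with uniform dummy models over a 10,000-word vocabulary.
uniform = lambda word, history: 1.0 / 10_000
print(perplexity("we propose a novel model".split(), uniform, uniform))  # 10000.0
```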

90 citations

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper proposes a simple, yet effective generalized hashing framework which can work for all the different scenarios, while preserving the semantic distance between the data points, and learns the optimum hash codes for the two modalities simultaneously.
Abstract: Due to the availability of large amounts of multimedia data, cross-modal matching is gaining increasing importance. Hashing based techniques provide an attractive solution to this problem when the data size is large. Different scenarios of cross-modal matching are possible; for example, data from the different modalities can be associated with a single label or multiple labels, and in addition may or may not have one-to-one correspondence. Most of the existing approaches have been developed for the case where there is one-to-one correspondence between the data of the two modalities. In this paper, we propose a simple, yet effective generalized hashing framework which can work for all the different scenarios, while preserving the semantic distance between the data points. The approach first learns the optimum hash codes for the two modalities simultaneously, so as to preserve the semantic similarity between the data points, and then learns the hash functions to map from the features to the hash codes. Extensive experiments on a single-label dataset (Wiki) and multi-label datasets (NUS-WIDE, Pascal, and LabelMe) under all the different scenarios, together with comparisons against the state of the art, show the effectiveness of the proposed approach.
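As a rough sketch of the retrieval step that hashing-based cross-modal matching enables (under simplifying assumptions, not the proposed framework itself): once both modalities have been mapped to binary codes, matching reduces to ranking database codes by Hamming distance from the query code. How the codes themselves are learned, which is the core contribution of the paper, is assumed away here.

```python
# Minimal sketch: cross-modal retrieval over precomputed binary hash codes.
import numpy as np

def hamming_distance(a, b):
    """Number of differing bits between two {0,1} code vectors."""
    return int(np.sum(a != b))

def retrieve(query_code, db_codes, k=5):
    """Indices of the k database items closest to the query in Hamming space."""
    dists = np.array([hamming_distance(query_code, c) for c in db_codes])
    return np.argsort(dists)[:k]

# Toy usage: a 32-bit text-query code against 1,000 image codes.
rng = np.random.default_rng(0)
query = rng.integers(0, 2, size=32)
database = rng.integers(0, 2, size=(1000, 32))
print(retrieve(query, database))
```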

90 citations

Proceedings Article
01 Oct 1980
TL;DR: A number of postulates are presented that must be satisfied by a database defined in terms of such features for its definition to be well-formed.
Abstract: This paper's principal goal is to provide a discussion of issues raised by the coexistence in a semantic data model of (i) an object-oriented framework including the notions of token, class and property as well as the IS-A and INSTANCE-OF relations; (ii) transactions that can cause state changes; and (iii) special (null) values such as "unknown", "nothing" and "inconsistent". The paper presents a number of postulates that need to be satisfied by a database defined in terms of such features for its definition to be well-formed. The discussion uses as its starting point TAXIS, a language for the design of interactive application systems which offers all three types of features.
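Purely as an illustration of the third ingredient (the names and representation below are hypothetical, not TAXIS syntax), the distinct "null" values can be modeled as separate sentinels so that a property lookup can tell them apart from ordinary values.

```python
# Illustrative sketch only: distinct sentinel objects for the three special
# values, plus a token that records its class (INSTANCE-OF) and properties.
UNKNOWN = object()       # a value exists but is not currently known
NOTHING = object()       # the property is inapplicable to this token
INCONSISTENT = object()  # conflicting values have been asserted

class Token:
    def __init__(self, cls, **properties):
        self.cls = cls                 # INSTANCE-OF: the class of this token
        self.properties = properties

    def get(self, name):
        # Unrecorded properties default to UNKNOWN rather than raising.
        return self.properties.get(name, UNKNOWN)

volunteer = Token("Employee", salary=NOTHING)  # no salary applies
print(volunteer.get("salary") is NOTHING)      # True
print(volunteer.get("manager") is UNKNOWN)     # True
```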

90 citations

Journal ArticleDOI
TL;DR: Semantic similarity measures based on the UMLS, which contains SNOMED CT and MeSH, significantly outperformed those based solely on SNOMED CT or MeSH across evaluations; knowledge-based semantic similarity measures are also more practical to compute than distributional measures, as they do not require an external corpus.
Abstract: Background: Semantic similarity measures estimate the similarity between concepts, and play an important role in many text processing tasks. Approaches to semantic similarity in the biomedical domain can be roughly divided into knowledge-based and distributional methods. Knowledge-based approaches utilize knowledge sources such as dictionaries, taxonomies, and semantic networks, and include path finding measures and intrinsic information content (IC) measures. Distributional measures utilize, in addition to a knowledge source, the distribution of concepts within a corpus to compute similarity; these include corpus IC and context vector methods. Prior evaluations of these measures in the biomedical domain showed that distributional measures outperform knowledge-based path finding methods, but more recent studies suggested that intrinsic IC-based measures exceed the accuracy of distributional approaches. Limitations of previous evaluations of similarity measures in the biomedical domain include their focus on the SNOMED CT ontology, and their reliance on small benchmarks not powered to detect significant differences in measure accuracy. There have been few evaluations of the relative performance of these measures on other biomedical knowledge sources such as the UMLS, and on larger, recently developed semantic similarity benchmarks. Results: We evaluated knowledge-based and corpus IC-based semantic similarity measures derived from SNOMED CT, MeSH, and the UMLS on recently developed semantic similarity benchmarks. Semantic similarity measures based on the UMLS, which contains SNOMED CT and MeSH, significantly outperformed those based solely on SNOMED CT or MeSH across evaluations. Intrinsic IC-based measures significantly outperformed path-based and distributional measures. We released all code required to reproduce our results and all tools developed as part of this study as open source, available at http://code.google.com/p/ytex. We provide a publicly accessible web service to compute semantic similarity, available at http://informatics.med.yale.edu/ytex.web/. Conclusions: Knowledge-based semantic similarity measures are more practical to compute than distributional measures, as they do not require an external corpus. Furthermore, knowledge-based measures significantly and meaningfully outperformed distributional measures on large semantic similarity benchmarks, suggesting that they are a practical alternative to distributional measures. Future evaluations of semantic similarity measures should utilize benchmarks powered to detect significant differences in measure accuracy.
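For orientation, the sketch below shows simplified forms of two of the measure families the study compares: a path measure over the is-a taxonomy and an intrinsic IC measure (in the spirit of Seco et al.) that needs no external corpus, combined in a Lin-style similarity. It is a minimal sketch under stated assumptions, not the ytex implementation; the function names and toy statistics are hypothetical.

```python
# Minimal sketch of two measure families (not the ytex code):
# a path-based similarity and an intrinsic-IC-based (Lin-style) similarity.
import math

def path_similarity(path_length):
    """Path measure: concepts joined by a shorter is-a path are more similar."""
    return 1.0 / (1.0 + path_length)

def intrinsic_ic(num_descendants, total_concepts):
    """Intrinsic IC (Seco-style): specific concepts (few descendants) score higher,
    estimated from the taxonomy alone, with no external corpus."""
    return 1.0 - math.log(num_descendants + 1) / math.log(total_concepts)

def lin_similarity(ic_c1, ic_c2, ic_lcs):
    """Lin measure: IC of the least common subsumer relative to the concepts' ICs."""
    return 2.0 * ic_lcs / (ic_c1 + ic_c2)

# Toy usage with made-up taxonomy statistics.
ic_a = intrinsic_ic(num_descendants=3, total_concepts=100_000)
ic_b = intrinsic_ic(num_descendants=10, total_concepts=100_000)
ic_lcs = intrinsic_ic(num_descendants=500, total_concepts=100_000)
print(path_similarity(4), lin_similarity(ic_a, ic_b, ic_lcs))
```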

89 citations


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 84% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 84% related
Unsupervised learning: 22.7K papers, 1M citations, 83% related
Feature vector: 48.8K papers, 954.4K citations, 83% related
Web service: 57.6K papers, 989K citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    202
2022    522
2021    641
2020    837
2019    866
2018    787