
Semantic similarity

About: Semantic similarity is a research topic. Over its lifetime, 14,605 publications have been published on this topic, receiving 364,659 citations. The topic is also known as semantic relatedness.


Papers
Proceedings Article
25 Oct 2010
TL;DR: This paper proposes a social image "retagging" scheme that aims to assign better content descriptors to images, and shows the remarkable performance improvements that retagging brings to two applications: tag-based search and automatic annotation.
Abstract: Online social media repositories such as Flickr and Zooomr allow users to manually annotate their images with freely chosen tags, which are then used as indexing keywords to facilitate image search and other applications. However, although these tags are provided by humans, they are frequently imprecise and incomplete, and many of them are meaningful almost only to the image owners (such as the name of a dog). There is therefore still a gap between these tags and the actual content of the images, which significantly limits tag-based applications such as search and browsing. To tackle this issue, this paper proposes a social image "retagging" scheme that aims to assign better content descriptors to images. The refining process, including denoising and enriching, is formulated as an optimization framework based on the consistency between "visual similarity" and "semantic similarity" in social images; that is, visually similar images tend to have similar semantic descriptors, and vice versa. An effective iterative bound optimization algorithm is applied to learn the improved tag assignment. In addition, because many tags are intrinsically not closely related to the visual content of the images, we employ a knowledge-based method to differentiate tags related to visual content from unrelated ones, and then constrain the tagging vocabulary of our automatic algorithm to the content-related tags. Finally, to improve the coverage of the tags, we further enrich the tag set with appropriate synonyms and hypernyms based on an external knowledge base. Experimental results on a Flickr image collection demonstrate the effectiveness of this approach. We also show the remarkable performance improvements brought by retagging in two applications: tag-based search and automatic annotation.

116 citations
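The paper's core assumption is that visually similar images should carry similar tags. The sketch below illustrates that consistency idea with plain label propagation over a visual-similarity graph; this is a swapped-in, simpler technique, not the paper's iterative bound optimization, and it omits the denoising constraints and knowledge-base enrichment. The similarity matrix, tag columns, and `alpha` value are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def normalize_rows(sim: Matrix) -> Matrix:
    """Row-normalize a non-negative visual-similarity matrix, ignoring self-similarity."""
    out: Matrix = []
    for i, row in enumerate(sim):
        vals = [0.0 if i == j else v for j, v in enumerate(row)]
        total = sum(vals)
        out.append([v / total if total > 0 else 0.0 for v in vals])
    return out


def refine_tags(visual_sim: Matrix, initial_tags: Matrix,
                alpha: float = 0.5, iters: int = 50) -> Matrix:
    """Smooth image-tag scores over a visual-similarity graph (label propagation).

    Each iteration computes Y = alpha * S Y + (1 - alpha) * Y0, so an image's
    refined scores blend its visual neighbours' scores with its original tags.
    Requires 0 <= alpha < 1 for convergence.
    """
    s = normalize_rows(visual_sim)
    n_imgs, n_tags = len(initial_tags), len(initial_tags[0])
    y = [row[:] for row in initial_tags]
    for _ in range(iters):
        y = [
            [alpha * sum(s[i][k] * y[k][t] for k in range(n_imgs))
             + (1 - alpha) * initial_tags[i][t]
             for t in range(n_tags)]
            for i in range(n_imgs)
        ]
    return y


# Hypothetical example: tag columns are ["dog", "beach"].
visual_sim = [[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.1],
              [0.1, 0.1, 1.0]]
initial_tags = [[1.0, 0.0],   # image 0: user tagged "dog"
                [0.0, 0.0],   # image 1: untagged, but looks like image 0
                [0.0, 1.0]]   # image 2: user tagged "beach"
refined = refine_tags(visual_sim, initial_tags)
```

Here the untagged image 1 inherits a higher "dog" score than "beach" score because it is visually close to image 0. Because `alpha` is below 1, the original user tags are never fully overwritten.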

01 Jan 2002
TL;DR: This paper claims that web services should be located based on the semantic match between a declarative description of the service being sought and a description of the service being offered, and that this match is beyond the representation capabilities of registries such as UDDI and languages such as WSDL.
Abstract: The Web is moving from being a collection of pages toward a collection of services that interoperate through the Internet. The first step toward this interoperation is the location of other services that can help toward the solution of a problem. In this paper we claim that location of web services should be based on the semantic match between a declarative description of the service being sought and a description of the service being offered. Furthermore, we claim that this match is outside the representation capabilities of registries such as UDDI and languages such as WSDL. We propose a solution based on DAML-S, a DAML-based language for service description, and we show how service capabilities are presented in the Profile section of a DAML-S description and how a semantic match between advertisements and requests is performed.

115 citations
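To make "semantic match between advertisements and requests" concrete, the following is a minimal sketch that grades an advertised output concept against a requested one using subsumption in a concept hierarchy. The toy ontology, the grade names, and their ranking are hypothetical illustrations, not DAML-S syntax or the paper's exact matching rules.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy ontology: each concept maps to its direct parent (None = root).
ONTOLOGY: Dict[str, Optional[str]] = {
    "Vehicle": None,
    "Car": "Vehicle",
    "SportsCar": "Car",
    "Truck": "Vehicle",
}

# Assumed ranking of match grades, best first (not DAML-S terminology).
GRADES = ["exact", "advertised_narrower", "advertised_broader", "fail"]


def ancestors(concept: str, ontology: Dict[str, Optional[str]]) -> List[str]:
    """Return all strict ancestors of a concept, nearest first."""
    chain = []
    parent = ontology[concept]
    while parent is not None:
        chain.append(parent)
        parent = ontology[parent]
    return chain


def match_grade(requested: str, advertised: str,
                ontology: Dict[str, Optional[str]] = ONTOLOGY) -> str:
    """Grade how an advertised output concept relates to the requested one."""
    if requested == advertised:
        return "exact"
    if requested in ancestors(advertised, ontology):
        return "advertised_narrower"  # advertised is a subclass of requested
    if advertised in ancestors(requested, ontology):
        return "advertised_broader"   # advertised is a superclass of requested
    return "fail"


def rank_advertisements(requested: str, ads: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (service name, grade) pairs for non-failing ads, best grade first."""
    graded = [(name, match_grade(requested, concept)) for name, concept in ads.items()]
    hits = [(name, g) for name, g in graded if g != "fail"]
    return sorted(hits, key=lambda pair: (GRADES.index(pair[1]), pair[0]))


# Hypothetical advertisements: service name -> advertised output concept.
ads = {"carRental": "Car", "sportsDealer": "SportsCar",
       "anyTransport": "Vehicle", "freight": "Truck"}
ranking = rank_advertisements("Car", ads)
```

The point of the design is that a keyword registry would treat "Car" and "SportsCar" as unrelated strings, whereas the hierarchy lets the matcher rank them as a near match and discard "Truck".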

Proceedings Article
01 Nov 2016
TL;DR: This article explores whether prior work can be enhanced using semantic similarity/discordance between word embeddings, and augments four previously reported feature sets with word embedding-based features.
Abstract: This paper makes a simple increment to the state of the art in sarcasm detection research. Existing approaches are unable to capture subtle forms of context incongruity, which lies at the heart of sarcasm. We explore whether prior work can be enhanced using semantic similarity/discordance between word embeddings. We augment four feature sets reported in the past with word embedding-based features. We also experiment with four types of word embeddings. We observe an improvement in sarcasm detection, irrespective of the word embedding used or the original feature set to which our features are added. For example, when Word2Vec embeddings are used, this augmentation improves the F-score by around 4% for three of the four feature sets, with a minor degradation for the fourth. Finally, a comparison of the four embeddings shows that Word2Vec and dependency weight-based features outperform LSA and GloVe in terms of their benefit to sarcasm detection.

115 citations
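A sketch of how similarity/discordance between word embeddings can be turned into sentence-level features: for each word, take its most similar and most dissimilar other word in the sentence, then summarize those scores with max and min. This is loosely modeled on the idea in the abstract; the paper's exact feature definitions may differ, and the 3-dimensional embeddings below are hypothetical stand-ins for Word2Vec, GloVe, or similar vectors.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def similarity_features(words: List[str], emb: Dict[str, Vector]) -> Dict[str, float]:
    """Sentence-level similarity/discordance features from pairwise word similarities.

    For every in-vocabulary word, find its most similar and most dissimilar other
    word in the sentence; then take the max and min of each across the sentence.
    Out-of-vocabulary words are skipped.
    """
    vecs = [emb[w] for w in words if w in emb]
    names = ["max_most_similar", "min_most_similar",
             "max_most_dissimilar", "min_most_dissimilar"]
    if len(vecs) < 2:
        return {n: 0.0 for n in names}
    most_sim, most_dissim = [], []
    for i, u in enumerate(vecs):
        scores = [cosine(u, v) for j, v in enumerate(vecs) if j != i]
        most_sim.append(max(scores))
        most_dissim.append(min(scores))
    return dict(zip(names, [max(most_sim), min(most_sim),
                            max(most_dissim), min(most_dissim)]))


# Hypothetical 3-d embeddings for illustration only.
toy_emb = {"love": [1.0, 0.0, 0.0], "great": [0.9, 0.1, 0.0], "delay": [-1.0, 0.0, 0.0]}
feats = similarity_features(["i", "love", "great", "delay"], toy_emb)
```

A strongly negative `min_most_dissimilar` (here, "love" versus "delay") is the kind of incongruity signal such features are meant to expose; in practice these values would be appended to an existing feature set rather than used alone.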

01 Jan 2005
TL;DR: This paper proposes a metric for measuring the similarity of semantic services annotated with OWL ontologies; similarity is calculated by defining the intrinsic information value of a service description based on the "inferencibility" of each OWL Lite construct.
Abstract: Establishing the compatibility of services is an essential prerequisite to service composition. By formally defining the similarity of semantic services, useful information can be obtained about their compatibility. In this paper we propose a metric for measuring the similarity of semantic services annotated with OWL ontologies. Similarity is calculated by defining the intrinsic information value of a service description based on the "inferencibility" of each OWL Lite construct. We apply this technique to OWL-S, an emerging standard for defining semantic service metadata, and demonstrate how to measure the similarity of OWL-S annotated services.

115 citations
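The paper's "inferencibility"-based information value is specific to OWL Lite constructs and is not reproduced here. As a hedged illustration of the broader idea of information-based service similarity, the sketch below swaps in a different, well-known approach: an intrinsic information-content measure computed from hyponym counts in a concept hierarchy, combined with a Lin-style similarity, and averaged over service parameters. The taxonomy and parameter lists are hypothetical.

```python
import math
from typing import Dict, List, Optional

# Hypothetical toy taxonomy: concept -> direct parent (None = root).
TAXONOMY: Dict[str, Optional[str]] = {
    "Thing": None,
    "Location": "Thing",
    "City": "Location",
    "Airport": "Location",
    "Price": "Thing",
    "Currency": "Thing",
}


def lineage(c: str, tax: Dict[str, Optional[str]]) -> List[str]:
    """The concept itself followed by its ancestors, nearest first."""
    chain = [c]
    while tax[chain[-1]] is not None:
        chain.append(tax[chain[-1]])
    return chain


def intrinsic_ic(c: str, tax: Dict[str, Optional[str]]) -> float:
    """Intrinsic information content from hyponym counts: leaves score 1, the root 0."""
    hyponyms = sum(1 for other in tax if other != c and c in lineage(other, tax))
    return 1.0 - math.log(hyponyms + 1) / math.log(len(tax))


def lin_similarity(a: str, b: str, tax: Dict[str, Optional[str]]) -> float:
    """Lin-style similarity using the lowest common ancestor's information content."""
    ancestors_b = set(lineage(b, tax))
    lca = next(x for x in lineage(a, tax) if x in ancestors_b)
    denom = intrinsic_ic(a, tax) + intrinsic_ic(b, tax)
    return 2.0 * intrinsic_ic(lca, tax) / denom if denom > 0 else 1.0


def service_similarity(params_a: List[str], params_b: List[str],
                       tax: Dict[str, Optional[str]]) -> float:
    """Symmetric average of each parameter's best match in the other service."""
    def directed(xs: List[str], ys: List[str]) -> float:
        return sum(max(lin_similarity(x, y, tax) for y in ys) for x in xs) / len(xs)
    return 0.5 * (directed(params_a, params_b) + directed(params_b, params_a))


# Hypothetical services described only by their parameter concepts.
score = service_similarity(["City", "Price"], ["Airport", "Currency"], TAXONOMY)
```

Concepts that meet only at the root share no information and score 0, while siblings under a more specific parent ("City" and "Airport" under "Location") score between 0 and 1.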

Posted Content
TL;DR: This article proposes relevance-based word embedding models that learn word representations from query-document relevance information; one model learns a relevance distribution over the vocabulary for each query, and the other classifies each term as belonging to the relevant or non-relevant class for each query.
Abstract: Learning a high-dimensional dense representation for vocabulary terms, also known as a word embedding, has recently attracted much attention in natural language processing and information retrieval tasks. The embedding vectors are typically learned based on term proximity in a large corpus. This means that the objective in well-known word embedding algorithms, e.g., word2vec, is to accurately predict adjacent word(s) for a given word or context. However, this objective is not necessarily equivalent to the goal of many information retrieval (IR) tasks. The primary objective in various IR tasks is to capture relevance instead of term proximity, syntactic, or even semantic similarity. This is the motivation for developing unsupervised relevance-based word embedding models that learn word representations based on query-document relevance information. In this paper, we propose two learning models with different objective functions; one learns a relevance distribution over the vocabulary set for each query, and the other classifies each term as belonging to the relevant or non-relevant class for each query. To train our models, we used over six million unique queries and the top ranked documents retrieved in response to each query, which are assumed to be relevant to the query. We extrinsically evaluate our learned word representation models using two IR tasks: query expansion and query classification. Both query expansion experiments on four TREC collections and query classification experiments on the KDD Cup 2005 dataset suggest that the relevance-based word embedding models significantly outperform state-of-the-art proximity-based embedding models, such as word2vec and GloVe.

115 citations
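The first model described above learns a relevance distribution over the vocabulary for each query, using top-ranked documents as weak relevance signals. The sketch below shows one assumed way to build such a per-query target (averaging per-document term frequencies over the top documents, similar in spirit to a relevance model with uniform document weights) and uses it directly for query expansion. It does not train any embeddings, and the estimator, function names, and example documents are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def relevance_distribution(top_docs: List[List[str]]) -> Dict[str, float]:
    """Estimate p(w | R) as the average per-document term frequency across top docs."""
    docs = [d for d in top_docs if d]  # skip empty documents
    dist: Counter = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            dist[term] += count / len(doc) / len(docs)
    return dict(dist)


def expand_query(query: List[str], top_docs: List[List[str]], k: int = 2) -> List[str]:
    """Append the k most probable non-query terms (ties broken alphabetically)."""
    dist = relevance_distribution(top_docs)
    seen = set(query)
    candidates = sorted((t for t in dist if t not in seen),
                        key=lambda t: (-dist[t], t))
    return list(query) + candidates[:k]


# Hypothetical top-ranked documents for the query "jaguar", already tokenized.
docs = [["jaguar", "car", "engine"], ["jaguar", "car", "speed"]]
expanded = expand_query(["jaguar"], docs)
```

In the paper's setting, a distribution like this would serve as a training signal for the embedding model across millions of queries, rather than being used for expansion on its own.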


Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 84% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 84% related
Unsupervised learning: 22.7K papers, 1M citations, 83% related
Feature vector: 48.8K papers, 954.4K citations, 83% related
Web service: 57.6K papers, 989K citations, 82% related
Performance Metrics
Number of papers on this topic in previous years:
Year   Papers
2023   202
2022   522
2021   641
2020   837
2019   866
2018   787