scispace - formally typeset
Topic

Semantic similarity

About: Semantic similarity is a research topic. Over its lifetime, 14,605 publications have been published within this topic, receiving 364,659 citations. The topic is also known as: semantic relatedness.


Papers
Proceedings ArticleDOI
07 Nov 2007
TL;DR: This paper proposes a data preprocessing model to add semantic information to trajectories in order to facilitate trajectory data analysis in different application domains and shows that the query complexity for the semantic analysis of trajectories will be significantly reduced.
Abstract: The collection of moving object data is becoming more and more common, and therefore there is an increasing need for the efficient analysis and knowledge extraction of these data in different application domains. Trajectory data are normally available as sample points, and do not carry semantic information, which is of fundamental importance for the comprehension of these data. Therefore, the analysis of trajectory data becomes expensive from a computational point of view and complex from a user's perspective. Enriching trajectories with semantic geographical information may simplify queries, analysis, and mining of moving object data. In this paper we propose a data preprocessing model to add semantic information to trajectories in order to facilitate trajectory data analysis in different application domains. The model is generic enough to represent the important parts of trajectories that are relevant to the application, not being restricted to one specific application. We present an algorithm to compute the important parts and show that the query complexity for the semantic analysis of trajectories will be significantly reduced with the proposed model.

434 citations
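The preprocessing idea above can be sketched in a few lines: label each raw sample point with the geographic region it falls in, then collapse runs of samples inside one region into semantically meaningful "stops". This is only an illustrative sketch, not the paper's actual algorithm; the Region class, rectangular regions, and the min_duration threshold are all assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass
class Region:
    """Illustrative stand-in for a geographic feature relevant to the application."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, x: float, y: float) -> bool:
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax

def annotate(points, regions, min_duration=2):
    """Label each (x, y, t) sample with the first region containing it, then
    collapse runs of >= min_duration consecutive samples in one region into stops."""
    labels = []
    for x, y, t in points:
        hit = next((r.name for r in regions if r.contains(x, y)), None)
        labels.append(hit)
    stops = []
    i = 0
    while i < len(labels):
        j = i
        while j < len(labels) and labels[j] == labels[i]:
            j += 1
        if labels[i] is not None and j - i >= min_duration:
            stops.append((labels[i], points[i][2], points[j - 1][2]))  # (region, t_start, t_end)
        i = j
    return labels, stops
```

Once trajectories are reduced to labeled stops, a query like "find all trajectories that stopped at a hotel" becomes a simple filter over stop labels instead of an expensive spatial join over raw sample points, which is the complexity reduction the paper argues for.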

Journal ArticleDOI
TL;DR: A novel image representation is presented that renders it possible to access natural scenes by local semantic description by using a perceptually plausible distance measure that leads to a high correlation between the human and the automatically obtained typicality ranking.
Abstract: In this paper, we present a novel image representation that renders it possible to access natural scenes by local semantic description. Our work is motivated by the continuing effort in content-based image retrieval to extract and to model the semantic content of images. The basic idea of the semantic modeling is to classify local image regions into semantic concept classes such as water, rocks, or foliage. Images are represented through the frequency of occurrence of these local concepts. Through extensive experiments, we demonstrate that the image representation is well suited for modeling the semantic content of heterogenous scene categories, and thus for categorization and retrieval. The image representation also allows us to rank natural scenes according to their semantic similarity relative to certain scene categories. Based on human ranking data, we learn a perceptually plausible distance measure that leads to a high correlation between the human and the automatically obtained typicality ranking. This result is especially valuable for content-based image retrieval where the goal is to present retrieval results in descending semantic similarity from the query.

433 citations
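The core representation described above can be sketched directly: given local region labels produced by some concept classifier, the image becomes a normalized frequency vector over the concept vocabulary, and images are compared with a weighted distance. The concept list and the uniform weights below are placeholders; in the paper the weights are learned from human typicality rankings.

```python
import numpy as np

# Illustrative concept vocabulary; the paper's classes include water, rocks, foliage.
CONCEPTS = ["water", "rocks", "foliage", "sky", "sand"]

def concept_histogram(region_labels):
    """Represent an image by the normalized frequency of its local concept labels."""
    counts = np.array([region_labels.count(c) for c in CONCEPTS], dtype=float)
    return counts / counts.sum()

def weighted_distance(h1, h2, w=None):
    """Weighted L1 distance between concept histograms. Uniform weights stand in
    for the perceptually plausible measure learned from human ranking data."""
    w = np.ones(len(h1)) if w is None else np.asarray(w, dtype=float)
    return float(np.sum(w * np.abs(np.asarray(h1) - np.asarray(h2))))
```

Ranking retrieval results then amounts to sorting images by this distance to the query's histogram, which yields the descending-semantic-similarity ordering the abstract describes.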

Proceedings Article
01 Jun 2007
TL;DR: This work introduces a general framework for answer extraction which exploits semantic role annotations in the FrameNet paradigm and views semantic role assignment as an optimization problem in a bipartite graph and answer extraction as an instance of graph matching.
Abstract: Shallow semantic parsing, the automatic identification and labeling of sentential constituents, has recently received much attention. Our work examines whether semantic role information is beneficial to question answering. We introduce a general framework for answer extraction which exploits semantic role annotations in the FrameNet paradigm. We view semantic role assignment as an optimization problem in a bipartite graph and answer extraction as an instance of graph matching. Experimental results on the TREC datasets demonstrate improvements over state-of-the-art models.

429 citations
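The optimization view above — assigning semantic roles to constituents as a maximum-weight matching in a bipartite graph — can be illustrated with a toy score matrix. The compatibility scores below are invented for the example; in the paper they come from a trained semantic role labeler. This brute-force search is fine for small role sets; a real system would use the Hungarian algorithm (e.g. `scipy.optimize.linear_sum_assignment`).

```python
from itertools import permutations
import math

# Hypothetical constituent-to-role compatibility scores (rows: constituents,
# columns: FrameNet roles); these numbers are illustrative only.
scores = [
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.3],
    [0.1, 0.3, 0.7],
]

def best_assignment(score_matrix):
    """Exhaustive maximum-weight bipartite matching: try every one-to-one
    assignment of constituents to roles and keep the highest-scoring one."""
    n = len(score_matrix)
    best, best_total = None, -math.inf
    for perm in permutations(range(n)):
        total = sum(score_matrix[i][perm[i]] for i in range(n))
        if total > best_total:
            best, best_total = perm, total
    return best, best_total
```

Answer extraction as graph matching follows the same pattern: nodes on one side are question roles, nodes on the other are candidate answer constituents, and the best-scoring matching selects the answer.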

Proceedings ArticleDOI
17 Oct 2015
TL;DR: This work proposes to go from word-level to text-level semantics by combining insights from methods based on external sources of semantic knowledge with word embeddings, and derives multiple types of meta-features from the comparison of the word vectors for short text pairs, and from the vector means of their respective word embedDings.
Abstract: Determining semantic similarity between texts is important in many tasks in information retrieval such as search, query suggestion, automatic summarization and image finding. Many approaches have been suggested, based on lexical matching, handcrafted patterns, syntactic parse trees, external sources of structured semantic knowledge and distributional semantics. However, lexical features, like string matching, do not capture semantic similarity beyond a trivial level. Furthermore, handcrafted patterns and external sources of structured semantic knowledge cannot be assumed to be available in all circumstances and for all domains. Lastly, approaches depending on parse trees are restricted to syntactically well-formed texts, typically of one sentence in length. We investigate whether determining short text similarity is possible using only semantic features---where by semantic we mean, pertaining to a representation of meaning---rather than relying on similarity in lexical or syntactic representations. We use word embeddings, vector representations of terms, computed from unlabelled data, that represent terms in a semantic space in which proximity of vectors can be interpreted as semantic similarity. We propose to go from word-level to text-level semantics by combining insights from methods based on external sources of semantic knowledge with word embeddings. A novel feature of our approach is that an arbitrary number of word embedding sets can be incorporated. We derive multiple types of meta-features from the comparison of the word vectors for short text pairs, and from the vector means of their respective word embeddings. The features representing labelled short text pairs are used to train a supervised learning algorithm. 
We use the trained model at testing time to predict the semantic similarity of new, unlabelled pairs of short texts. We show, on a publicly available evaluation set commonly used for the task of semantic similarity, that our method outperforms baseline methods that work under the same conditions.

426 citations
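The simplest of the meta-features described above — the cosine of the two texts' mean word-embedding vectors — can be sketched as follows. The toy three-dimensional vectors stand in for pretrained embeddings (e.g. word2vec); the paper derives many more features than this, including statistics over pairwise word-vector comparisons and multiple embedding sets.

```python
import numpy as np

# Toy word vectors standing in for a pretrained embedding set (illustrative values).
emb = {
    "cat": np.array([1.0, 0.2, 0.0]),
    "dog": np.array([0.9, 0.3, 0.1]),
    "car": np.array([0.0, 0.1, 1.0]),
}

def mean_vector(tokens):
    """Mean of the embedding vectors of the in-vocabulary tokens."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def meta_features(text1_tokens, text2_tokens):
    """One meta-feature for a short text pair: cosine of the mean vectors.
    A full feature set would add many more comparisons over word vectors."""
    return [cosine(mean_vector(text1_tokens), mean_vector(text2_tokens))]
```

These features for labelled pairs are then fed to an ordinary supervised learner, which is what lets the approach work without parse trees or external knowledge bases.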

Posted Content
TL;DR: The authors proposed sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine-similarity.
Abstract: BERT (Devlin et al., 2018) and RoBERTa (Liu et al., 2019) have set a new state-of-the-art performance on sentence-pair regression tasks like semantic textual similarity (STS). However, it requires that both sentences are fed into the network, which causes a massive computational overhead: Finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT. The construction of BERT makes it unsuitable for semantic similarity search as well as for unsupervised tasks like clustering. In this publication, we present Sentence-BERT (SBERT), a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine-similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy from BERT. We evaluate SBERT and SRoBERTa on common STS tasks and transfer learning tasks, where it outperforms other state-of-the-art sentence embeddings methods.

425 citations
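The "~50 million inference computations" figure in the abstract is just the number of unordered sentence pairs among 10,000 sentences. The sketch below checks that count and contrasts it with the bi-encoder setup: one forward pass per sentence, after which similarity search is plain cosine arithmetic over cached vectors. Random unit vectors stand in for actual SBERT outputs, and the 384-dimensional size is an assumption for illustration.

```python
from math import comb
import numpy as np

# Cross-encoder cost: every sentence pair must pass through the network.
n = 10_000
pairs = comb(n, 2)  # = 49,995,000, the "~50 million" forward passes in the abstract

# SBERT-style bi-encoder: n forward passes yield one embedding per sentence;
# afterwards, similarity search is a matrix of cosine scores over cached vectors.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(n, 384))                       # stand-in for SBERT outputs
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)  # unit-normalize rows
sims = embeddings[:100] @ embeddings.T                       # cosine scores, 100 queries x n
```

Because the rows are unit-normalized, the dot products are exactly cosine similarities, which is why the pairwise search drops from hours of inference to seconds of linear algebra.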


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
84% related
Graph (abstract data type)
69.9K papers, 1.2M citations
84% related
Unsupervised learning
22.7K papers, 1M citations
83% related
Feature vector
48.8K papers, 954.4K citations
83% related
Web service
57.6K papers, 989K citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    202
2022    522
2021    641
2020    837
2019    866
2018    787