Topic

Semantic similarity

About: Semantic similarity is a research topic. Over its lifetime, 14605 publications have been published within this topic, receiving 364659 citations. The topic is also known as: semantic relatedness.


Papers
Journal ArticleDOI
TL;DR: This paper proposes extending IC-based similarity measures by considering multiple ontologies in an integrated way, and shows an improvement in similarity assessment accuracy when multiple ontologies are considered.
Abstract: The quantification of the semantic similarity between terms is an important research area that configures a valuable tool for text understanding. Among the different paradigms used by related works to compute semantic similarity, in recent years, information theoretic approaches have shown promising results by computing the information content (IC) of concepts from the knowledge provided by ontologies. These approaches, however, are hampered by the coverage offered by the single input ontology. In this paper, we propose extending IC-based similarity measures by considering multiple ontologies in an integrated way. Several strategies are proposed according to which ontology the evaluated terms belong. Our proposal has been evaluated by means of a widely used benchmark of medical terms and MeSH and SNOMED CT as ontologies. Results show an improvement in the similarity assessment accuracy when multiple ontologies are considered.

86 citations
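The IC-based paradigm described above can be illustrated with Lin's classic measure, sim(c1, c2) = 2·IC(LCS) / (IC(c1) + IC(c2)). The toy taxonomy and probabilities below are invented for illustration; they are not values from the paper or from MeSH/SNOMED CT:

```python
import math

# Toy concept probabilities in a hypothetical ontology (assumed values):
# p(c) = fraction of corpus annotations subsumed by concept c.
P = {"entity": 1.0, "disease": 0.20, "diabetes": 0.02, "hepatitis": 0.01}

def ic(concept):
    """Information content: IC(c) = -log p(c)."""
    return -math.log(P[concept])

def lin_similarity(c1, c2, lcs):
    """Lin's IC-based measure: 2*IC(LCS) / (IC(c1) + IC(c2)).
    The least common subsumer (lcs) is supplied by the ontology."""
    return 2 * ic(lcs) / (ic(c1) + ic(c2))

# "diabetes" and "hepatitis" share the subsumer "disease" in this toy taxonomy.
score = lin_similarity("diabetes", "hepatitis", lcs="disease")
```

A multi-ontology extension, as the paper proposes, would choose the IC source depending on which ontology (or ontologies) the two terms belong to.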

Proceedings ArticleDOI
31 Jul 2018
TL;DR: This paper presents an effective approach for parallel corpus mining using bilingual sentence embeddings, achieved with a novel training method that introduces hard negatives: sentence pairs that are not translations but have some degree of semantic similarity.
Abstract: This paper presents an effective approach for parallel corpus mining using bilingual sentence embeddings. Our embedding models are trained to produce similar representations exclusively for bilingual sentence pairs that are translations of each other. This is achieved using a novel training method that introduces hard negatives consisting of sentences that are not translations but have some degree of semantic similarity. The quality of the resulting embeddings is evaluated on parallel corpus reconstruction and by assessing machine translation systems trained on gold vs. mined sentence pairs. We find that the sentence embeddings can be used to reconstruct the United Nations Parallel Corpus (Ziemski et al., 2016) at the sentence level with a precision of 48.9% for en-fr and 54.9% for en-es. When adapted to document-level matching, we achieve a parallel document matching accuracy that is comparable to the significantly more computationally intensive approach of Uszkoreit et al. (2010). Using reconstructed parallel data, we are able to train NMT models that perform nearly as well as models trained on the original data (within 1-2 BLEU).

86 citations
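The mining step can be sketched as nearest-neighbor search over sentence embeddings with a similarity threshold. The vectors and threshold below are toy assumptions; a real system would obtain embeddings from the paper's trained bilingual encoder:

```python
import math

# Hypothetical pre-computed sentence embeddings (toy 3-d vectors for illustration).
en = {"the cat sleeps": [0.9, 0.1, 0.0], "markets fell today": [0.1, 0.9, 0.2]}
fr = {"le chat dort": [0.88, 0.12, 0.05], "les marchés ont chuté": [0.15, 0.85, 0.25]}

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mine_pairs(src, tgt, threshold=0.9):
    """Greedy mining: keep each source sentence with its nearest target
    sentence if the cosine score clears the threshold."""
    pairs = []
    for s, sv in src.items():
        best = max(tgt, key=lambda t: cosine(sv, tgt[t]))
        if cosine(sv, tgt[best]) >= threshold:
            pairs.append((s, best))
    return pairs

mined = mine_pairs(en, fr)
```

The hard-negative training described in the abstract is what makes thresholding meaningful in practice: without it, non-translation pairs with related meanings would also score near the top.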

Journal ArticleDOI
TL;DR: A general method is proposed to generate temporal semantic annotations of a semantic relation between entities by constructing its connection entities, lexical-syntactic patterns, context sentences, context graph, and context communities.

86 citations

Book ChapterDOI
26 Mar 2008
TL;DR: A comparative study of STASIS and LSA is described, which shows measures of semantic similarity can be applied to short texts for use in Conversational Agents (CAs), and a benchmark data set of 65 sentence pairs with human-derived similarity ratings is presented.
Abstract: This paper describes a comparative study of STASIS and LSA. These measures of semantic similarity can be applied to short texts for use in Conversational Agents (CAs). CAs are computer programs that interact with humans through natural language dialogue. Business organizations have spent large sums of money in recent years developing them for online customer self-service, but achievements have been limited to simple FAQ systems. We believe this is due to the labour-intensive process of scripting, which could be reduced radically by the use of short-text semantic similarity measures. "Short texts" are typically 10-20 words long but are not required to be grammatically correct sentences, for example spoken utterances and text messages. We also present a benchmark data set of 65 sentence pairs with human-derived similarity ratings. This data set is the first of its kind, specifically developed to evaluate such measures, and we believe it will be valuable to future researchers.

86 citations
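As a rough illustration of what it means to score short-text pairs on a 0..1 scale, here is a bag-of-words cosine baseline. It is a crude stand-in for illustration only, not STASIS or LSA themselves:

```python
import math
from collections import Counter

def bow_cosine(text1, text2):
    """Bag-of-words cosine over term-frequency vectors: 1.0 for identical
    token multisets, 0.0 for texts sharing no tokens."""
    a, b = Counter(text1.lower().split()), Counter(text2.lower().split())
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Two toy sentence pairs of the kind a short-text benchmark would rate.
pairs = [("midday is noon", "noon is midday"), ("midday is noon", "a gem is a jewel")]
scores = [bow_cosine(s, t) for s, t in pairs]
```

Measures like STASIS and LSA improve on this baseline precisely because they capture similarity between different words ("gem" vs. "jewel"), which pure token overlap misses.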

Journal ArticleDOI
TL;DR: This work proposes the SCSNED method for disambiguation based on semantic similarity between contextual words and informative words of entities in KGs, and proposes a Category2Vec embedding model based on joint learning of word and category embeddings, in order to compute word-category similarity for entity disambiguation.
Abstract: With the increasing popularity of large-scale Knowledge Graphs (KGs), many applications such as semantic analysis, search and question answering need to link entity mentions in texts to entities in KGs. Because of the polysemy problem in natural language, entity disambiguation is thus a key problem in current research. Existing disambiguation methods have considered entity prominence, context similarity and entity-entity relatedness to discriminate ambiguous entities; they mainly work on document- or paragraph-level texts containing rich contextual information, and rely on lexical matching for computing context similarity. When faced with short texts containing limited contextual information, such as web queries, questions and tweets, those conventional disambiguation methods struggle to handle single entity mentions and to measure context similarity. In order to enhance the performance of disambiguation methods based on context similarity with such short texts, we propose the SCSNED method for disambiguation based on semantic similarity between contextual words and informative words of entities in KGs. Specifically, we exploit the effectiveness of both knowledge-based and corpus-based semantic similarity methods for entity disambiguation with SCSNED. Moreover, we propose a Category2Vec embedding model based on joint learning of word and category embeddings, in order to compute word-category similarity for entity disambiguation. We show the effectiveness of these proposed methods with illustrative examples, and evaluate their effectiveness in a comparative experiment for entity disambiguation in real-world web queries, questions and tweets. The experimental results have identified the effectiveness of different semantic similarity methods, and demonstrated the improvement of semantic similarity methods in SCSNED and Category2Vec over the conventional context similarity baseline.
We further compare the proposed approaches with state-of-the-art entity disambiguation systems and show that they perform among the best systems. In addition, one important feature of the proposed approaches using semantic similarity is their potential applicability to any existing KG, since they mainly use common features of entity descriptions and categories. Another contribution of the paper is an updated survey on the background of entity disambiguation in KGs and semantic similarity methods.

86 citations
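The core idea — ranking candidate entities by similarity between context words and an entity's informative words — can be sketched as follows. The candidates, descriptions, and stub similarity table are invented for illustration and are not SCSNED's actual components; a real system would use knowledge- or corpus-based similarity (or Category2Vec embeddings) in place of the lookup table:

```python
# Hypothetical candidate entities for the mention "jaguar", each with a few
# informative words taken from an assumed KG description.
CANDIDATES = {
    "Jaguar_(animal)": ["cat", "predator", "rainforest", "species"],
    "Jaguar_Cars": ["car", "vehicle", "manufacturer", "british"],
}

# Stub word-word similarity: 1.0 for identical words, 0.5 for pairs we declare
# related in a toy table, else 0.0 (a real system would use embeddings).
RELATED = {("engine", "car"), ("engine", "vehicle"), ("jungle", "rainforest")}

def word_sim(w1, w2):
    if w1 == w2:
        return 1.0
    return 0.5 if (w1, w2) in RELATED or (w2, w1) in RELATED else 0.0

def disambiguate(context_words, candidates):
    """Score each candidate by the average best-match similarity between each
    context word and the candidate's informative words; return the top candidate."""
    def score(words):
        return sum(max(word_sim(c, w) for w in words) for c in context_words) / len(context_words)
    return max(candidates, key=lambda e: score(candidates[e]))

# Short-text context around the mention "jaguar".
best = disambiguate(["engine", "speed"], CANDIDATES)
```

Because the score uses semantic similarity rather than exact lexical matching, even a short context with no word literally shared with the entity description can still discriminate between candidates.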


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
84% related
Graph (abstract data type)
69.9K papers, 1.2M citations
84% related
Unsupervised learning
22.7K papers, 1M citations
83% related
Feature vector
48.8K papers, 954.4K citations
83% related
Web service
57.6K papers, 989K citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    202
2022    522
2021    641
2020    837
2019    866
2018    787