scispace - formally typeset
Topic

Semantic similarity

About: Semantic similarity is a research topic. Over its lifetime, 14,605 publications have been published within this topic, receiving 364,659 citations. The topic is also known as: semantic relatedness.


Papers
Journal ArticleDOI
TL;DR: A novel method based on interdependent representations of short texts for determining their degree of semantic similarity and a preprocessing algorithm that chains coreferential named entities together and performs word segmentation to preserve the meaning of phrasal verbs and idioms are presented.
Abstract: We present a novel method based on interdependent representations of short texts for determining their degree of semantic similarity. The method represents each short text as two dense vectors: the former is built using the word-to-word similarity based on pre-trained word vectors, the latter is built using the word-to-word similarity based on external sources of knowledge. We also developed a preprocessing algorithm that chains coreferential named entities together and performs word segmentation to preserve the meaning of phrasal verbs and idioms. We evaluated the proposed method on three popular datasets, namely the Microsoft Research Paraphrase Corpus, STS2015 and P4PIN, and obtained state-of-the-art results on all three without using prior knowledge of natural language, e.g., part-of-speech tags or parse trees, which indicates that interdependent representations of short text pairs are effective and efficient for semantic textual similarity tasks.
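The abstract above builds dense text representations from word-to-word similarities over pre-trained word vectors. A minimal sketch of that general idea (not the paper's exact method) is shown below; the toy vectors and the best-match aggregation are illustrative assumptions standing in for real pre-trained embeddings:

```python
import numpy as np

# Hypothetical toy word vectors; a real system would load pre-trained
# embeddings such as GloVe or word2vec instead.
vectors = {
    "dog":   np.array([0.9, 0.1, 0.0]),
    "puppy": np.array([0.8, 0.2, 0.1]),
    "car":   np.array([0.0, 0.1, 0.9]),
    "auto":  np.array([0.1, 0.0, 0.8]),
}

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def text_similarity(words_a, words_b):
    """Score two short texts by averaging each word's best
    word-to-word cosine match in the other text (both directions)."""
    def best_matches(src, dst):
        return [max(cosine(vectors[w], vectors[x]) for x in dst) for w in src]
    scores = best_matches(words_a, words_b) + best_matches(words_b, words_a)
    return sum(scores) / len(scores)
```

With these vectors, `text_similarity(["dog"], ["puppy"])` scores higher than `text_similarity(["dog"], ["car"])`, reflecting the intended word-to-word semantic signal.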

73 citations

Journal ArticleDOI
TL;DR: It is demonstrated that a single common-sense ontology produces plausible interpretations at all levels from parsing through reasoning, and some of the problems and tradeoffs for a method which has just one content ontology are explored.
Abstract: This paper defends the choice of a linguistically-based content ontology for natural language processing and demonstrates that a single common-sense ontology produces plausible interpretations at all levels from parsing through reasoning. The paper explores some of the problems and tradeoffs for a method which has just one content ontology. A linguistically-based content ontology represents the "world view" encoded in natural language. The content ontology (as opposed to the formal semantic ontology which distinguishes events from propositions, and so on) is best grounded in the culture, rather than in the world itself, or in the mind. By "world view" we mean naive assumptions about "what there is" in the world, and how it should be classified. These assumptions are time-worn and reflected in language at several levels: morphology, syntax and lexical semantics. The content ontology presented in the paper is part of a Naive Semantic lexicon. Naive Semantics is a lexical theory in which associated with each word sense is a naive theory (or set of beliefs) about the objects or events of reference. While naive semantic representations are not combinations of a closed set of primitives, they are also limited by a shallowness assumption. Included is just the information required to form a semantic interpretation incrementally, not all of the information known about objects. The Naive Semantic ontology is based upon a particular language, its syntax and its word senses. To the extent that other languages codify similar world views, we predict that their ontologies are similar. Applied in a computational natural language understanding system, this linguistically-motivated ontology (along with other naive semantic information) is sufficient to disambiguate words, disambiguate syntactic structure, disambiguate formal semantic representations, resolve anaphoric expressions and perform reasoning tasks with text.

73 citations

Journal ArticleDOI
TL;DR: This work puts forward the hypothesis that the IAT provides a general measure of similarity, and provides further evidence for this in a new study in which the outcome of an IAT depended on whether the perceptual or functional characteristics of the stimuli were made salient.
Abstract: The Implicit Association Test (IAT) is widely used as a measure of semantic similarity (i.e., associations in semantic memory). The results of previous research and of a new study show that IAT effects can, however, also be based on other types of similarity between stimuli. We therefore put forward the hypothesis that the IAT provides a general measure of similarity. Given that similarity is highly dynamic and context-dependent, our view that the IAT measures similarity is compatible with existing evidence showing that IAT effects are highly malleable. We provide further evidence for this in a new study in which the outcome of an IAT depended on whether the perceptual or functional characteristics of the stimuli were made salient.

73 citations

Book ChapterDOI
11 Oct 2015
TL;DR: This paper shows that semantic relatedness can also be accurately computed by analysing only the graph structure of the knowledge base, and proposes a joint approach to entity and word-sense disambiguation that makes use of graph-based relatedness.
Abstract: Semantic relatedness and disambiguation are fundamental problems for linking text documents to the Web of Data. There are many approaches dealing with both problems but most of them rely on word or concept distribution over Wikipedia. They are therefore not applicable to concepts that do not have a rich textual description. In this paper, we show that semantic relatedness can also be accurately computed by analysing only the graph structure of the knowledge base. In addition, we propose a joint approach to entity and word-sense disambiguation that makes use of graph-based relatedness. As opposed to the majority of state-of-the-art systems that target mainly named entities, we use our approach to disambiguate both entities and common nouns. In our experiments, we first validate our relatedness measure on multiple knowledge bases and ground truth datasets and show that it performs better than related state-of-the-art graph based measures. Afterwards, we evaluate the disambiguation algorithm and show that it also achieves superior disambiguation accuracy with respect to alternative state-of-the-art graph-based algorithms.
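The abstract above computes relatedness from the graph structure of the knowledge base alone, with no textual descriptions. A minimal illustration of that idea follows; the toy graph and the inverse-path-length score are assumptions for illustration, not the paper's actual measure:

```python
from collections import deque

# Hypothetical toy knowledge-base graph: nodes stand in for entities,
# edges for KB links. Real systems would use a large graph such as DBpedia.
graph = {
    "cat":    ["mammal"],
    "dog":    ["mammal"],
    "mammal": ["animal", "cat", "dog"],
    "animal": ["mammal", "car"],
    "car":    ["animal"],
}

def shortest_path_len(g, start, goal):
    """Breadth-first search for the shortest path length, or None."""
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in g.get(node, []):
            if nb == goal:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None

def relatedness(g, a, b):
    """Score two nodes by inverse shortest-path length: closer = more related."""
    d = shortest_path_len(g, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)
```

Here `relatedness(graph, "cat", "dog")` exceeds `relatedness(graph, "cat", "car")` because the shortest path through the graph is shorter, which is the purely structural signal the paper exploits (its actual measure is more sophisticated).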

73 citations

Proceedings Article
01 Jun 1998
TL;DR: An operational definition of corpus similarity is presented which addresses or circumvents the problems, using purpose-built sets of "known-similarity corpora", and three variants of the information-theoretic measure 'perplexity' are evaluated.
Abstract: How similar are two corpora? A measure of corpus similarity would be very useful for NLP for many purposes, such as estimating the work involved in porting a system from one domain to another. First, we discuss difficulties in identifying what we mean by 'corpus similarity': human similarity judgements are not fine-grained enough, corpus similarity is inherently multidimensional, and similarity can only be interpreted in the light of corpus homogeneity. We then present an operational definition of corpus similarity which addresses or circumvents the problems, using purpose-built sets of "known-similarity corpora". These KSC sets can be used to evaluate the measures. We evaluate the measures described in the literature, including three variants of the information-theoretic measure 'perplexity'. A χ²-based measure, using word frequencies, is shown to be the best of those tested.
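A word-frequency chi-square comparison of two corpora can be sketched as follows; this is a generic two-sample chi-square statistic over token counts, assumed for illustration rather than taken from the paper's exact formulation:

```python
from collections import Counter

def chi_square_distance(corpus_a, corpus_b):
    """Compare two token lists by a chi-square statistic over word counts.
    0 means identical frequency profiles; larger values mean less similar."""
    ca, cb = Counter(corpus_a), Counter(corpus_b)
    na, nb = sum(ca.values()), sum(cb.values())
    chi2 = 0.0
    for word in set(ca) | set(cb):
        oa, ob = ca[word], cb[word]
        # Expected counts if both corpora were drawn from one distribution.
        ea = (oa + ob) * na / (na + nb)
        eb = (oa + ob) * nb / (na + nb)
        chi2 += (oa - ea) ** 2 / ea + (ob - eb) ** 2 / eb
    return chi2
```

Identical corpora score exactly 0, and the score grows as their word-frequency profiles diverge, giving a simple frequency-based similarity signal of the kind the paper evaluates.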

73 citations


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
84% related
Graph (abstract data type)
69.9K papers, 1.2M citations
84% related
Unsupervised learning
22.7K papers, 1M citations
83% related
Feature vector
48.8K papers, 954.4K citations
83% related
Web service
57.6K papers, 989K citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    202
2022    522
2021    641
2020    837
2019    866
2018    787