Topic
Semantic similarity
About: Semantic similarity is a research topic. Over its lifetime, 14,605 publications have been published on this topic, receiving 364,659 citations. The topic is also known as: semantic relatedness.
Papers published on a yearly basis
Papers
TL;DR: A novel method based on interdependent representations of short texts for determining their degree of semantic similarity is presented, together with a preprocessing algorithm that chains coreferential named entities together and performs word segmentation to preserve the meaning of phrasal verbs and idioms.
Abstract: We present a novel method based on interdependent representations of short texts for determining their degree of semantic similarity. The method represents each short text as two dense vectors: the former is built using the word-to-word similarity based on pre-trained word vectors, the latter is built using the word-to-word similarity based on external sources of knowledge. We also developed a preprocessing algorithm that chains coreferential named entities together and performs word segmentation to preserve the meaning of phrasal verbs and idioms. We evaluated the proposed method on three popular datasets, namely Microsoft Research Paraphrase Corpus, STS2015 and P4PIN, and obtained state-of-the-art results on all three without using prior knowledge of natural language, e.g., part-of-speech tags or parse tree, which indicates the interdependent representations of short text pairs are effective and efficient for semantic textual similarity tasks.
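The first of the two dense vectors described above can be pictured with a minimal sketch: one entry per word of the first text, holding that word's best word-to-word cosine similarity against the second text. The vocabulary and the tiny 3-dimensional vectors below are invented for illustration; the paper itself builds on pre-trained word embeddings.

```python
from math import sqrt

# Hypothetical stand-in word vectors; a real system would load
# pre-trained embeddings such as GloVe or word2vec.
VECTORS = {
    "cat": (0.9, 0.1, 0.0),
    "dog": (0.8, 0.2, 0.1),
    "sat": (0.1, 0.9, 0.2),
    "ran": (0.2, 0.8, 0.3),
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def similarity_vector(text_a, text_b):
    """One entry per word of text_a: its best word-to-word
    similarity against any word of text_b."""
    return [
        max(cosine(VECTORS[wa], VECTORS[wb]) for wb in text_b)
        for wa in text_a
    ]
```

An identical pair of texts yields a vector of all ones; the second, knowledge-based vector in the paper is built the same way but with a word-to-word similarity drawn from external knowledge sources instead of embeddings.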
73 citations
01 Dec 1995 - International Journal of Human-Computer Studies / International Journal of Man-Machine Studies
TL;DR: It is demonstrated that a single common-sense ontology produces plausible interpretations at all levels from parsing through reasoning, and some of the problems and tradeoffs for a method which has just one content ontology are explored.
Abstract: This paper defends the choice of a linguistically-based content ontology for natural language processing and demonstrates that a single common-sense ontology produces plausible interpretations at all levels from parsing through reasoning. The paper explores some of the problems and tradeoffs for a method which has just one content ontology. A linguistically-based content ontology represents the "world view" encoded in natural language. The content ontology (as opposed to the formal semantic ontology, which distinguishes events from propositions, and so on) is best grounded in the culture, rather than in the world itself, or in the mind. By "world view" we mean naive assumptions about "what there is" in the world, and how it should be classified. These assumptions are time-worn and reflected in language at several levels: morphology, syntax and lexical semantics. The content ontology presented in the paper is part of a Naive Semantic lexicon. Naive Semantics is a lexical theory in which each word sense is associated with a naive theory (or set of beliefs) about the objects or events of reference. While naive semantic representations are not combinations of a closed set of primitives, they are also limited by a shallowness assumption: included is just the information required to form a semantic interpretation incrementally, not all of the information known about objects. The Naive Semantic ontology is based upon a particular language, its syntax and its word senses. To the extent that other languages codify similar world views, we predict that their ontologies are similar. Applied in a computational natural language understanding system, this linguistically-motivated ontology (along with other naive semantic information) is sufficient to disambiguate words, disambiguate syntactic structure, disambiguate formal semantic representations, resolve anaphoric expressions and perform reasoning tasks with text.
73 citations
TL;DR: This work puts forward the hypothesis that the IAT provides a general measure of similarity, and provides further evidence for this in a new study in which the outcome of an IAT depended on whether the perceptual or functional characteristics of the stimuli were made salient.
Abstract: The Implicit Association Test (IAT) is widely used as a measure of semantic similarity (i.e., associations in semantic memory). The results of previous research and of a new study show that IAT effects can, however, also be based on other types of similarity between stimuli. We therefore put forward the hypothesis that the IAT provides a general measure of similarity. Given that similarity is highly dynamic and context-dependent, our view that the IAT measures similarity is compatible with existing evidence showing that IAT effects are highly malleable. We provide further evidence for this in a new study in which the outcome of an IAT depended on whether the perceptual or functional characteristics of the stimuli were made salient.
73 citations
11 Oct 2015
TL;DR: This paper shows that semantic relatedness can also be accurately computed by analysing only the graph structure of the knowledge base, and proposes a joint approach to entity and word-sense disambiguation that makes use of graph-based relatedness.
Abstract: Semantic relatedness and disambiguation are fundamental problems for linking text documents to the Web of Data. There are many approaches dealing with both problems but most of them rely on word or concept distribution over Wikipedia. They are therefore not applicable to concepts that do not have a rich textual description. In this paper, we show that semantic relatedness can also be accurately computed by analysing only the graph structure of the knowledge base. In addition, we propose a joint approach to entity and word-sense disambiguation that makes use of graph-based relatedness. As opposed to the majority of state-of-the-art systems that target mainly named entities, we use our approach to disambiguate both entities and common nouns. In our experiments, we first validate our relatedness measure on multiple knowledge bases and ground truth datasets and show that it performs better than related state-of-the-art graph based measures. Afterwards, we evaluate the disambiguation algorithm and show that it also achieves superior disambiguation accuracy with respect to alternative state-of-the-art graph-based algorithms.
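The key idea above, that relatedness can be read off the graph structure alone, can be sketched with a deliberately simple stand-in measure: Jaccard overlap of the two entities' neighbour sets in a toy knowledge graph. The paper's actual measure is more elaborate; this only illustrates that no textual description of the concepts is required.

```python
# Toy knowledge graph: each entity maps to its set of graph neighbours.
# Entities and edges are invented for illustration.
GRAPH = {
    "Paris": {"France", "Seine", "City"},
    "Lyon":  {"France", "Rhone", "City"},
    "Tokyo": {"Japan", "City"},
}

def relatedness(a, b):
    """Structural relatedness as Jaccard overlap of neighbour sets:
    |N(a) & N(b)| / |N(a) | N(b)|."""
    na, nb = GRAPH[a], GRAPH[b]
    return len(na & nb) / len(na | nb)
```

Under this sketch, "Paris" comes out more related to "Lyon" (shared neighbours France and City) than to "Tokyo" (shared neighbour City only), exactly the kind of ranking a purely structural measure is expected to produce.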
73 citations
01 Jun 1998
TL;DR: An operational definition of corpus similarity is presented which addresses or circumvents the problems, using purpose-built sets of 'known-similarity corpora', and three variants of the information-theoretic measure 'perplexity' are evaluated.
Abstract: How similar are two corpora? A measure of corpus similarity would be very useful for NLP for many purposes, such as estimating the work involved in porting a system from one domain to another. First, we discuss difficulties in identifying what we mean by 'corpus similarity': human similarity judgements are not fine-grained enough, corpus similarity is inherently multidimensional, and similarity can only be interpreted in the light of corpus homogeneity. We then present an operational definition of corpus similarity which addresses or circumvents the problems, using purpose-built sets of 'known-similarity corpora'. These KSC sets can be used to evaluate the measures. We evaluate the measures described in the literature, including three variants of the information-theoretic measure 'perplexity'. A χ²-based measure, using word frequencies, is shown to be the best of those tested.
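The χ²-over-word-frequencies idea can be sketched as follows: pool the two corpora, compute each word's expected count in each corpus under the hypothesis that both are drawn from the same population, and sum the Pearson χ² contributions. This is a simplified illustration of the approach, not the paper's exact normalisation.

```python
from collections import Counter

def chi_square_distance(corpus_a, corpus_b):
    """Pearson chi-square over word frequencies of two tokenised corpora.
    Lower scores mean more similar corpora. A sketch of the idea only;
    the paper's measure involves further normalisation choices."""
    fa, fb = Counter(corpus_a), Counter(corpus_b)
    na, nb = sum(fa.values()), sum(fb.values())
    score = 0.0
    for word in set(fa) | set(fb):
        o_a, o_b = fa[word], fb[word]          # observed counts
        total = o_a + o_b
        e_a = total * na / (na + nb)           # expected count in corpus A
        e_b = total * nb / (na + nb)           # expected count in corpus B
        score += (o_a - e_a) ** 2 / e_a + (o_b - e_b) ** 2 / e_b
    return score
```

Two identical token lists score exactly 0, while corpora with disjoint vocabularies score highest, which matches the intuition that frequency profiles drive the comparison.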
73 citations