scispace - formally typeset
Topic

Semantic similarity

About: Semantic similarity is a research topic. Over its lifetime, 14,605 publications have been published within this topic, receiving 364,659 citations. The topic is also known as: semantic relatedness.


Papers
Journal ArticleDOI
TL;DR: A novel method based on interdependent representations of short texts for determining their degree of semantic similarity and a preprocessing algorithm that chains coreferential named entities together and performs word segmentation to preserve the meaning of phrasal verbs and idioms are presented.
Abstract: We present a novel method based on interdependent representations of short texts for determining their degree of semantic similarity. The method represents each short text as two dense vectors: the former is built using the word-to-word similarity based on pre-trained word vectors, the latter is built using the word-to-word similarity based on external sources of knowledge. We also developed a preprocessing algorithm that chains coreferential named entities together and performs word segmentation to preserve the meaning of phrasal verbs and idioms. We evaluated the proposed method on three popular datasets, namely the Microsoft Research Paraphrase Corpus, STS2015 and P4PIN, and obtained state-of-the-art results on all three without using prior knowledge of natural language, e.g., part-of-speech tags or parse trees, which indicates that interdependent representations of short text pairs are effective and efficient for semantic textual similarity tasks.
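The abstract above builds dense text representations from word-to-word similarities over pre-trained word vectors. A minimal sketch of that general idea (not the paper's exact method) is shown below; the toy vectors and the best-match aggregation are illustrative assumptions standing in for real pre-trained embeddings:

```python
import numpy as np

# Hypothetical toy word vectors; a real system would load pre-trained
# embeddings such as GloVe or word2vec instead.
vectors = {
    "dog":   np.array([0.9, 0.1, 0.0]),
    "puppy": np.array([0.8, 0.2, 0.1]),
    "car":   np.array([0.0, 0.1, 0.9]),
    "auto":  np.array([0.1, 0.0, 0.8]),
}

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def text_similarity(words_a, words_b):
    """Score two short texts by averaging each word's best
    word-to-word cosine match in the other text (both directions)."""
    def best_matches(src, dst):
        return [max(cosine(vectors[w], vectors[x]) for x in dst) for w in src]
    scores = best_matches(words_a, words_b) + best_matches(words_b, words_a)
    return sum(scores) / len(scores)
```

With these vectors, `text_similarity(["dog"], ["puppy"])` scores higher than `text_similarity(["dog"], ["car"])`, reflecting the intended word-to-word semantic signal.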

73 citations

Journal ArticleDOI
TL;DR: It is demonstrated that a single common-sense ontology produces plausible interpretations at all levels from parsing through reasoning, and some of the problems and tradeoffs for a method which has just one content ontology are explored.
Abstract: This paper defends the choice of a linguistically-based content ontology for natural language processing and demonstrates that a single common-sense ontology produces plausible interpretations at all levels from parsing through reasoning. The paper explores some of the problems and tradeoffs for a method which has just one content ontology. A linguistically-based content ontology represents the "world view" encoded in natural language. The content ontology (as opposed to the formal semantic ontology which distinguishes events from propositions, and so on) is best grounded in the culture, rather than in the world itself, or in the mind. By "world view" we mean naive assumptions about "what there is" in the world, and how it should be classified. These assumptions are time-worn and reflected in language at several levels: morphology, syntax and lexical semantics. The content ontology presented in the paper is part of a Naive Semantic lexicon. Naive Semantics is a lexical theory in which associated with each word sense is a naive theory (or set of beliefs) about the objects or events of reference. While naive semantic representations are not combinations of a closed set of primitives, they are also limited by a shallowness assumption. Included is just the information required to form a semantic interpretation incrementally, not all of the information known about objects. The Naive Semantic ontology is based upon a particular language, its syntax and its word senses. To the extent that other languages codify similar world views, we predict that their ontologies are similar. Applied in a computational natural language understanding system, this linguistically-motivated ontology (along with other naive semantic information) is sufficient to disambiguate words, disambiguate syntactic structure, disambiguate formal semantic representations, resolve anaphoric expressions and perform reasoning tasks with text.

73 citations

Journal ArticleDOI
TL;DR: This work puts forward the hypothesis that the IAT provides a general measure of similarity, and provides further evidence for this in a new study in which the outcome of an IAT depended on whether the perceptual or functional characteristics of the stimuli were made salient.
Abstract: The Implicit Association Test (IAT) is widely used as a measure of semantic similarity (i.e., associations in semantic memory). The results of previous research and of a new study show that IAT effects can, however, also be based on other types of similarity between stimuli. We therefore put forward the hypothesis that the IAT provides a general measure of similarity. Given that similarity is highly dynamic and context-dependent, our view that the IAT measures similarity is compatible with existing evidence showing that IAT effects are highly malleable. We provide further evidence for this in a new study in which the outcome of an IAT depended on whether the perceptual or functional characteristics of the stimuli were made salient.

73 citations

Book ChapterDOI
11 Oct 2015
TL;DR: This paper shows that semantic relatedness can also be accurately computed by analysing only the graph structure of the knowledge base, and proposes a joint approach to entity and word-sense disambiguation that makes use of graph-based relatedness.
Abstract: Semantic relatedness and disambiguation are fundamental problems for linking text documents to the Web of Data. There are many approaches dealing with both problems but most of them rely on word or concept distribution over Wikipedia. They are therefore not applicable to concepts that do not have a rich textual description. In this paper, we show that semantic relatedness can also be accurately computed by analysing only the graph structure of the knowledge base. In addition, we propose a joint approach to entity and word-sense disambiguation that makes use of graph-based relatedness. As opposed to the majority of state-of-the-art systems that target mainly named entities, we use our approach to disambiguate both entities and common nouns. In our experiments, we first validate our relatedness measure on multiple knowledge bases and ground truth datasets and show that it performs better than related state-of-the-art graph based measures. Afterwards, we evaluate the disambiguation algorithm and show that it also achieves superior disambiguation accuracy with respect to alternative state-of-the-art graph-based algorithms.
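The abstract above computes relatedness from the graph structure of the knowledge base alone, with no textual descriptions. A minimal illustration of that idea follows; the toy graph and the inverse-path-length score are assumptions for illustration, not the paper's actual measure:

```python
from collections import deque

# Hypothetical toy knowledge-base graph: nodes stand in for entities,
# edges for KB links. Real systems would use a large graph such as DBpedia.
graph = {
    "cat":    ["mammal"],
    "dog":    ["mammal"],
    "mammal": ["animal", "cat", "dog"],
    "animal": ["mammal", "car"],
    "car":    ["animal"],
}

def shortest_path_len(g, start, goal):
    """Breadth-first search for the shortest path length, or None."""
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in g.get(node, []):
            if nb == goal:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None

def relatedness(g, a, b):
    """Score two nodes by inverse shortest-path length: closer = more related."""
    d = shortest_path_len(g, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)
```

Here `relatedness(graph, "cat", "dog")` exceeds `relatedness(graph, "cat", "car")` because the shortest path through the graph is shorter, which is the purely structural signal the paper exploits (its actual measure is more sophisticated).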

73 citations

Proceedings Article
01 Jun 1998
TL;DR: An operational definition of corpus similarity is presented which addresses or circumvents the problems, using purpose-built sets of "known-similarity corpora", and three variants of the information-theoretic measure 'perplexity' are evaluated.
Abstract: How similar are two corpora? A measure of corpus similarity would be very useful for NLP for many purposes, such as estimating the work involved in porting a system from one domain to another. First, we discuss difficulties in identifying what we mean by 'corpus similarity': human similarity judgements are not fine-grained enough, corpus similarity is inherently multidimensional, and similarity can only be interpreted in the light of corpus homogeneity. We then present an operational definition of corpus similarity which addresses or circumvents the problems, using purpose-built sets of "known-similarity corpora". These KSC sets can be used to evaluate the measures. We evaluate the measures described in the literature, including three variants of the information-theoretic measure 'perplexity'. A χ²-based measure, using word frequencies, is shown to be the best of those tested.
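A word-frequency chi-square comparison of two corpora can be sketched as follows; this is a generic two-sample chi-square statistic over token counts, assumed for illustration rather than taken from the paper's exact formulation:

```python
from collections import Counter

def chi_square_distance(corpus_a, corpus_b):
    """Compare two token lists by a chi-square statistic over word counts.
    0 means identical frequency profiles; larger values mean less similar."""
    ca, cb = Counter(corpus_a), Counter(corpus_b)
    na, nb = sum(ca.values()), sum(cb.values())
    chi2 = 0.0
    for word in set(ca) | set(cb):
        oa, ob = ca[word], cb[word]
        # Expected counts if both corpora were drawn from one distribution.
        ea = (oa + ob) * na / (na + nb)
        eb = (oa + ob) * nb / (na + nb)
        chi2 += (oa - ea) ** 2 / ea + (ob - eb) ** 2 / eb
    return chi2
```

Identical corpora score exactly 0, and the score grows as their word-frequency profiles diverge, giving a simple frequency-based similarity signal of the kind the paper evaluates.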

73 citations


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
84% related
Graph (abstract data type)
69.9K papers, 1.2M citations
84% related
Unsupervised learning
22.7K papers, 1M citations
83% related
Feature vector
48.8K papers, 954.4K citations
83% related
Web service
57.6K papers, 989K citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    202
2022    522
2021    641
2020    837
2019    866
2018    787