Semantic similarity

About: Semantic similarity is a research topic. Over its lifetime, 14,605 publications have been published on this topic, receiving 364,659 citations. The topic is also known as semantic relatedness.

[Chart: papers published on a yearly basis, 1969 to 2023]

Papers

Book

The meaning of the sentence in its semantic and pragmatic aspects

Petr Sgall, Eva Hajičová, Jarmila Panevová

01 Jan 1986

721 citations

Proceedings Article

Extended gloss overlaps as a measure of semantic relatedness

Satanjeev Banerjee¹, Ted Pedersen²

Carnegie Mellon University¹, University of Minnesota²

09 Aug 2003

TL;DR: This paper presents a new measure of semantic relatedness between concepts based on the number of shared words (overlaps) in their definitions (glosses). The measure is unique in that it extends the glosses of the concepts under consideration to include the glosses of other concepts to which they are related according to a given concept hierarchy.

Abstract: This paper presents a new measure of semantic relatedness between concepts that is based on the number of shared words (overlaps) in their definitions (glosses). This measure is unique in that it extends the glosses of the concepts under consideration to include the glosses of other concepts to which they are related according to a given concept hierarchy. We show that this new measure correlates reasonably well with human judgments. We introduce a new method of word sense disambiguation based on extended gloss overlaps, and demonstrate that it fares well on the SENSEVAL-2 lexical sample data.
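A minimal Python sketch of the extended gloss overlap idea, under simplifying assumptions. The toy glosses and the `RELATED` map are hypothetical stand-ins for dictionary glosses and a real concept hierarchy. Every pair of glosses in the two extended sets is compared, which is a simplification, and function-word filtering is omitted. Each maximal shared run of n consecutive words is assumed to score n², so one shared 3-word phrase outweighs three scattered single-word matches.

```python
# Hypothetical toy glosses; a real system would read them from a lexical database.
GLOSSES = {
    "car": "a motor vehicle with four wheels",
    "motor_vehicle": "a self-propelled wheeled vehicle that does not run on rails",
    "bicycle": "a wheeled vehicle that has two wheels and is moved by pedals",
    "wheeled_vehicle": "a vehicle that moves on wheels",
}
# Hypothetical concept hierarchy: concept -> related concepts (e.g. parents).
RELATED = {"car": ["motor_vehicle"], "bicycle": ["wheeled_vehicle"]}


def longest_common_run(a, b):
    """Return (length, start_a, start_b) of the longest shared contiguous word run."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def overlap_score(gloss_a, gloss_b):
    """Sum n**2 over maximal shared word runs, longest first; each word matches once."""
    a, b = gloss_a.lower().split(), gloss_b.lower().split()
    score = 0
    while True:
        n, i, j = longest_common_run(a, b)
        if n == 0:
            return score
        score += n * n
        # Replace matched words with distinct placeholders so they cannot match again.
        a[i:i + n] = ["<used-a>"] * n
        b[j:j + n] = ["<used-b>"] * n


def extended_gloss(concept):
    """The concept's own gloss plus the glosses of its related concepts."""
    return [GLOSSES[concept]] + [GLOSSES[r] for r in RELATED.get(concept, [])]


def relatedness(c1, c2):
    """Sum overlap scores over all pairs of glosses from the two extended sets."""
    return sum(overlap_score(g1, g2)
               for g1 in extended_gloss(c1)
               for g2 in extended_gloss(c2))


print(relatedness("car", "bicycle"))  # 22
```

Most of the score here comes from the shared phrase "wheeled vehicle that", which only appears once the gloss of the related concept `motor_vehicle` is added to the gloss of `car`. That is the point of extending the glosses.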

720 citations

Journal Article

A Survey of Text Similarity Approaches

Wael Hassan Gomaa, Aly A. Fahmy

18 Apr 2013, International Journal of Computer Applications

TL;DR: This survey discusses existing work on text similarity by partitioning it into three approaches: String-based, Corpus-based and Knowledge-based similarity. Samples of combinations of these similarities are also presented.

Abstract: Measuring the similarity between words, sentences, paragraphs and documents is an important component in various tasks such as information retrieval, document clustering, word-sense disambiguation, automatic essay scoring, short answer grading, machine translation and text summarization. This survey discusses the existing works on text similarity by partitioning them into three approaches: String-based, Corpus-based and Knowledge-based similarities. Furthermore, samples of combinations of these similarities are presented.

General Terms: Text Mining, Natural Language Processing.

Keywords: Text Similarity, Semantic Similarity, String-Based Similarity, Corpus-Based Similarity, Knowledge-Based Similarity.

1. INTRODUCTION

Text similarity measures play an increasingly important role in text-related research and applications, in tasks such as information retrieval, text classification, document clustering, topic detection, topic tracking, question generation, question answering, essay scoring, short answer scoring, machine translation, text summarization and others. Finding the similarity between words is a fundamental part of text similarity, which is then used as a primary stage for sentence, paragraph and document similarities. Words can be similar in two ways: lexically and semantically. Words are similar lexically if they have a similar character sequence. Words are similar semantically if they mean the same thing, are opposites of each other, are used in the same way or in the same context, or if one is a type of the other. Lexical similarity is introduced in this survey through different String-Based algorithms; semantic similarity is introduced through Corpus-Based and Knowledge-Based algorithms. String-Based measures operate on string sequences and character composition. A string metric is a metric that measures the similarity or dissimilarity (distance) between two text strings for approximate string matching or comparison.
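The two String-Based families can be illustrated with one measure each. This is a Python sketch, not code from the survey. Levenshtein edit distance stands in for character-based measures, and Jaccard similarity over word sets stands in for term-based measures.

```python
def levenshtein(s: str, t: str) -> int:
    """Character-based: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def jaccard(a: str, b: str) -> float:
    """Term-based: shared distinct words divided by all distinct words."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


print(levenshtein("kitten", "sitting"))        # 3
print(jaccard("the cat sat", "the cat ran"))   # 0.5
```

Both measures only compare surface forms. For example, `jaccard("car", "automobile")` is 0.0, which is why the survey treats semantic similarity separately through Corpus-Based and Knowledge-Based measures.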
Corpus-Based similarity is a semantic similarity measure that determines the similarity between words according to information gained from large corpora. Knowledge-Based similarity is a semantic similarity measure that determines the degree of similarity between words using information derived from semantic networks. The most popular measures of each type are presented briefly. This paper is organized as follows: Section two presents String-Based algorithms, partitioning them into two types: character-based and term-based measures. Sections three and four introduce Corpus-Based and Knowledge-Based algorithms, respectively. Samples of combinations of similarity algorithms are introduced in section five, and finally section six presents the conclusions of the survey.
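Here is a minimal sketch of the Corpus-Based idea in Python. It assumes the simplest distributional model: each word is represented by counts of the words appearing within a fixed window around it in a corpus, and two words are compared by the cosine of those count vectors. The four-sentence corpus and `window=2` are illustrative assumptions. Practical systems use large corpora and add weighting or dimensionality reduction (e.g. LSA).

```python
import math
from collections import Counter


def context_vectors(sentences, window=2):
    """Count, for each word, the words occurring within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            ctx = vectors.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine of two sparse count vectors (missing words count as 0)."""
    dot = sum(count * v[w] for w, count in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


corpus = [  # toy corpus, purely illustrative
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "stocks fell on the market",
]
vecs = context_vectors(corpus)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))  # 0.873
print(cosine(vecs["cat"], vecs["stocks"]))         # 0.0: no shared contexts
```

"cat" and "dog" never co-occur as the same string, but they share neighbours ("the", "drinks", "chases"). That shared context is the kind of evidence Corpus-Based measures rely on.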

718 citations

Proceedings Article

Learning semantic representations using convolutional neural networks for web search

Yelong Shen¹, Xiaodong He², Jianfeng Gao², Li Deng², Grégoire Mesnil³

Kent State University¹, Microsoft², Université de Montréal³

07 Apr 2014

TL;DR: This paper presents a series of new latent semantic models based on a convolutional neural network that learn low-dimensional semantic vectors for search queries and Web documents; the models significantly outperform other semantic models in retrieval performance.

Abstract: This paper presents a series of new latent semantic models based on a convolutional neural network (CNN) to learn low-dimensional semantic vectors for search queries and Web documents. By using the convolution-max pooling operation, local contextual information at the word n-gram level is modeled first. Then, salient local features in a word sequence are combined to form a global feature vector. Finally, the high-level semantic information of the word sequence is extracted to form a global vector representation. The proposed models are trained on clickthrough data by maximizing the conditional likelihood of clicked documents given a query, using stochastic gradient ascent. The new models are evaluated on a Web document ranking task using a large-scale, real-world data set. Results show that our model significantly outperforms other semantic models, which were state-of-the-art in retrieval performance prior to this work.
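The forward pass described in the abstract can be sketched in plain Python: word n-gram convolution, max pooling, then a semantic projection, with documents ranked by cosine similarity to the query. Everything concrete here is an assumption for illustration. The random word vectors stand in for the paper's learned input representation, and the sentence-boundary padding, the layer sizes and the smoothing factor `gamma` are made up. The conditional likelihood of a document given a query is assumed to be a softmax over gamma-scaled cosine scores. The weights are untrained, so the probabilities carry no meaning until the model is trained on clickthrough data.

```python
import math
import random

random.seed(0)
DIM, WIN, CONV, SEM = 8, 3, 6, 4  # hypothetical: word-vector size, n-gram window, conv units, output size


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W_CONV = rand_matrix(CONV, WIN * DIM)  # untrained convolution filters
W_SEM = rand_matrix(SEM, CONV)         # untrained semantic projection
VOCAB = {}


def embed(word):
    """Hypothetical stand-in input layer: a fixed random vector per word."""
    if word not in VOCAB:
        VOCAB[word] = [random.uniform(-1, 1) for _ in range(DIM)]
    return VOCAB[word]


def tanh_layer(W, x):
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def encode(text):
    """Map a non-empty text to a SEM-dimensional semantic vector."""
    words = ["<s>"] + text.lower().split() + ["</s>"]  # assumed padding: every word centres a window
    # Convolution: the same filters applied to every word n-gram (window of WIN words).
    local = [tanh_layer(W_CONV, [v for w in words[i:i + WIN] for v in embed(w)])
             for i in range(len(words) - WIN + 1)]
    # Max pooling: keep the strongest local feature per dimension across positions.
    pooled = [max(f[k] for f in local) for k in range(CONV)]
    return tanh_layer(W_SEM, pooled)


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def click_posterior(query, docs, gamma=10.0):
    """Softmax over gamma-scaled cosine scores: a model of P(doc | query)."""
    q = encode(query)
    scores = [gamma * cosine(q, encode(d)) for d in docs]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


print(click_posterior("cheap flights", ["low cost airline tickets", "chocolate cake recipe"]))
```

Training, which this sketch omits, would adjust `W_CONV`, `W_SEM` and the input layer so that the clicked document receives most of the probability mass.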

706 citations

Proceedings Article

Similarity indexing with the SS-tree

David A. White¹, Ramesh Jain¹

University of California, San Diego¹

26 Feb 1996

TL;DR: This work describes the fundamental types of "similarity queries" that should be supported and proposes a new dynamic structure for similarity indexing called the similarity search tree or SS-tree, which performs better than the R*-tree in nearly every test.

Abstract: Efficient indexing of high-dimensional feature vectors is important to allow visual information systems and a number of other applications to scale up to large databases. We define this problem as "similarity indexing" and describe the fundamental types of "similarity queries" that we believe should be supported. We also propose a new dynamic structure for similarity indexing called the similarity search tree or SS-tree. In nearly every test we performed on high-dimensional data, we found that this structure performed better than the R*-tree. Our tests also show that the SS-tree is much better suited for approximate queries than the R*-tree.
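The SS-tree indexes feature vectors under bounding spheres rather than the R*-tree's rectangles. Below is a Python sketch of that idea, with a nearest-neighbour query that skips any subtree whose sphere cannot contain a closer point. This is a simplified static version, not the paper's dynamic structure. It is bulk-built by recursively splitting on the coordinate with the highest variance, and it omits insertion, node capacity limits and reinsertion.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    centroid: list
    radius: float                                 # every point under this node lies within radius of centroid
    points: list = field(default_factory=list)    # filled only at leaves
    children: list = field(default_factory=list)  # filled only at internal nodes


def bounding_sphere(points):
    dims = len(points[0])
    centroid = [sum(p[k] for p in points) / len(points) for k in range(dims)]
    return centroid, max(math.dist(centroid, p) for p in points)


def build(points, leaf_size=4):
    """Bulk-build a sphere tree by splitting on the highest-variance coordinate (leaf_size >= 1)."""
    centroid, radius = bounding_sphere(points)
    if len(points) <= leaf_size:
        return Node(centroid, radius, points=list(points))
    axis = max(range(len(centroid)),
               key=lambda k: sum((p[k] - centroid[k]) ** 2 for p in points))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(centroid, radius,
                children=[build(ordered[:mid], leaf_size), build(ordered[mid:], leaf_size)])


def nearest(node, query, best=(math.inf, None)):
    """Return (distance, point) of the stored point closest to query."""
    # No point inside this sphere can be closer than dist(query, centroid) - radius.
    if max(0.0, math.dist(query, node.centroid) - node.radius) >= best[0]:
        return best
    if node.points:
        for p in node.points:
            d = math.dist(query, p)
            if d < best[0]:
                best = (d, p)
        return best
    # Visit the child with the closer centroid first so pruning starts sooner.
    for child in sorted(node.children, key=lambda c: math.dist(query, c.centroid)):
        best = nearest(child, query, best)
    return best


points = [(i % 7, (i * 3) % 11) for i in range(50)]
tree = build(points)
print(nearest(tree, (3.3, 4.7)))
```

The pruning test follows from the triangle inequality: a point inside a sphere is at least `dist(query, centroid) - radius` away from the query. Whole subtrees can therefore be skipped once a closer point has been found.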

697 citations

Metrics

Papers: 15,319
Citations: 407,958

Number of papers on the topic in recent years
Year	Papers
2023	202
2022	522
2021	641
2020	837
2019	866
2018	787
