Topic

Semantic similarity

About: Semantic similarity is a research topic. Over its lifetime, 14,605 publications have been published within this topic, receiving 364,659 citations. The topic is also known as: semantic relatedness.


Papers
Journal ArticleDOI
TL;DR: This work presents evidence, drawn from both normal observers and a patient, on whether the effects of structural and semantic similarity between objects on picture naming are confined to their respective access stages (as the discrete-stage account predicts) or are passed on to later stages (as the cascade account predicts).
Abstract: The naming of pictures is typically thought to require sequential access to stored structural knowledge about objects, to semantic knowledge, and to a stored phonological description. Access to these different types of knowledge may constitute discrete processing stages; alternatively, it may be that information is transmitted continuously (in cascade) from one type of description to the next. The discrete stage and the cascade accounts make different predictions about the effects of structural and semantic similarity between objects on picture naming. The discrete stage account maintains that the effects of structural similarity should be confined to the process of accessing an object's structural description, and the effects of semantic similarity should be confined to the process of accessing semantic knowledge. The cascade account predicts that the effect of both variables may be passed on to subsequent processing stages. We present evidence drawn from both normal observers and from a patient...

659 citations

Journal ArticleDOI
Atanas Kiryakov, Borislav Popov, Ivan Terziev, Dimitar Manov, Damyan Ognyanoff
TL;DR: This paper presents a semantically enhanced information extraction system that provides automatic semantic annotation with references to ontology classes and instances, and argues that such large-scale, fully automatic methods are essential for transforming the current, largely textual web into a Semantic Web.

651 citations

Proceedings ArticleDOI
01 May 1988
TL;DR: Initial tests find this completely automatic method widely applicable and a promising way to improve users' access to many kinds of textual materials, or to objects and services for which textual descriptions are available.
Abstract: This paper describes a new approach for dealing with the vocabulary problem in human-computer interaction. Most approaches to retrieving textual materials depend on a lexical match between words in users' requests and those in or assigned to database objects. Because of the tremendous diversity in the words people use to describe the same object, lexical matching methods are necessarily incomplete and imprecise [5]. The latent semantic indexing approach tries to overcome these problems by automatically organizing text objects into a semantic structure more appropriate for matching user requests. This is done by taking advantage of implicit higher-order structure in the association of terms with text objects. The particular technique used is singular-value decomposition, in which a large term by text-object matrix is decomposed into a set of about 50 to 150 orthogonal factors from which the original matrix can be approximated by linear combination. Terms and objects are represented by 50 to 150 dimensional vectors and matched against user queries in this “semantic” space. Initial tests find this completely automatic method widely applicable and a promising way to improve users' access to many kinds of textual materials, or to objects and services for which textual descriptions are available.
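The core step the abstract describes — decomposing a term-by-text-object matrix with singular-value decomposition and matching in the reduced "semantic" space — can be sketched as follows. The toy matrix and the choice k=2 are illustrative only; the paper uses roughly 50 to 150 factors on real collections:

```python
import numpy as np

# Toy term-by-document matrix (rows: terms, columns: documents).
A = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
], dtype=float)

# Singular-value decomposition: A = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top-k orthogonal factors; the original matrix is
# approximated by their linear combination.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Each document becomes a k-dimensional vector in the latent space,
# where queries and documents are compared despite vocabulary mismatch.
doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim = cosine(doc_vecs[0], doc_vecs[1])
```

Because matching happens in the factor space rather than on raw terms, two documents can score as similar even when they share no words, which is how the approach addresses the vocabulary problem.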

638 citations

Proceedings Article
22 Aug 2004
TL;DR: A wholly intrinsic measure of Information Content (IC) that relies on hierarchical structure alone is presented; it is easier to calculate, yet when used as the basis of a similarity mechanism it yields judgments that correlate more closely with human assessments than extrinsic IC measures that additionally employ corpus analysis.
Abstract: Information Content (IC) is an important dimension of word knowledge when assessing the similarity of two terms or word senses. The conventional way of measuring the IC of word senses is to combine knowledge of their hierarchical structure from an ontology like WordNet with statistics on their actual usage in text as derived from a large corpus. In this paper we present a wholly intrinsic measure of IC that relies on hierarchical structure alone. We report that this measure is consequently easier to calculate, yet when used as the basis of a similarity mechanism it yields judgments that correlate more closely with human assessments than other, extrinsic measures of IC that additionally employ corpus analysis.
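The intuition behind an intrinsic IC measure can be illustrated with one common formulation, IC(c) = 1 - log(hypo(c) + 1) / log(N), where hypo(c) is the number of hyponyms (descendants) of concept c in the taxonomy and N is the total number of concepts. This is a sketch under the assumption that the paper's measure follows this general shape; only taxonomy structure is needed, no corpus statistics:

```python
import math

def intrinsic_ic(num_hyponyms: int, total_concepts: int) -> float:
    """Intrinsic information content from hierarchy structure alone.

    A concept with many hyponyms is general and carries little
    information; a leaf concept carries the maximum.
    """
    return 1.0 - math.log(num_hyponyms + 1) / math.log(total_concepts)

# A leaf concept (no hyponyms) has maximal IC = 1.0:
leaf_ic = intrinsic_ic(0, 100_000)
# The root, with every other concept beneath it, approaches 0:
root_ic = intrinsic_ic(99_999, 100_000)
```

Any standard IC-based similarity measure (e.g. one using the IC of the least common subsumer of two senses) can then be computed from the hierarchy alone, which is why the intrinsic measure is cheaper to obtain than corpus-derived IC.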

619 citations

Proceedings Article
01 Oct 2013
TL;DR: A method is proposed to learn bilingual embeddings from a large unlabeled corpus while using MT word alignments to constrain translational equivalence; the resulting embeddings significantly outperform baselines on word semantic similarity.
Abstract: We introduce bilingual word embeddings: semantic embeddings associated across two languages in the context of neural language models. We propose a method to learn bilingual embeddings from a large unlabeled corpus, while utilizing MT word alignments to constrain translational equivalence. The new embeddings significantly outperform baselines in word semantic similarity. A single semantic similarity feature induced with bilingual embeddings adds nearly half a BLEU point on the NIST08 Chinese-English machine translation task.
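The translational-equivalence constraint can be sketched as pulling a source-language word's embedding toward the alignment-weighted average of its translations' embeddings; cross-lingual cosine similarity then serves as the kind of semantic similarity feature the abstract mentions. All names, dimensions, and the plain squared-error objective here are illustrative assumptions, not the paper's exact training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
emb_zh = rng.normal(size=dim)          # embedding of one source word (illustrative)
emb_en = rng.normal(size=(3, dim))     # embeddings of its 3 aligned target words
align_p = np.array([0.6, 0.3, 0.1])    # normalized word-alignment probabilities

# Alignment-weighted average of the translations' embeddings.
target = align_p @ emb_en

# Gradient descent on ||emb_zh - target||^2: the equivalence constraint
# pulls the source embedding toward its expected translation.
lr = 0.1
for _ in range(50):
    emb_zh -= lr * 2.0 * (emb_zh - target)

# Cross-lingual cosine similarity, usable as a feature in an MT system.
cos = float(emb_zh @ target / (np.linalg.norm(emb_zh) * np.linalg.norm(target)))
```

After training, aligned words sit near each other in the shared space, so the cosine score is high for true translation pairs and low for unrelated ones.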

608 citations


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
84% related
Graph (abstract data type)
69.9K papers, 1.2M citations
84% related
Unsupervised learning
22.7K papers, 1M citations
83% related
Feature vector
48.8K papers, 954.4K citations
83% related
Web service
57.6K papers, 989K citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    202
2022    522
2021    641
2020    837
2019    866
2018    787