scispace - formally typeset
Topic

Semantic similarity

About: Semantic similarity is a research topic. Over its lifetime, 14,605 publications have been published on this topic, receiving 364,659 citations. The topic is also known as semantic relatedness.


Papers
Proceedings Article · DOI
13 Aug 2016
TL;DR: The authors define a new task, Label Noise Reduction in Entity Typing (LNR): identifying the correct type labels for training examples, given the set of candidate type labels obtained by distant supervision with a given type hierarchy.
Abstract: Current systems of fine-grained entity typing use distant supervision in conjunction with existing knowledge bases to assign categories (type labels) to entity mentions. However, the type labels so obtained from knowledge bases are often noisy (i.e., incorrect for the entity mention's local context). We define a new task, Label Noise Reduction in Entity Typing (LNR), to be the automatic identification of correct type labels (type-paths) for training examples, given the set of candidate type labels obtained by distant supervision with a given type hierarchy. The unknown type labels for individual entity mentions and the semantic similarity between entity types pose unique challenges for solving the LNR task. We propose a general framework, called PLE, to jointly embed entity mentions, text features and entity types into the same low-dimensional space, in which objects whose types are semantically close have similar representations. Then we estimate the type-path for each training example in a top-down manner using the learned embeddings. We formulate a global objective for learning the embeddings from text corpora and knowledge bases, which adopts a novel margin-based loss that is robust to noisy labels and faithfully models type correlation derived from knowledge bases. Our experiments on three public typing datasets demonstrate the effectiveness and robustness of PLE, with an average improvement of 25% in accuracy compared to the next best method.

114 citations
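PLE's top-down type-path estimation can be illustrated with a minimal sketch. It assumes the mention and type embeddings have already been learned (the toy 2-D vectors below are made up), scores types by cosine similarity, and greedily descends the hierarchy through candidate types only. The names `infer_type_path`, `children`, `candidates`, and `threshold` are hypothetical, and the paper's actual inference procedure may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def infer_type_path(mention_vec, type_vecs, children, candidates, threshold=0.0):
    """Walk the type hierarchy from the root, at each level picking the
    candidate child type whose embedding is most similar to the mention.
    Stops when no candidate child exists or the best score is below threshold."""
    path, node = [], "ROOT"
    while True:
        options = [t for t in children.get(node, []) if t in candidates]
        if not options:
            break
        best = max(options, key=lambda t: cosine(mention_vec, type_vecs[t]))
        if cosine(mention_vec, type_vecs[best]) < threshold:
            break
        path.append(best)
        node = best
    return path

# Hypothetical hierarchy and toy embeddings (not learned by PLE).
children = {"ROOT": ["person", "location"], "person": ["artist", "politician"]}
type_vecs = {
    "person": [1.0, 0.2], "location": [-1.0, 0.3],
    "artist": [0.9, 0.8], "politician": [0.9, -0.8],
}
candidates = {"person", "artist", "politician"}  # noisy labels from distant supervision
mention = [1.0, 0.7]  # e.g. a mention appearing in a music-related context
print(infer_type_path(mention, type_vecs, children, candidates))  # ['person', 'artist']
```

Restricting each step to the distantly supervised candidates keeps the output a subset of the noisy label set, which matches the framing of LNR as choosing among candidate type labels rather than predicting new ones.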

Journal Article · DOI
TL;DR: In an event-related functional magnetic resonance imaging (fMRI) experiment, the authors determined how cosine similarity between fMRI response patterns to concrete words and pictures reflects semantic clustering and semantic distances between the represented entities within a single category.
Abstract: How verbal and nonverbal visuoperceptual input connects to semantic knowledge is a core question in visual and cognitive neuroscience, with significant clinical ramifications. In an event-related functional magnetic resonance imaging (fMRI) experiment we determined how cosine similarity between fMRI response patterns to concrete words and pictures reflects semantic clustering and semantic distances between the represented entities within a single category. Semantic clustering and semantic distances between 24 animate entities were derived from a concept-feature matrix based on feature generation by >1000 subjects. In the main fMRI study, 19 human subjects performed a property verification task with written words and pictures and a low-level control task. The univariate contrast between the semantic and the control task yielded extensive bilateral occipitotemporal activation from posterior cingulate to anteromedial temporal cortex. Entities belonging to the same semantic cluster elicited more similar fMRI activity patterns in left occipitotemporal cortex. When words and pictures were analyzed separately, the effect reached significance only for words. The semantic similarity effect for words was localized to left perirhinal cortex. According to a representational similarity analysis of left perirhinal responses, semantic distances between entities correlated inversely with cosine similarities between fMRI response patterns to written words. An independent replication study in 16 novel subjects confirmed these novel findings. Semantic similarity is reflected by similarity of functional topography at a fine-grained level in left perirhinal cortex. The word specificity excludes perceptually driven confounds as an explanation and is likely to be task dependent.

113 citations
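The representational similarity analysis described in this abstract can be sketched as follows: compute the cosine similarity between every pair of response patterns, then correlate those similarities with the pairwise semantic distances using a Spearman rank correlation (implemented here as Pearson correlation on ranks). The entity names, 3-voxel patterns, and distances are invented toy data, and the use of Spearman correlation is an assumption, not the study's documented statistic.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def ranks(xs):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rsa_spearman(patterns, semantic_dist):
    """Spearman correlation between pairwise semantic distances and
    pairwise cosine similarities of response patterns."""
    pairs = list(combinations(sorted(patterns), 2))
    sims = [cosine(patterns[a], patterns[b]) for a, b in pairs]
    dists = [semantic_dist[frozenset(p)] for p in pairs]
    return pearson(ranks(dists), ranks(sims))

# Invented toy data: 3-voxel response patterns and semantic distances.
patterns = {
    "cat": [1.0, 0.9, 0.1], "dog": [0.9, 1.0, 0.2],
    "eagle": [0.1, 0.3, 1.0], "owl": [0.2, 0.2, 0.9],
}
semantic_dist = {
    frozenset(("cat", "dog")): 0.2, frozenset(("eagle", "owl")): 0.3,
    frozenset(("cat", "eagle")): 0.9, frozenset(("cat", "owl")): 0.8,
    frozenset(("dog", "eagle")): 0.85, frozenset(("dog", "owl")): 0.95,
}
rho = rsa_spearman(patterns, semantic_dist)
print(round(rho, 3))  # -0.6: semantically closer entities evoke more similar patterns
```

A negative coefficient corresponds to the inverse relation reported in the abstract: the smaller the semantic distance, the higher the pattern similarity.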

Journal Article · DOI
TL;DR: The authors propose a novel method for automatic hypertext generation based on lexical chaining, a technique for discovering sequences of related words in a text; the method uses a more general notion of document relatedness and attempts to account for the effects of synonymy and polysemy.
Abstract: Most current automatic hypertext generation systems rely on term repetition to calculate the relatedness of two documents. There are well-recognized problems with such approaches, most notably, a vulnerability to the effects of synonymy (many words for the same concept) and polysemy (many concepts for the same word). We propose a novel method for automatic hypertext generation that is based on a technique called lexical chaining, a method for discovering sequences of related words in a text. This method uses a more general notion of document relatedness, and attempts to take into account the effects of synonymy and polysemy. We also present the results of an empirical study designed to test this method in the context of a question answering task from a database of newspaper articles.

113 citations
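The contrast between term repetition and lexical chaining can be illustrated with a minimal sketch. It assumes a tiny hand-written thesaurus (`RELATED`, hypothetical, standing in for a real lexical resource), builds chains greedily by attaching each word to the first chain containing an identical or related word, and scores two documents by the fraction of chains in one that link to a chain in the other. The scoring function is a hypothetical illustration, not the paper's relatedness measure.

```python
# Hypothetical toy thesaurus standing in for a real lexical resource.
RELATED = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "vehicle": {"car", "automobile"},
    "crash": {"collision", "accident"},
    "collision": {"crash", "accident"},
    "accident": {"crash", "collision"},
}

def related(a, b):
    """Two words are related if identical or listed as related in the thesaurus."""
    return a == b or b in RELATED.get(a, set())

def build_chains(words):
    """Greedy chaining: append each word to the first chain holding a related word."""
    chains = []
    for w in words:
        for chain in chains:
            if any(related(w, m) for m in chain):
                chain.append(w)
                break
        else:  # no related chain found: start a new one
            chains.append([w])
    return chains

def chain_overlap(doc_a, doc_b):
    """Fraction of doc_a's chains related to at least one chain of doc_b."""
    ca, cb = build_chains(doc_a), build_chains(doc_b)
    linked = sum(1 for x in ca if any(related(u, v) for y in cb for u in x for v in y))
    return linked / len(ca) if ca else 0.0

def term_overlap(doc_a, doc_b):
    """Baseline: Jaccard overlap of exact terms (pure term repetition)."""
    a, b = set(doc_a), set(doc_b)
    return len(a & b) / len(a | b) if a | b else 0.0

doc1 = ["car", "crash", "vehicle", "highway"]
doc2 = ["automobile", "collision", "accident"]
print(term_overlap(doc1, doc2))             # 0.0
print(round(chain_overlap(doc1, doc2), 3))  # 0.667
```

The two documents share no terms, so a repetition-based measure scores them as unrelated, while the chain-based score picks up the synonymy between "car"/"automobile" and "crash"/"collision". Handling polysemy would additionally require choosing a word sense when a word joins a chain, which this sketch omits.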

Book Chapter · DOI
11 Nov 2007
TL;DR: This research explores three SPARQL-based techniques to solve Semantic Web tasks that often require similarity measures, such as semantic data integration, ontology mapping, and Semantic Web service matchmaking.
Abstract: This research explores three SPARQL-based techniques to solve Semantic Web tasks that often require similarity measures, such as semantic data integration, ontology mapping, and Semantic Web service matchmaking. Our aim is to see how far it is possible to integrate customized similarity functions (CSF) into SPARQL to achieve good results for these tasks. Our first approach exploits virtual triples calling property functions to establish virtual relations among resources under comparison; the second approach uses extension functions to filter out resources that do not meet the requested similarity criteria; finally, our third technique applies new solution modifiers to post-process a SPARQL solution sequence. The semantics of the three approaches are formally elaborated and discussed. We close the paper with a demonstration of the usefulness of our iSPARQL framework in the context of a data integration and an ontology mapping experiment.

113 citations
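The second and third iSPARQL approaches (filtering out resources that fail a similarity criterion, then post-processing the solution sequence) can be mimicked outside SPARQL with a minimal plain-Python sketch. It assumes a normalized Levenshtein similarity over class labels as the customized similarity function and a hypothetical `threshold` of 0.7; it does not reproduce iSPARQL syntax or its API, and the labels are invented for an ontology-mapping example.

```python
def levenshtein(a, b):
    """Edit distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def label_similarity(a, b):
    """Hypothetical customized similarity: 1 - edit distance / longer length."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0

def match_labels(source, target, threshold=0.7):
    """Filter step keeps pairs at or above the threshold; the modifier
    step then orders the surviving solutions by descending similarity."""
    solutions = [(s, t, label_similarity(s, t)) for s in source for t in target]
    kept = [sol for sol in solutions if sol[2] >= threshold]
    return sorted(kept, key=lambda sol: -sol[2])

source = ["Author", "Publication"]
target = ["author", "Publications", "Venue"]
for s, t, sim in match_labels(source, target):
    print(s, t, round(sim, 3))
```

Separating the filter from the ordering step mirrors the paper's distinction between extension functions (which discard non-matching resources) and solution modifiers (which reshape an existing solution sequence); any other similarity function could be swapped in for `label_similarity`.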

Journal Article · DOI
12 Jun 2012 · PLOS ONE
TL;DR: AlignNemo is a new algorithm that, given the protein-protein interaction networks of two organisms, uncovers subnetworks of proteins related in both biological function and interaction topology; because these subnetworks need not follow specific interaction patterns, they more closely fit the models of functional complexes proposed in the literature.
Abstract: Local network alignment is an important component of the analysis of protein-protein interaction networks that may lead to the identification of evolutionarily related complexes. We present AlignNemo, a new algorithm that, given the networks of two organisms, uncovers subnetworks of proteins that relate in biological function and topology of interactions. The discovered conserved subnetworks have a general topology and need not correspond to specific interaction patterns, so that they more closely fit the models of functional complexes proposed in the literature. The algorithm is able to handle sparse interaction data with an expansion process that at each step explores the local topology of the networks beyond the proteins directly interacting with the current solution. To assess the performance of AlignNemo, we ran a series of benchmarks using statistical measures as well as biological knowledge. Based on reference datasets of protein complexes, AlignNemo shows better performance than other methods in terms of both precision and recall. We show our solutions to be biologically sound using the concept of semantic similarity applied to Gene Ontology vocabularies. The binaries of AlignNemo and supplementary details about the algorithms and the experiments are available at: sourceforge.net/p/alignnemo.

113 citations
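The abstract does not say which Gene Ontology semantic similarity measure was used, so the following is only a minimal sketch of one common choice, Resnik similarity: the information content IC(t) = log(N_root / N_t) of the most informative common ancestor of two terms, where N_t counts annotations to t or any of its descendants. The toy ontology (`PARENTS`) and annotation counts (`DIRECT`) are hypothetical and are not real GO data.

```python
import math

# Hypothetical toy ontology: term -> parents (a DAG rooted at "GO:root").
PARENTS = {
    "GO:root": [],
    "GO:binding": ["GO:root"],
    "GO:catalysis": ["GO:root"],
    "GO:protein_binding": ["GO:binding"],
    "GO:dna_binding": ["GO:binding"],
    "GO:kinase": ["GO:catalysis"],
}
# Hypothetical counts of proteins annotated directly to each term.
DIRECT = {"GO:protein_binding": 40, "GO:dna_binding": 20, "GO:kinase": 30,
          "GO:binding": 5, "GO:catalysis": 5}

def ancestors(term):
    """The term itself plus all of its ancestors in the DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen

def information_content():
    """IC(t) = log(N_root / N_t), with N_t counting annotations to t or a descendant."""
    totals = {t: 0 for t in PARENTS}
    for term, n in DIRECT.items():
        for a in ancestors(term):  # an annotation propagates to every ancestor
            totals[a] += n
    root_total = totals["GO:root"]
    return {t: math.log(root_total / c) for t, c in totals.items() if c > 0}

def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic[t] for t in common)

ic = information_content()
print(round(resnik("GO:protein_binding", "GO:dna_binding", ic), 3))  # 0.431
print(resnik("GO:protein_binding", "GO:kinase", ic))                 # 0.0
```

Two binding terms share the informative ancestor "GO:binding", whereas a binding term and a kinase term share only the root, whose IC is zero. Comparing two proteins (rather than two terms) would additionally require aggregating over their annotation sets, for example by taking a maximum or an average of term-pair scores.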


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 84% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 84% related
Unsupervised learning: 22.7K papers, 1M citations, 83% related
Feature vector: 48.8K papers, 954.4K citations, 83% related
Web service: 57.6K papers, 989K citations, 82% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    202
2022    522
2021    641
2020    837
2019    866
2018    787