Topic

Semantic similarity

About: Semantic similarity is a research topic. Over its lifetime, 14,605 publications have been published on this topic, receiving 364,659 citations. The topic is also known as: semantic relatedness.


Papers
Proceedings ArticleDOI
01 Jun 2016
TL;DR: This paper proposes a model to learn visually grounded word embeddings (vis-w2v) that capture visual notions of semantic relatedness, and finds that the visual grounding of words depends on semantics, not on the literal pixels.
Abstract: We propose a model to learn visually grounded word embeddings (vis-w2v) to capture visual notions of semantic relatedness. While word embeddings trained using text have been extremely successful, they cannot uncover notions of semantic relatedness implicit in our visual world. For instance, although "eats" and "stares at" seem unrelated in text, they share semantics visually. When people are eating something, they also tend to stare at the food. Grounding diverse relations like "eats" and "stares at" into vision remains challenging, despite recent progress in vision. We note that the visual grounding of words depends on semantics, and not the literal pixels. We thus use abstract scenes created from clipart to provide the visual grounding. We find that the embeddings we learn capture fine-grained, visually grounded notions of semantic relatedness. We show improvements over text-only word embeddings (word2vec) on three tasks: common-sense assertion classification, visual paraphrasing and text-based image retrieval. Our code and datasets are available online.
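As a rough illustration of the relatedness scoring that embedding-based approaches rely on, the sketch below computes cosine similarity between averaged word vectors. It is a minimal sketch, not the authors' code: the `vectors` table holds hypothetical 300-dimensional placeholders, whereas in vis-w2v the text-trained vectors are additionally refined with visual (clipart) grounding so that visually related phrases such as "eats" and "stares at" score higher.

```python
# Minimal sketch (not the vis-w2v code): scoring the semantic relatedness of
# two phrases by cosine similarity of averaged word vectors.
import numpy as np

vectors = {
    "eats":   np.random.randn(300),   # hypothetical placeholder vectors
    "stares": np.random.randn(300),
    "at":     np.random.randn(300),
    "food":   np.random.randn(300),
}

def phrase_vector(phrase):
    """Average the word vectors of a whitespace-tokenized phrase."""
    words = [w for w in phrase.lower().split() if w in vectors]
    return np.mean([vectors[w] for w in words], axis=0)

def relatedness(a, b):
    """Cosine similarity between two phrase vectors (higher = more related)."""
    va, vb = phrase_vector(a), phrase_vector(b)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(relatedness("eats", "stares at"))  # arbitrary here, since the vectors are random placeholders
```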

90 citations

Proceedings ArticleDOI
06 Aug 2009
TL;DR: A novel statistical language model is proposed to capture long-range semantic dependencies by applying the concept of semantic composition to the problem of constructing predictive history representations for upcoming words.
Abstract: In this paper we propose a novel statistical language model to capture long-range semantic dependencies. Specifically, we apply the concept of semantic composition to the problem of constructing predictive history representations for upcoming words. We also examine the influence of the underlying semantic space on the composition task by comparing spatial semantic representations against topic-based ones. The composition models yield reductions in perplexity when combined with a standard n-gram language model over the n-gram model alone. We also obtain perplexity reductions when integrating our models with a structured language model.
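A hedged sketch of the kind of model combination and perplexity evaluation the abstract refers to (not the paper's implementation): `p_ngram` and `p_semantic` are hypothetical per-word probability functions, and `lam` is an assumed interpolation weight.

```python
# Minimal sketch: linearly interpolating a composition-based (semantic) model
# with an n-gram model and measuring perplexity over a word sequence.
# p_ngram and p_semantic are hypothetical callables returning P(word | history).
import math

def perplexity(words, p_ngram, p_semantic, lam=0.5):
    """Perplexity of the interpolated model; lower is better."""
    log_prob = 0.0
    for i, word in enumerate(words):
        history = words[:i]
        p = lam * p_ngram(word, history) + (1 - lam) * p_semantic(word, history)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(words))

# Toy usage with uniform dummy models over a 10,000-word vocabulary.
uniform = lambda word, history: 1.0 / 10_000
print(perplexity("we propose a novel model".split(), uniform, uniform))  # 10000.0
```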

90 citations

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper proposes a simple, yet effective generalized hashing framework which can work for all the different scenarios, while preserving the semantic distance between the data points, and learns the optimum hash codes for the two modalities simultaneously.
Abstract: Due to the availability of large amounts of multimedia data, cross-modal matching is gaining increasing importance. Hashing based techniques provide an attractive solution to this problem when the data size is large. Different scenarios of cross-modal matching are possible; for example, data from the different modalities can be associated with a single label or multiple labels, and in addition may or may not have one-to-one correspondence. Most of the existing approaches have been developed for the case where there is one-to-one correspondence between the data of the two modalities. In this paper, we propose a simple, yet effective generalized hashing framework which can work for all the different scenarios, while preserving the semantic distance between the data points. The approach first learns the optimum hash codes for the two modalities simultaneously, so as to preserve the semantic similarity between the data points, and then learns the hash functions to map from the features to the hash codes. Extensive experiments on a single-label dataset (Wiki) and multi-label datasets (NUS-WIDE, Pascal, and LabelMe) under all the different scenarios, together with comparisons against the state of the art, show the effectiveness of the proposed approach.
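As a rough sketch of the retrieval step that hashing-based cross-modal matching enables (under simplifying assumptions, not the proposed framework itself): once both modalities have been mapped to binary codes, matching reduces to ranking database codes by Hamming distance from the query code. How the codes themselves are learned, which is the core contribution of the paper, is assumed away here.

```python
# Minimal sketch: cross-modal retrieval over precomputed binary hash codes.
import numpy as np

def hamming_distance(a, b):
    """Number of differing bits between two {0,1} code vectors."""
    return int(np.sum(a != b))

def retrieve(query_code, db_codes, k=5):
    """Indices of the k database items closest to the query in Hamming space."""
    dists = np.array([hamming_distance(query_code, c) for c in db_codes])
    return np.argsort(dists)[:k]

# Toy usage: a 32-bit text-query code against 1,000 image codes.
rng = np.random.default_rng(0)
query = rng.integers(0, 2, size=32)
database = rng.integers(0, 2, size=(1000, 32))
print(retrieve(query, database))
```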

90 citations

Proceedings Article
01 Oct 1980
TL;DR: A number of postulates are presented that must be satisfied by a database defined in terms of such features for its definition to be well-formed.
Abstract: This paper's principal goal is to provide a discussion of issues raised by the coexistence in a semantic data model of (i) an object-oriented framework including the notions of token, class and property as well as the IS-A and INSTANCE-OF relations; (ii) transactions that can cause state changes; and (iii) special (null) values such as "unknown", "nothing" and "inconsistent". The paper presents a number of postulates that need to be satisfied by a database defined in terms of such features for its definition to be well-formed. The discussion uses as its starting point TAXIS, a language for the design of interactive application systems which offers all three types of features.
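Purely as an illustration of the third ingredient (the names and representation below are hypothetical, not TAXIS syntax), the distinct "null" values can be modeled as separate sentinels so that a property lookup can tell them apart from ordinary values.

```python
# Illustrative sketch only: distinct sentinel objects for the three special
# values, plus a token that records its class (INSTANCE-OF) and properties.
UNKNOWN = object()       # a value exists but is not currently known
NOTHING = object()       # the property is inapplicable to this token
INCONSISTENT = object()  # conflicting values have been asserted

class Token:
    def __init__(self, cls, **properties):
        self.cls = cls                 # INSTANCE-OF: the class of this token
        self.properties = properties

    def get(self, name):
        # Unrecorded properties default to UNKNOWN rather than raising.
        return self.properties.get(name, UNKNOWN)

volunteer = Token("Employee", salary=NOTHING)  # no salary applies
print(volunteer.get("salary") is NOTHING)      # True
print(volunteer.get("manager") is UNKNOWN)     # True
```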

90 citations

Journal ArticleDOI
TL;DR: Semantic similarity measures based on the UMLS, which contains SNOMED CT and MeSH, significantly outperformed those based solely on SNOMED CT or MeSH across evaluations; knowledge-based semantic similarity measures are also more practical to compute than distributional measures, as they do not require an external corpus.
Abstract: Background: Semantic similarity measures estimate the similarity between concepts, and play an important role in many text processing tasks. Approaches to semantic similarity in the biomedical domain can be roughly divided into knowledge-based and distributional methods. Knowledge-based approaches utilize knowledge sources such as dictionaries, taxonomies, and semantic networks, and include path finding measures and intrinsic information content (IC) measures. Distributional measures utilize, in addition to a knowledge source, the distribution of concepts within a corpus to compute similarity; these include corpus IC and context vector methods. Prior evaluations of these measures in the biomedical domain showed that distributional measures outperform knowledge-based path finding methods, but more recent studies suggested that intrinsic IC-based measures exceed the accuracy of distributional approaches. Limitations of previous evaluations of similarity measures in the biomedical domain include their focus on the SNOMED CT ontology, and their reliance on small benchmarks not powered to detect significant differences in measure accuracy. There have been few evaluations of the relative performance of these measures on other biomedical knowledge sources such as the UMLS, and on larger, recently developed semantic similarity benchmarks. Results: We evaluated knowledge-based and corpus IC-based semantic similarity measures derived from SNOMED CT, MeSH, and the UMLS on recently developed semantic similarity benchmarks. Semantic similarity measures based on the UMLS, which contains SNOMED CT and MeSH, significantly outperformed those based solely on SNOMED CT or MeSH across evaluations. Intrinsic IC-based measures significantly outperformed path-based and distributional measures. We released all code required to reproduce our results and all tools developed as part of this study as open source, available at http://code.google.com/p/ytex. We provide a publicly accessible web service to compute semantic similarity, available at http://informatics.med.yale.edu/ytex.web/. Conclusions: Knowledge-based semantic similarity measures are more practical to compute than distributional measures, as they do not require an external corpus. Furthermore, knowledge-based measures significantly and meaningfully outperformed distributional measures on large semantic similarity benchmarks, suggesting that they are a practical alternative to distributional measures. Future evaluations of semantic similarity measures should utilize benchmarks powered to detect significant differences in measure accuracy.
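For orientation, the sketch below shows simplified forms of two of the measure families the study compares: a path measure over the is-a taxonomy and an intrinsic IC measure (in the spirit of Seco et al.) that needs no external corpus, combined in a Lin-style similarity. It is a minimal sketch under stated assumptions, not the ytex implementation; the function names and toy statistics are hypothetical.

```python
# Minimal sketch of two measure families (not the ytex code):
# a path-based similarity and an intrinsic-IC-based (Lin-style) similarity.
import math

def path_similarity(path_length):
    """Path measure: concepts joined by a shorter is-a path are more similar."""
    return 1.0 / (1.0 + path_length)

def intrinsic_ic(num_descendants, total_concepts):
    """Intrinsic IC (Seco-style): specific concepts (few descendants) score higher,
    estimated from the taxonomy alone, with no external corpus."""
    return 1.0 - math.log(num_descendants + 1) / math.log(total_concepts)

def lin_similarity(ic_c1, ic_c2, ic_lcs):
    """Lin measure: IC of the least common subsumer relative to the concepts' ICs."""
    return 2.0 * ic_lcs / (ic_c1 + ic_c2)

# Toy usage with made-up taxonomy statistics.
ic_a = intrinsic_ic(num_descendants=3, total_concepts=100_000)
ic_b = intrinsic_ic(num_descendants=10, total_concepts=100_000)
ic_lcs = intrinsic_ic(num_descendants=500, total_concepts=100_000)
print(path_similarity(4), lin_similarity(ic_a, ic_b, ic_lcs))
```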

89 citations


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 84% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 84% related
Unsupervised learning: 22.7K papers, 1M citations, 83% related
Feature vector: 48.8K papers, 954.4K citations, 83% related
Web service: 57.6K papers, 989K citations, 82% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    202
2022    522
2021    641
2020    837
2019    866
2018    787