Topic

Semantic similarity

About: Semantic similarity is a research topic. Over the lifetime of the topic, 14,605 publications have been published, receiving 364,659 citations. The topic is also known as: semantic relatedness.


Papers
Proceedings ArticleDOI
01 Jun 2019
TL;DR: A new deep unsupervised hashing model, called DistilHash, is proposed; it learns a distilled data set in which data pairs carry confident similarity signals whose labels are consistent with those an optimal Bayesian classifier would assign.
Abstract: Due to storage and search efficiency, hashing has become significantly prevalent for nearest neighbor search. Particularly, deep hashing methods have greatly improved the search performance, typically under supervised scenarios. In contrast, unsupervised deep hashing models can hardly achieve satisfactory performance due to the lack of supervisory similarity signals. To address this problem, in this paper, we propose a new deep unsupervised hashing model, called DistilHash, which can learn a distilled data set, where data pairs have confident similarity signals. Specifically, we investigate the relationship between the initial but noisy similarity signals learned from local structures and the semantic similarity labels assigned by the optimal Bayesian classifier. We show that, under a mild assumption, some data pairs, of which labels are consistent with those assigned by the optimal Bayesian classifier, can be potentially distilled. With this understanding, we design a simple but effective method to distill data pairs automatically and further adopt a Bayesian learning framework to learn hashing functions from the distilled data set. Extensive experimental results on three widely used benchmark datasets demonstrate that our method achieves state-of-the-art search performance.
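As a rough illustration of the distillation idea only (not the authors' algorithm: the feature extractor, thresholds, and selection rule below are assumptions, whereas DistilHash derives its rule from a Bayesian analysis), one can keep just the pairs whose similarity on pretrained features is confidently high or low and use them as pseudo-labels for learning hash functions:

```python
import numpy as np

def distill_pairs(features, hi=0.9, lo=0.1):
    """Keep pairs whose cosine similarity is confidently high or low.

    features: (n, d) array of pretrained image descriptors (assumed to come
    from any off-the-shelf network). Returns index pairs and pseudo-labels
    that a hashing network could then be trained on.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                                  # initial, noisy similarity signals
    i, j = np.triu_indices(len(f), k=1)
    s = sim[i, j]
    keep = (s >= hi) | (s <= lo)                   # retain only "confident" pairs
    labels = (s[keep] >= hi).astype(np.float32)    # 1 = similar, 0 = dissimilar
    return i[keep], j[keep], labels
```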

85 citations

Journal ArticleDOI
03 Sep 2014 - PLOS ONE
TL;DR: This study suggests that semantic algorithms are becoming new tools for the social sciences, opening perspectives on survey responses that prevalent psychometric theory cannot explain.
Abstract: Some disciplines in the social sciences rely heavily on collecting survey responses to detect empirical relationships among variables. We explored whether these relationships were a priori predictable from the semantic properties of the survey items, using language processing algorithms which are now available as new research methods. Language processing algorithms were used to calculate the semantic similarity among all items in state-of-the-art surveys from Organisational Behaviour research. These surveys covered areas such as transformational leadership, work motivation and work outcomes. This information was used to explain and predict the response patterns from real subjects. Semantic algorithms explained 60–86% of the variance in the response patterns and allowed remarkably precise prediction of survey responses from humans, except in a personality test. Even the relationships between independent variables and their purported dependent variables were accurately predicted. This raises concern about the empirical nature of data collected through some surveys if results are already given a priori by the way subjects are being asked. Survey response patterns seem heavily determined by semantics. Language algorithms may suggest these prior to administering a survey. This study suggests that semantic algorithms are becoming new tools for the social sciences, opening perspectives on survey responses that prevalent psychometric theory cannot explain.
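The core step, scoring the semantic similarity of every pair of survey items, can be sketched as follows; the sentence-transformers library, the model name, and the example items are assumptions for illustration, not the language processing algorithms the study itself used:

```python
# Sketch: pairwise semantic similarity among survey items via sentence embeddings.
from sentence_transformers import SentenceTransformer
import numpy as np

items = [                                           # hypothetical survey items
    "My supervisor inspires me to do my best work.",
    "I feel motivated to go beyond what is required.",
    "I intend to look for another job next year.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(items, normalize_embeddings=True)
similarity = emb @ emb.T                            # cosine similarities among all items
print(np.round(similarity, 2))
```

A matrix like this could then be compared with the observed correlations among item responses.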

85 citations

Book ChapterDOI
TL;DR: A number of similarity measures are listed, some of which are not well known (such as the Monge-Kantorovich metric) or are newly introduced (the reflection metric), and a set of constructions that have been used in the design of some similarity measures is given.
Abstract: This paper formulates properties of similarity measures. We list a number of similarity measures, some of which are not well known (such as the Monge-Kantorovich metric) or newly introduced (the reflection metric), and give a set of constructions that have been used in the design of some similarity measures.
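For a concrete sense of one of the listed measures: the Monge-Kantorovich metric is better known as the earth mover's (Wasserstein) distance, and SciPy ships a one-dimensional implementation (the sample values below are made up):

```python
from scipy.stats import wasserstein_distance

a = [0.0, 1.0, 3.0]                    # hypothetical 1-D samples
b = [5.0, 6.0, 8.0]
print(wasserstein_distance(a, b))      # 5.0: on average, mass must move 5 units
```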

85 citations

Posted Content
TL;DR: XGAN as discussed by the authors is a dual adversarial autoencoder that captures a shared representation of the common domain semantic content in an unsupervised way, while jointly learning the domain-to-domain image translations in both directions.
Abstract: Style transfer usually refers to the task of applying color and texture information from a specific style image to a given content image while preserving the structure of the latter. Here we tackle the more generic problem of semantic style transfer: given two unpaired collections of images, we aim to learn a mapping between the corpus-level style of each collection, while preserving semantic content shared across the two domains. We introduce XGAN ("Cross-GAN"), a dual adversarial autoencoder, which captures a shared representation of the common domain semantic content in an unsupervised way, while jointly learning the domain-to-domain image translations in both directions. We exploit ideas from the domain adaptation literature and define a semantic consistency loss which encourages the model to preserve semantics in the learned embedding space. We report promising qualitative results for the task of face-to-cartoon translation. The cartoon dataset, CartoonSet, we collected for this purpose is publicly available at this http URL as a new benchmark for semantic style transfer.
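A minimal sketch of a semantic consistency loss in the spirit described above; the module names and the squared-error distance are assumptions, not XGAN's exact formulation:

```python
# Sketch: an image and its translation to the other domain should map to
# nearby points in the shared embedding space.
import torch
import torch.nn as nn

def semantic_consistency_loss(encode_src: nn.Module,
                              encode_tgt: nn.Module,
                              translate_src_to_tgt: nn.Module,
                              x_src: torch.Tensor) -> torch.Tensor:
    z_src = encode_src(x_src)                          # shared embedding of the source image
    z_trans = encode_tgt(translate_src_to_tgt(x_src))  # embedding after translation
    return ((z_src - z_trans) ** 2).mean()             # penalize semantic drift
```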

85 citations

Proceedings ArticleDOI
02 Jun 2001
TL;DR: Tests performed on vocabularies of four Algonquian languages indicate that the method is capable of discovering on average nearly 75% of cognates at 50% precision.
Abstract: I present a method of identifying cognates in the vocabularies of related languages. I show that a measure of phonetic similarity based on multivalued features performs better than "orthographic" measures, such as the Longest Common Subsequence Ratio (LCSR) or Dice's coefficient. I introduce a procedure for estimating semantic similarity of glosses that employs keyword selection and WordNet. Tests performed on vocabularies of four Algonquian languages indicate that the method is capable of discovering on average nearly 75% of cognates at 50% precision.
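For reference, the two orthographic baselines mentioned above have standard definitions that are easy to state in code; this sketch uses the common character-bigram set variant of Dice's coefficient rather than the paper's implementation:

```python
def lcsr(a: str, b: str) -> float:
    """Longest Common Subsequence Ratio: LCS length / length of the longer word."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = dp[i][j] + 1 if a[i] == b[j] else max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n] / max(m, n)

def dice(a: str, b: str) -> float:
    """Dice's coefficient over sets of character bigrams."""
    bigrams = lambda s: {s[k:k + 2] for k in range(len(s) - 1)}
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y)) if (x or y) else 0.0

print(lcsr("colour", "color"))   # 0.833...
print(dice("colour", "color"))   # 0.666...
```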

85 citations


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations (84% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (84% related)
Unsupervised learning: 22.7K papers, 1M citations (83% related)
Feature vector: 48.8K papers, 954.4K citations (83% related)
Web service: 57.6K papers, 989K citations (82% related)
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    202
2022    522
2021    641
2020    837
2019    866
2018    787