Topic

Locality-sensitive hashing

About: Locality-sensitive hashing is a research topic. Over its lifetime, 1,894 publications have been published within this topic, receiving 69,362 citations.

Papers

Proceedings Article•

Semantic topic multimodal hashing for cross-media retrieval

Di Wang¹, Xinbo Gao¹, Xiumei Wang¹, Lihuo He¹

Xidian University¹

25 Jul 2015

TL;DR: A novel Semantic Topic Multimodal Hashing (STMH) is developed by considering latent semantic information in coding procedure and demonstrates that the proposed method outperforms several state-of-the-art methods.

Abstract: Multimodal hashing is essential to cross-media similarity search for its low storage cost and fast query speed. Most existing multimodal hashing methods embed heterogeneous data into a common low-dimensional Hamming space and then round the continuous embeddings to obtain the binary codes. Yet, by relaxing the discrete constraints, they usually neglect the inherent discrete nature of hashing, which degrades retrieval performance, especially for long codes. To address this, a novel Semantic Topic Multimodal Hashing (STMH) is developed by considering latent semantic information in the coding procedure. It first discovers clustering patterns of texts and robustly factorizes the matrix of images to obtain multiple semantic topics of texts and concepts of images. Then the learned multimodal semantic features are transformed into a common subspace by their correlations. Finally, each bit of the unified hash code can be generated directly by determining whether a topic or concept is contained in a text or an image. Therefore, the model obtained by STMH is better suited to hashing, as it directly learns discrete hash codes in the coding process. Experimental results demonstrate that the proposed method outperforms several state-of-the-art methods.

170 citations

Journal Article•DOI•

Cross-View Retrieval via Probability-Based Semantics-Preserving Hashing

Zijia Lin¹, Guiguang Ding¹, Jungong Han², Jianmin Wang¹

Tsinghua University¹, Northumbria University²

20 Dec 2017-IEEE Transactions on Systems, Man, and Cybernetics

TL;DR: This paper proposes an effective probability-based semantics-preserving hashing (SePH) method to tackle the problem of cross-view retrieval, and conducts extensive experiments on diverse benchmark datasets to evaluate the proposed SePH.

Abstract: To efficiently retrieve nearest neighbors from large-scale multiview data, hashing methods have recently been widely investigated, as they can substantially improve query speeds. In this paper, we propose an effective probability-based semantics-preserving hashing (SePH) method to tackle the problem of cross-view retrieval. Considering the semantic consistency between views, SePH generates one unified hash code for all observed views of any instance. For training, SePH first transforms the given semantic affinities of training data into a probability distribution, and aims to approximate it with another one in Hamming space by minimizing their Kullback–Leibler divergence. Specifically, the latter probability distribution is derived from all pair-wise Hamming distances between to-be-learnt hash codes of the training data. Then, with the learnt hash codes, any kind of predictive model, such as linear ridge regression, logistic regression, or kernel logistic regression, can be learnt as hash functions in each view for projecting the corresponding view-specific features into hash codes. As for out-of-sample extension, given any unseen instance, the learnt hash functions in its observed views can predict view-specific hash codes. Then, by deriving or estimating the corresponding output probabilities with respect to the predicted view-specific hash codes, a novel probabilistic approach is further proposed to utilize them for determining a unified hash code. To evaluate the proposed SePH, we conduct extensive experiments on diverse benchmark datasets, and the experimental results demonstrate that SePH is reasonable and effective.
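The training objective described in this abstract can be illustrated with a small sketch: turn semantic affinities into a distribution P over pairs, derive a second distribution Q from pairwise Hamming distances between candidate codes, and compare them with the KL divergence. The kernel 1 / (1 + d), the pairwise normalization, and every function name below are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math

def hamming(a, b):
    """Number of positions where two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def normalize_pairs(weights):
    """Turn non-negative pairwise weights {(i, j): w} into a distribution."""
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}

def affinity_distribution(affinity):
    """P: probability over pairs i < j, proportional to semantic affinity."""
    n = len(affinity)
    return normalize_pairs({(i, j): affinity[i][j]
                            for i in range(n) for j in range(i + 1, n)})

def code_distribution(codes):
    """Q: probability over pairs i < j derived from Hamming distances.
    Assumed kernel: 1 / (1 + d), so closer codes get higher probability."""
    n = len(codes)
    return normalize_pairs({(i, j): 1.0 / (1.0 + hamming(codes[i], codes[j]))
                            for i in range(n) for j in range(i + 1, n)})

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) over a shared pair index; eps guards against log(0)."""
    return sum(pv * math.log((pv + eps) / (q[pair] + eps))
               for pair, pv in p.items() if pv > 0)

# Hypothetical toy data: items 0 and 1 are semantically close, item 2 is not.
affinity = [[1.0, 0.9, 0.1],
            [0.9, 1.0, 0.1],
            [0.1, 0.1, 1.0]]
good_codes = [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1)]  # 0 and 1 nearby
bad_codes = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 0, 0, 1)]   # 0 and 1 far apart

p = affinity_distribution(affinity)
print(kl_divergence(p, code_distribution(good_codes)))
print(kl_divergence(p, code_distribution(bad_codes)))
```

Codes that place semantically similar items close together in Hamming space give a smaller divergence, which is the quantity a learner of this kind would try to minimize.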

169 citations

Proceedings Article•DOI•

Weakly-supervised hashing in kernel space

Yadong Mu¹, Jialie Shen², Shuicheng Yan¹

National University of Singapore¹, Singapore Management University²

13 Jun 2010

TL;DR: This paper proposes a supervised hashing method, i.e., the LAbel-regularized Max-margin Partition (LAMP) algorithm, which generates hash functions in a weakly-supervised setting and provides a collision bound that goes beyond pairwise data interaction, based on Markov random fields theory.

Abstract: The explosive growth of vision data motivates recent studies on efficient data indexing methods such as locality-sensitive hashing (LSH). Most existing approaches perform hashing in an unsupervised way. In this paper we move one step forward and propose a supervised hashing method, i.e., the LAbel-regularized Max-margin Partition (LAMP) algorithm. The proposed method generates hash functions in a weakly-supervised setting, where a small portion of sample pairs are manually labeled as “similar” or “dissimilar”. We formulate the task as a Constrained Convex-Concave Procedure (CCCP), which can be relaxed into a series of convex sub-problems solvable with efficient Quadratic-Program (QP). The proposed hashing method possesses other characteristics including: 1) most existing LSH approaches rely on linear feature representation. Unfortunately, kernel tricks are often more natural for gauging the similarity between visual objects in vision research, which corresponds to possibly infinite-dimensional Hilbert spaces. The proposed LAMP has natural support for kernel-based feature representation. 2) traditional hashing methods assume uniform data distributions. Typically, the collision probability of two samples in hash buckets is determined only by pairwise similarity, unrelated to the contextual data distribution. In contrast, we provide a collision bound that goes beyond pairwise data interaction, based on Markov random fields theory. Extensive empirical evaluations are conducted on five widely-used benchmarks. It takes only several seconds to generate a new hashing function, and the adopted random supporting-vector scheme makes the LAMP algorithm scalable to large-scale problems. Experimental results validate the superiority of the LAMP algorithm over the state-of-the-art kernel-based hashing methods.

166 citations

Proceedings Article•DOI•

Scalable music recommendation by search

Rui Cai¹, Chao Zhang¹, Lei Zhang¹, Wei-Ying Ma¹

Microsoft¹

29 Sep 2007

TL;DR: This paper presents a search-based solution for scalable music recommendations, in which a music piece is first transformed to a music signature sequence in which each signature characterizes the timbre of a local music clip, using the locality sensitive hashing (LSH).

Abstract: The growth of music resources on personal devices and Internet radio has increased the need for music recommendations. In this paper, aiming at providing an efficient and general solution, we present a search-based solution for scalable music recommendations. In this solution a music piece is first transformed to a music signature sequence in which each signature characterizes the timbre of a local music clip. Based on such signatures, a scale-sensitive method is then proposed to index the music pieces for similarity search, using the locality sensitive hashing (LSH). The scale-sensitive method can numerically find the appropriate parameters for indexing various scales of music collections, and thus can guarantee a proper number of nearest neighbors are found in search. In the recommendation stage, representative signatures from snippets of a seed piece are extracted as query terms, to retrieve pieces with similar melodies for suggestions. We also design a relevance-ranking function to sort the search results, based on the criteria that include matching ratio, temporal order, term weight, and matching confidence. Finally, with the search results, we propose a strategy to generate a dynamic playlist which can automatically expand with time. Evaluations of several music collections at various scales showed that our approach achieves encouraging results in terms of recommendation satisfaction and system scalability.
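To give a feel for what numerically choosing LSH indexing parameters can involve, the sketch below uses the standard LSH analysis: if one hash function collides on a pair with probability p, a table keyed on k concatenated functions collides with probability p^k, and at least one of L independent tables collides with probability 1 - (1 - p^k)^L. This is the textbook relation only, not the scale-sensitive method of the paper; the function names and the example numbers are hypothetical.

```python
def collision_probability(p, k, num_tables):
    """Probability that a pair with per-function collision probability p
    shares a bucket in at least one of num_tables tables of k functions."""
    return 1.0 - (1.0 - p ** k) ** num_tables

def tables_needed(p, k, target, max_tables=10_000):
    """Smallest number of tables whose collision probability reaches target.
    Returns None if max_tables is not enough (assumed safety limit)."""
    if not (0.0 < p <= 1.0) or not (0.0 < target < 1.0):
        raise ValueError("expected 0 < p <= 1 and 0 < target < 1")
    for num_tables in range(1, max_tables + 1):
        if collision_probability(p, k, num_tables) >= target:
            return num_tables
    return None

# Hypothetical setting: near neighbors collide with p = 0.8 per function,
# each table concatenates k = 4 functions, and 90% recall is desired.
print(tables_needed(p=0.8, k=4, target=0.9))
```

Larger k makes each table more selective, so more tables are needed for the same recall; a larger collection usually calls for larger k to keep buckets small, which is one reason parameters depend on scale.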

165 citations

Proceedings Article•DOI•

Modeling LSH for performance tuning

Wei Dong¹, Zhe Wang¹, William Josephson¹, Moses Charikar¹, Kai Li¹

Princeton University¹

26 Oct 2008

TL;DR: A statistical performance model of Multi-probe LSH, a state-of-the-art variant of LSH, is presented, which can accurately predict the average search quality and latency given a small sample dataset, and an adaptive LSH search algorithm is devised to determine the probing parameter dynamically for each query.

Abstract: Although Locality-Sensitive Hashing (LSH) is a promising approach to similarity search in high-dimensional spaces, it has not been considered practical, partly because its search quality is sensitive to several parameters that are quite data dependent. Previous research on LSH, though it obtained interesting asymptotic results, provides little guidance on how these parameters should be chosen, and tuning parameters for a given dataset remains a tedious process. To address this problem, we present a statistical performance model of Multi-probe LSH, a state-of-the-art variant of LSH. Our model can accurately predict the average search quality and latency given a small sample dataset. Apart from automatic parameter tuning with the performance model, we also use the model to devise an adaptive LSH search algorithm to determine the probing parameter dynamically for each query. The adaptive probing method addresses the problem that even though the average performance is tuned to be optimal, the variance of the performance is extremely high. We experimented with three different datasets including audio, images and 3D shapes to evaluate our methods. The results show the accuracy of the proposed model: the recall errors predicted are within 5% of the real values for most cases; the adaptive search method reduces the standard deviation of recall by about 50% over the existing method.
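The role of a probing parameter can be seen in a deliberately simplified sketch. It hashes vectors with random hyperplanes (sign of a dot product) and, at query time, inspects the home bucket plus a configurable number of buckets whose keys differ in one bit. The hash family, the bit-flip probing order, and all names are assumptions chosen for brevity; this is neither the statistical model nor the adaptive algorithm of the paper, and it does not reproduce the probing sequence of Multi-probe LSH.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_key(vec, planes):
    """Bucket key: one bit per hyperplane, set when the dot product is >= 0."""
    return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
                 for plane in planes)

def build_index(points, planes):
    """Map each bucket key to the list of point indices that hash to it."""
    index = {}
    for idx, vec in enumerate(points):
        index.setdefault(hash_key(vec, planes), []).append(idx)
    return index

def probe_keys(key, num_probes):
    """Yield the home bucket, then up to num_probes one-bit-flip neighbors."""
    yield key
    for bit in range(min(num_probes, len(key))):
        flipped = list(key)
        flipped[bit] ^= 1
        yield tuple(flipped)

def query(vec, index, planes, num_probes):
    """Collect candidate indices from the probed buckets."""
    candidates = []
    for key in probe_keys(hash_key(vec, planes), num_probes):
        candidates.extend(index.get(key, []))
    return candidates

# Hypothetical demo data: 200 random 8-dimensional points, 10-bit keys.
rng = random.Random(42)
points = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(200)]
planes = make_hyperplanes(num_bits=10, dim=8, seed=7)
index = build_index(points, planes)
for num_probes in (0, 2, 10):
    print(num_probes, len(query(points[0], index, planes, num_probes)))
```

More probes return more candidates (higher recall, higher latency), which is the trade-off a performance model would need to predict and an adaptive scheme would set per query.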

164 citations

Performance

Metrics
Papers: 2,048
Citations: 77,891

No. of papers in the topic in previous years
Year	Papers
2023	43
2022	108
2021	88
2020	110
2019	104
2018	139
