Edinburgh Research Explorer

Supervised Multi-scale Locality Sensitive Hashing

Citation for published version:
Weng, L., Jhuo, I-H., Shi, M., Sun, M., Cheng, W-H. & Amsaleg, L. 2015, Supervised Multi-scale Locality Sensitive Hashing. In ICMR '15: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval. ACM, New York, NY, USA, pp. 259-266. https://doi.org/10.1145/2671188.2749291

Digital Object Identifier (DOI): 10.1145/2671188.2749291

Document Version: Peer reviewed version

Supervised Multi-scale Locality Sensitive Hashing

Li Weng (LinkMedia group, Inria Rennes - Bretagne Atlantique, 35042 Rennes, France)
I-Hong Jhuo (Institute of Information Science, Academia Sinica, 11529 Taipei, Taiwan)
Miaojing Shi (Key Laboratory of Machine Perception, Peking University, 100871 Beijing, China)
Meng Sun (IIP Lab, PLA University of Science and Technology, 210007 Nanjing, China)
Wen-Huang Cheng (MCLab, CITI, Academia Sinica, 11529 Taipei, Taiwan)
Laurent Amsaleg (IRISA Lab, CNRS Rennes, 35042 Rennes, France)
ABSTRACT

LSH is a popular framework to generate compact representations of multimedia data, which can be used for content-based search. However, the performance of LSH is limited by its unsupervised nature and the underlying feature scale. In this work, we propose to improve LSH by incorporating two elements: supervised hash bit selection and multi-scale feature representation. First, a feature vector is represented by multiple scales. At each scale, the feature vector is divided into segments. The size of a segment is decreased gradually to make the representation correspond to a coarse-to-fine view of the feature. Then each segment is hashed to generate more bits than the target hash length. Finally, the best ones are selected from the hash bit pool according to the notion of bit reliability, which is estimated by bit-level hypothesis testing.

Extensive experiments have been performed to validate the proposal in two applications: near-duplicate image detection and approximate feature distance estimation. We first demonstrate that the feature scale can influence performance, which is often a neglected factor. Then we show that the proposed supervision method is effective. In particular, the performance increases with the size of the hash bit pool. Finally, the two elements are put together. The integrated scheme exhibits further improved performance.
L. Weng was supported by the French project Secular under grant ANR-12-CORD-0014.

I-H. Jhuo is a co-first author. He and W.-H. Cheng were supported by the Ministry of Science and Technology of Taiwan under grant MOST-103-2911-I-001-531.

M. Sun was supported by the National Natural Science Foundation of China under grant 61402519 and the Natural Science Foundation of Jiangsu Province under grant BK20140071.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ICMR'15, June 23-26, 2015, Shanghai, China.
Copyright (c) 2015 ACM 978-1-4503-3274-3/15/06 ...$15.00.
http://dx.doi.org/10.1145/2671188.2749291.
Categories and Subject Descriptors

H.3.3 [Information Systems]: Information Search and Retrieval; I.4.7 [Computing Methodologies]: Image Processing and Computer Vision - feature representation

General Terms

Algorithms, Design

Keywords

perceptual image hash, locality sensitive hashing, robust representation, multiple scale, supervised feature selection
1. INTRODUCTION

Hash algorithms for multimedia data have recently received much attention, because the compactness of hash values is the key to indexing and search in large-scale database systems. A hash value is typically a short binary string, whose length varies from tens to thousands of bits. It is a compact digest of the input data to a hash algorithm. In order to support content-based similarity search, multimedia hash algorithms have emerged in recent years. They are typically designed to be robust, i.e., the hash value is independent of the binary representation of a multimedia object. On the other hand, they are also discriminative, i.e., different content should have different hash values.

In general, hashing techniques for multimedia data can be divided into two categories: perceptual hashing and semantic hashing. They cover three applications: content classification, content identification, and content authentication. Existing algorithms generally differ in two aspects: 1) whether particular features are required; 2) whether training is required. Perceptual hashing [9] mainly deals with the latter two applications. Corresponding algorithms are typically feature-dependent and do not require training. Semantic hashing [15], on the other hand, mainly addresses content classification. Corresponding algorithms are typically feature-independent and require training.
In this work, we focus on a class of feature-independent hash algorithms, called locality-sensitive hashing (LSH) [1]. LSH is a generic framework originally used for approximate nearest neighbor (ANN) search. An LSH scheme is a distribution on a family F of hash functions operating on a collection of objects, such that for two objects x, y,

    $\Pr_{h \in F}[h(x) = h(y)] = \mathrm{sim}(x, y)$,    (1)

where $\mathrm{sim}(x, y) \in [0, 1]$ is some similarity function defined on the collection of objects, and Pr means probability. A popular implementation of LSH is based on scalar quantization [17]:

    $h_{r,b}(v) = \left\lfloor \frac{r \cdot v + b}{w} \right\rfloor$,    (2)

where $\lfloor \cdot \rfloor$ is the floor operation, v is a feature vector, r is a random Gaussian vector, w is a quantization step, and b is a random variable uniformly distributed between 0 and w. In this work, our implementation of LSH is based on Charikar's work [3]:

    $h_r(v) = \begin{cases} 1 & \text{if } v \cdot r \geq 0 \\ 0 & \text{if } v \cdot r < 0 \end{cases}$    (3)

This implementation actually measures the angular similarity between two feature vectors:

    $\Pr[h_r(u) = h_r(v)] = 1 - \frac{\theta(u, v)}{\pi}$,    (4)

where $\theta(u, v) = \cos^{-1} \frac{u \cdot v}{\|u\| \cdot \|v\|}$ is the angle between u and v. This representation is the foundation of random-projection-based hash algorithms. In order to approximately quantize a feature vector, hyperplanes are randomly generated. The encoding depends on the relationship between the hyperplanes and the feature vector. The essential difference between LSH and later approaches lies in the way that hyperplanes are generated. Instead of using random hyperplanes, supervised algorithms try to search for hyperplanes that are more suitable for the problem at hand.
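To make Eq. (3) and Eq. (4) concrete, here is a minimal sketch (Python, not from the paper) of sign-random-projection hashing together with an empirical check of the collision probability; the variable names and the perturbation used to create the second vector are illustrative assumptions.

```python
# Sign-random-projection LSH (Eq. 3) and an empirical check of Eq. 4.
import numpy as np

def srp_hash(v, R):
    """Hash feature vector v with random hyperplanes R (one hyperplane per row)."""
    return (R @ v >= 0).astype(np.uint8)          # each bit is the sign of one projection

rng = np.random.default_rng(0)
d, n_bits = 128, 4096                              # many bits give a tight estimate
R = rng.standard_normal((n_bits, d))               # random Gaussian hyperplanes

u = rng.standard_normal(d)
v = u + 0.3 * rng.standard_normal(d)               # a perturbed copy of u

agreement = np.mean(srp_hash(u, R) == srp_hash(v, R))
theta = np.arccos(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
print(agreement, 1 - theta / np.pi)                # the two values should be close (Eq. 4)
```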
1.1 Contribution

In this paper, we propose an extension of LSH, which we call Supervised Multi-scale LSH (SMLSH). Two approaches are explored: supervised hash bit selection and multi-scale feature representation. Specifically, a feature vector is first represented by multiple scales; then each scale is hashed to generate more bits than the target hash length; finally, the best hash bits are selected from the candidate bit pool. This extension can effectively improve the performance of LSH in various applications with the following desirable properties:

- Compatibility with existing LSH schemes;
- Asymptotically guaranteed effectiveness;
- Scalability to large hash lengths.

The main advantage of the proposal is its versatility and thus the potential to be applied to other feature-independent hash algorithms. As an extension framework, we do not significantly modify an existing LSH scheme, so that conventional systems can be easily upgraded.

The scalability in hash lengths is very important for large-scale systems. According to the birthday paradox [18], one may find a pair of multimedia objects with the same n-bit hash value (a collision) among $2^{n/2}$ pairs. In practice, the collision rate can be higher for multimedia hashing due to the robustness requirement. Short hash lengths such as 32 or 64 bits are more likely to cause false positives. In order to manage millions or billions of multimedia objects, a sufficient hash length is critical in a system design. A large hash length is also desirable for ANN applications where the conventional recall@R setting is used.

Existing supervised hash algorithms typically use various optimization techniques to compute hash bits. Due to the curse of dimensionality, this approach is intrinsically difficult when the hash length exceeds a certain level. SMLSH takes a different approach. It inherits the virtues of both supervised and randomized algorithms. As a randomized algorithm, SMLSH can easily extend to arbitrary hash lengths. As a weakly supervised approach, SMLSH does not greedily search for the best hyperplanes, in order to be efficient and avoid over-fitting. Consequently, it improves performance with affordable complexity.

Multi-scale feature representation is, to the best of our knowledge, an unexplored approach in multimedia hashing. Existing algorithms typically assume a certain feature scale, which potentially limits performance. SMLSH removes this limit by considering multiple scales simultaneously.
1.2 Related work

Perceptual hashing started in the late 1990s. Typical work includes Schneider and Chang's framework [16] and Fridrich's algorithm [5]. The latter is essentially a block-based LSH variant. Afterwards, various algorithms based on different features were proposed, such as RASH [11] based on the Radon transform, Philips' audio hashing algorithm [7] based on the Mel-frequency cepstrum, and the robust and secure hash based on the Fourier-Mellin transform [20]. Other features include higher-order statistics [25, 27], shapes [26], DFT phases [28], DCT or DWT signs [23, 24, 22], etc.

Feature-independent hashing, or semantic hashing, started from LSH. Typical work includes Charikar's LSH [3] for cosine similarity and Datar et al.'s LSH [4] for the $L_p$ distance. Later, various approaches were proposed to adapt the algorithm to the data and to accommodate more semantics and modalities. For example, unsupervised training is used in spectral hashing [21], which is based on spectral clustering. The kernel trick is used in kernelized LSH [10]. Recently, supervised training has been more widely used to overcome the semantic gap, e.g., [15, 6, 12].
2. SUPERVISED MULTI-SCALE LSH

Our goal is to improve LSH. Without loss of generality, the problem is defined as follows:

- Given an LSH algorithm with n-bit output, build a new algorithm with the same output length but improved performance.

In order to be versatile, we do not modify the internal realization of LSH. Since LSH can support arbitrary hash lengths, our solution to the above problem is the following:

- Given an LSH algorithm, generate an $n_o$-bit output ($n_o \geq n$), then form a hash value with improved performance by selecting n bits out of the $n_o$ bits.

The question is then how to select the bits. The key idea of SMLSH is that the choice of projections and features should both adapt to the problem. Thus SMLSH consists of two parts: multi-scale feature representation and hypothesis-testing-driven bit selection. A schematic diagram is shown in Fig. 1.

[Figure 1: Schematic diagram of SMLSH. A feature vector of d dimensions is represented at x scales (segments of d, d/2, d/4, ... dimensions); each scale is hashed by LSH into a hash bit pool of x·n' bits, from which supervised bit selection produces the n-bit output.]

The basic work-flow is the following:
1. The input feature vector is represented by x scales;
2. At each scale, the feature vector is fed into an LSH algorithm to obtain $n'$ ($n' \geq n$) bits;
3. The best n bits are selected from all scales among the $n_o = x \cdot n'$ candidates.

In the following, the hash bit selection strategy and the multi-scale feature representation are described in detail.
2.1 Hash bit selection

Intuitively, we need to select the "best" n bits from the $n_o$-bit output. We realize this according to the criterion of bit reliability, a metric that measures the quality of each bit. It can be obtained through a training procedure. Once the bit reliability information is obtained, bit selection is just a sorting procedure:

1. Estimate the reliability of all $n_o$ bits;
2. Sort the reliability of all $n_o$ bits;
3. Output the most reliable n bits.

The above description gives an overview of the proposed scheme. Next, we define the bit reliability.
We consider an n-bit hash value as n binary classifiers, each represented by a single bit. The bit reliability can be evaluated by hypothesis testing. Denote the difference between two hash values at position i as $d_i \in \{0, 1\}$ ($i = 0, \cdots, n-1$). A decision is made from two hypotheses:

- $H_0$: the images correspond to irrelevant content;
- $H_1$: the images correspond to relevant content.

If $d_i = 0$, we choose $H_1$; otherwise we choose $H_0$.

The reliability of a hash bit can be characterized by the false positive rate $p_{fp}$ and the false negative rate $p_{fn}$:

- $p_{fp} = \Pr\{d_i = 0 \mid H_0\}$;
- $p_{fn} = \Pr\{d_i = 1 \mid H_1\}$.
Overall, we define the bit reliability as

    $r_b = C_{fp} \cdot p_{fp} + C_{fn} \cdot p_{fn}$,    (5)

where $C_{fp}$ and $C_{fn}$ are weight factors representing the costs of different mistakes. A smaller $r_b$ corresponds to better reliability. This formulation is not biased by class skewness. It has some similarity to the objective function in LDAHash [19]. In the rest of the paper, we assume the weights are equal to 1/2.

If we obtain some ground-truth labels for training, the bit reliability can be estimated. Thus we can improve an existing LSH scheme without modifying its internal realization.
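As an illustration of this training step, the following sketch (Python, not from the paper; the array and variable names are assumptions) estimates $p_{fp}$ and $p_{fn}$ per bit from labelled training pairs, combines them into $r_b$ with equal weights, and keeps the n most reliable bits.

```python
# Hedged sketch of bit-reliability estimation (Eq. 5) and bit selection.
import numpy as np

def select_bits(hashes_a, hashes_b, labels, n, c_fp=0.5, c_fn=0.5):
    """hashes_a, hashes_b: (num_pairs, n_o) binary arrays of candidate bits;
    labels: NumPy array with 1 for relevant pairs (H1) and 0 for irrelevant pairs (H0)."""
    diff = hashes_a != hashes_b                      # d_i for every pair and every bit
    p_fp = np.mean(~diff[labels == 0], axis=0)       # bit agrees although H0 holds
    p_fn = np.mean(diff[labels == 1], axis=0)        # bit differs although H1 holds
    r_b = c_fp * p_fp + c_fn * p_fn                  # smaller means more reliable
    return np.argsort(r_b)[:n]                       # indices of the n best bits

# Usage sketch: selected = select_bits(Ha, Hb, y, n=128); hash_out = pool[selected]
```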
2.2 Multi-scale feature representation

In practice, given a d-dimensional feature vector, an {l, k} LSH scheme generates l sub-hash values, each with k bits. The two parameters l and k are important: the former typically corresponds to the number of hash tables; the latter is the size of a sub-hash value. The overall hash value consists of l × k bits. An interesting property of LSH is that it supports arbitrary hash lengths by varying l and k.

An often neglected factor in feature-independent hash algorithms is the scale of the feature vector. In order to hash (project) a feature vector, there are at least two ways: we could either compute l × k bits from the whole feature vector, or divide it into l parts and compute k hash bits from each part. Which approach is better?

This is similar to the question of whether we should use global or local features. In general, global features have good robustness but relatively weak discrimination, while local features show the opposite. For a given problem, one cannot decide in advance which scale is best. Therefore, we propose to test features of different scales and select the suitable ones.

Assume we consider x scales (Fig. 1). For each scale index $s_i = s_0 + i$ ($i = 0, 1, \cdots, x-1$), we evenly divide the feature vector into $l = 2^{s_i}$ parts and compute $k_o$ ($k_o \geq k$) hash bits from each part, so that $n' = l \cdot k_o$. The parameter $s_0$ (set to 0 by default) decides the starting scale. The parameter x is chosen such that the minimum feature segment length $d / 2^{s_0 + x - 1}$ is not too small. There are certainly other ways to construct feature vectors of different scales. We adopt our approach mainly because of its implementation simplicity.
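A minimal sketch of this segmentation-and-hashing step follows (Python, not from the paper). It assumes, for illustration, that the per-scale budget n' is divisible by the number of segments at every scale, and reuses the sign-random-projection hash of Eq. (3) inside each segment.

```python
# Illustrative construction of the hash bit pool over x scales.
import numpy as np

def multiscale_bit_pool(v, n_prime, x, s_0=0, seed=0):
    """Return the x * n_prime candidate bits for feature vector v."""
    rng = np.random.default_rng(seed)
    pool = []
    for s_i in range(s_0, s_0 + x):
        l = 2 ** s_i                                # number of segments at this scale
        k_o = n_prime // l                          # bits per segment, so n' = l * k_o
        for segment in np.array_split(v, l):        # roughly d / l dimensions per segment
            R = rng.standard_normal((k_o, segment.size))
            pool.append((R @ segment >= 0).astype(np.uint8))
    return np.concatenate(pool)

# pool = multiscale_bit_pool(feature, n_prime=512, x=3)  ->  1536 candidate bits
```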
3. PERFORMANCE AND COMPLEXITY

When the Hamming distance is used for hash comparison, two hash values are judged as relevant if their distance d is less than t. The performance of a hash algorithm can be characterized by the true positive rate $P_{tp}$ and the false positive rate $P_{fp}$:

- $P_{tp} = \Pr\{d < t \mid H_1\}$;
- $P_{fp} = \Pr\{d < t \mid H_0\}$.
Assuming the n bits are independent and have average performance $\{p_{tp}, p_{fp}\}$, the performance of the overall scheme can be formulated as:

    $P_{tp} = f(p_{tp})$,    (6)
    $P_{fp} = f(p_{fp})$,    (7)

where $p_{tp} = 1 - p_{fn}$ and

    $f(p) = \sum_{k=n-t}^{n} \binom{n}{k} \cdot p^k \cdot (1-p)^{n-k}$.    (8)
The above equation was used in Condorcet's jury theorem to show that a decision is more likely to be correct with more jurors. In our proposal, the bit selection procedure essentially increases $p_{tp}$ and decreases $p_{fp}$ by replacing the original n bits with better candidates, i.e., we improve the quality of the jurors. Given a pool of $n_o$ hyperplanes, the probability that our scheme fails is equal to the probability that the original n bits are the best among the $n_o$ choices, which is $1 / \binom{n_o}{n}$. This probability can be made arbitrarily small by increasing $n_o$. In practice, this property asymptotically guarantees that our scheme is always effective. For example, $\binom{256}{128}$ is larger than $10^{15}$.
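A small numeric sketch (Python, not from the paper; the per-bit rates and the threshold are made-up values) of how Eq. (8) rewards better per-bit quality, and of the failure probability $1/\binom{n_o}{n}$ discussed above:

```python
# Worked check of Eq. 8 and of the failure probability of random bit selection.
from math import comb

def f(p, n, t):
    """Probability that at least n - t out of n independent bits are correct (Eq. 8)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n - t, n + 1))

n, t = 64, 8
print(f(0.90, n, t), f(0.95, n, t))   # better per-bit rates give a better overall rate

# Probability that the original projections already form the best 128-bit subset of 256:
print(1 / comb(256, 128))             # comb(256, 128) is about 5.8e75, far above 1e15
```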
Assuming each coefficient of a hyperplane is represented with b-bit precision, for a feature vector segment of length d/l there are in total $2^{(d/l) \cdot b}$ possible hyperplanes. This implies that searching for a hyperplane in a high-dimensional space is computationally difficult. The training cost of greedy algorithms becomes prohibitively high for large hash lengths.

The computational cost of SMLSH consists of the training cost and the running cost. The training cost can be manually controlled. When there are N samples (e.g. images) available, there are at most $\binom{N}{2}$ hash comparisons. We can have enough ground truth even when the training set is small. For example, $\binom{1000}{2}$ is approximately half a million.

The running cost depends on the implementation. In the worst case, when all the candidate hyperplanes are generated online by a pseudo-random number generator and are all used for projection (although not all results are used), the cost is about $x \cdot k_o / k$ times the cost of the original LSH. In practice, the computation can be reduced by pre-computing the selected hyperplanes offline.
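The training-cost remark can be checked with one line (Python; purely illustrative arithmetic):

```python
from math import comb
print(comb(1000, 2))   # 499500 labelled pairs from only 1000 samples, roughly half a million
```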
4. EXPERIMENT

Since LSH is a general technique in content-based search, we evaluate SMLSH in two different applications:

- Case 1: Near-duplicate image detection;
- Case 2: Approximate feature distance estimation.

The former is related to content and copyright management; the latter is related to nearest neighbor search. The first application is a typical example with semantic gaps, i.e., relevant items do not necessarily result in small distances. The second application is an example of a more ideal situation.

In the first application, SMLSH is used for identifying near-duplicate images. A near-duplicate is defined as a quasi-copy of an original multimedia object, typically resulting from incidental noise. We use 100 images to generate the training set and another 100 images to generate the testing set. They are randomly selected from the validation set of ILSVRC'2012 (http://www.image-net.org/challenges/LSVRC/2012/). Each set consists of 10,600 images, including the 100 original ones and 10,500 near-duplicates. The near-duplicates are created by applying a series of distortions to each of the 100 images. The list of distortions (15 categories, 7 levels each) is shown in Table 1. The relationship between the original images and their near-duplicates is used as the ground truth. GIST [14] feature vectors are extracted from all these images.
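For concreteness, here is an illustrative way (Python with Pillow and NumPy; not the authors' MATLAB pipeline, and the parameter values are only examples within the ranges of Table 1) to produce a few of the near-duplicate distortions:

```python
# Illustrative generation of some Table 1 distortions for one RGB image.
import io
import numpy as np
from PIL import Image, ImageFilter

def some_near_duplicates(img):
    yield img.rotate(3)                                     # 1. rotation, 3 degrees
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=50)                # 8. JPEG compression
    yield Image.open(buf)
    yield img.filter(ImageFilter.MedianFilter(size=9))      # 9. median filter, window 9
    arr = np.asarray(img).astype(np.float32)
    noisy = arr + np.random.normal(0, 10, arr.shape)        # 12. additive Gaussian noise
    yield Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))
```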
In the second application, SMLSH is used for estimating the similarity between SIFT vectors [13]. A dataset of ten thousand SIFT vectors is used [8] (http://corpus-texmex.irisa.fr). Half of it is used for training and the other half for testing. The ground truth is set as follows: two SIFT vectors are considered relevant if their cosine similarity is larger than 0.8.
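A one-function sketch of that ground-truth rule (Python; names are illustrative):

```python
import numpy as np

def relevant(u, v, threshold=0.8):
    """Ground-truth label for Case 2: relevant when cosine similarity exceeds the threshold."""
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return cos > threshold
```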
Both datasets have been transformed by PCA in order to remove the correlation between feature dimensions. In particular, the GIST feature vectors are reduced to 256 dimensions. The SIFT vectors keep their 128 dimensions.
Table 1: Distortions for near-duplicate generation.

 #  Distortion name        Parameter range, step
 1. Rotation               Angle: 1-7 degrees, 1
 2. Central cropping       Percentage: 1%-7%, 1%
 3. Row removal            Percentage: 1%-7%, 1%
 4. Asymmetric cropping    Percentage: 1%-7%, 1%
 5. Circular shifting      Percentage: 1%-7%, 1%
 6. Down-scaling           Ratio: 0.7-0.1, 0.1
 7. Shearing               Percentage: 1%-7%, 1%
 8. JPEG compression       Quality factor: 70-10, 10
 9. Median filter          Window size: 7-19, 2
10. Gaussian filter        Window size: 7-19, 2
11. Sharpening             Strength: 0.7-0.1, 0.1 (a)
12. Gaussian noise         PSNR: 45-15 dB, 5 dB
13. Salt & pepper          Noise density: 0.01-0.07, 0.01 (b)
14. Gamma correction       Gamma: 0.5-1.7, 0.2
15. Block tampering        Block number: 1-7, 1 (c)

(a) Parameters for the MATLAB function fspecial('unsharp').
(b) Parameters for the MATLAB function imnoise().
(c) The size of a block is 1/64 of an image.
Table 2: Notations of SMLSH.

Notation   Definition
n          Hash length (bits)
x          Number of scales
s_i        Scale index (i = 0, ..., x-1)
k          Initial sub-hash size (for s_0)
l          Initial number of feature segments (for s_0)
4.1 Baselines and experiment setting

We mainly use the basic LSH algorithm defined in (3) as the baseline. Specifically, the first scale is used without supervision ($l = 1$, $k_o = k$). Another algorithm used for performance comparison is the recently proposed qoLSH [2] in symmetric mode. It is only used in Fig. 3b and Fig. 5c for Case 2 to generate 256-bit hash values, because it requires the hash length to be larger than the number of feature dimensions, while we have only tested hash lengths of 64, 128, and 256 so far. The experiments investigate the relationship between the performance and, typically, the following factors:

- The hash size (64, 128, 256);
- The size of the bit selection pool (200%, 400%);
- The number of available feature scales (1-3).

Hypothesis testing is used for evaluating SMLSH in both scenarios. The receiver operating characteristic (ROC) curves are used for representing the performance. The two cases involve $\binom{10600}{2} = 56,174,700$ and $\binom{5000}{2} = 12,497,500$ pairwise comparisons respectively. We do not use a retrieval setting because we focus on the hash performance only.

In the following, we first evaluate the effects of single scales and of supervision separately, then put them together. The notations used in this work are summarized in Table 2.
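As a sketch of this evaluation protocol (Python; not from the paper, the array names are assumptions), an ROC curve can be traced by sweeping the Hamming threshold t over all pairwise comparisons:

```python
# Hedged sketch: ROC points from pairwise Hamming distances and ground-truth labels.
import numpy as np

def roc_points(hamming_dists, labels, n_bits):
    """hamming_dists, labels: 1-D arrays over all pairs (label 1 = relevant)."""
    points = []
    for t in range(n_bits + 1):
        decide_relevant = hamming_dists < t               # decision rule: d < t
        tp = np.mean(decide_relevant[labels == 1])        # true positive rate
        fp = np.mean(decide_relevant[labels == 0])        # false positive rate
        points.append((fp, tp))
    return points
```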
4.2 Effect of single feature scales

Recall that different feature scales may have different impacts on the performance. We consider the parameter settings listed in Table 3. For the same hash length n, different {k, l} combinations are considered. In general, longer feature vectors are likely to enable more combinations.

The ROC curves are shown in Fig. 2 for the two scenarios. Results indicate that the feature scale indeed matters. In Case 1, when the false positive rate is small, the scale can make a big difference. In Case 2, the scale effect is not as

References (partial list)

[4] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the Symposium on Computational Geometry, 2004.
[13] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.
[14] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3):145-175, 2001.
[21] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In Advances in Neural Information Processing Systems, 2008.