Journal ArticleDOI

Iterative Quantization: A Procrustean Approach to Learning Binary Codes for Large-Scale Image Retrieval

TL;DR: This paper addresses the problem of learning similarity-preserving binary codes for efficient similarity search in large-scale image collections by proposing a simple and efficient alternating minimization algorithm, dubbed iterative quantization (ITQ), and demonstrating an application of ITQ to learning binary attributes or "classemes" on the ImageNet data set.
Abstract: This paper addresses the problem of learning similarity-preserving binary codes for efficient similarity search in large-scale image collections. We formulate this problem in terms of finding a rotation of zero-centered data so as to minimize the quantization error of mapping this data to the vertices of a zero-centered binary hypercube, and propose a simple and efficient alternating minimization algorithm to accomplish this task. This algorithm, dubbed iterative quantization (ITQ), has connections to multiclass spectral clustering and to the orthogonal Procrustes problem, and it can be used both with unsupervised data embeddings such as PCA and supervised embeddings such as canonical correlation analysis (CCA). The resulting binary codes significantly outperform several other state-of-the-art methods. We also show that further performance improvements can result from transforming the data with a nonlinear kernel mapping prior to PCA or CCA. Finally, we demonstrate an application of ITQ to learning binary attributes or "classemes" on the ImageNet data set.
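
The alternating minimization is compact enough to sketch. Below is a minimal NumPy sketch of the procedure the abstract describes (the function name, defaults, and the SVD-based PCA step are illustrative, not the authors' reference code): with the rotation R fixed, the optimal codes are B = sign(VR); with B fixed, updating R is an orthogonal Procrustes problem solved in closed form by an SVD.

```python
import numpy as np

def itq(X, n_bits=32, n_iters=50, seed=0):
    """Minimal ITQ sketch: PCA embedding + alternating rotation/binarization.

    X must have at least n_bits feature dimensions. Returns codes in
    {-1, +1} and the learned orthogonal rotation R.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                        # zero-center the data
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = X @ Vt[:n_bits].T                         # PCA projection, (n, n_bits)
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))  # random init
    for _ in range(n_iters):
        B = np.sign(V @ R)                        # fix R: quantize to hypercube vertices
        B[B == 0] = 1
        U, _, Wt = np.linalg.svd(V.T @ B)         # fix B: orthogonal Procrustes update
        R = U @ Wt
    B = np.sign(V @ R)
    B[B == 0] = 1
    return B, R
```
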
Citations
Posted Content
TL;DR: This paper achieves 16-24 times compression of a state-of-the-art CNN with only 1% loss of classification accuracy, and finds that, for compressing the most storage-demanding densely connected layers, vector quantization methods have a clear gain over existing matrix factorization methods.
Abstract: Deep convolutional neural networks (CNNs) have become the most promising method for object recognition, repeatedly demonstrating record-breaking results for image classification and object detection in recent years. However, a very deep CNN generally involves many layers with millions of parameters, making the storage of the network model extremely large. This prohibits the usage of deep CNNs on resource-limited hardware, especially cell phones or other embedded devices. In this paper, we tackle this model storage issue by investigating information-theoretical vector quantization methods for compressing the parameters of CNNs. In particular, we have found that, in terms of compressing the most storage-demanding densely connected layers, vector quantization methods have a clear gain over existing matrix factorization methods. Simply applying k-means clustering to the weights or conducting product quantization can lead to a very good balance between model size and recognition accuracy. For the 1000-category classification task in the ImageNet challenge, we are able to achieve 16-24 times compression of the network with only 1% loss of classification accuracy using the state-of-the-art CNN.
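
The simplest variant mentioned above, k-means weight sharing, is easy to illustrate. A minimal sketch assuming scikit-learn (the helper name is hypothetical, and product quantization, which the paper finds even more effective, is not shown): each weight is replaced by one of a small number of shared centroids, so a dense layer is stored as a codebook plus one small index per weight.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_quantize(W, n_clusters=256, seed=0):
    """Sketch: scalar k-means quantization of a weight matrix W.

    With n_clusters=256, each 32-bit weight collapses to an 8-bit
    codebook index, roughly a 4x compression before any further coding.
    """
    km = KMeans(n_clusters=n_clusters, n_init=4, random_state=seed)
    labels = km.fit_predict(W.reshape(-1, 1))     # cluster the scalar weights
    codebook = km.cluster_centers_.ravel()
    W_hat = codebook[labels].reshape(W.shape)     # dequantized layer
    return W_hat, codebook, labels.astype(np.uint8)
```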

1,139 citations


Cites background or methods from "Iterative Quantization: A Procruste..."

  • ...Many learning-based binarization or product quantization methods are available, such as Spectral Hashing (Weiss et al., 2008), Iterative Quantization (Gong et al., 2012), and Cartesian k-means (Norouzi & Fleet, 2013), among others....

    [...]


Proceedings ArticleDOI
01 Jul 2017
TL;DR: In this article, the authors propose a novel deep learning framework that can exploit labeled source data and unlabeled target data to learn informative hash codes, in order to accurately classify unseen target data.
Abstract: In recent years, deep neural networks have emerged as a dominant machine learning tool for a wide variety of application domains. However, training a deep neural network requires a large amount of labeled data, which is an expensive process in terms of time, labor and human expertise. Domain adaptation or transfer learning algorithms address this challenge by leveraging labeled data in a different, but related source domain, to develop a model for the target domain. Further, the explosive growth of digital data has posed a fundamental challenge concerning its storage and retrieval. Owing to its storage and retrieval efficiency, hashing has in recent years seen wide application in a variety of computer vision tasks. In this paper, we first introduce a new dataset, Office-Home, to evaluate domain adaptation algorithms. The dataset contains images of a variety of everyday objects from multiple domains. We then propose a novel deep learning framework that can exploit labeled source data and unlabeled target data to learn informative hash codes, to accurately classify unseen target data. To the best of our knowledge, this is the first research effort to exploit the feature learning capabilities of deep neural networks to learn representative hash codes to address the domain adaptation problem. Our extensive empirical studies on multiple transfer tasks corroborate the usefulness of the framework in learning efficient hash codes which outperform existing competitive baselines for unsupervised domain adaptation.

984 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: This work proposes a new supervised hashing framework, where the learning objective is to generate the optimal binary hash codes for linear classification, and introduces an auxiliary variable to reformulate the objective such that it can be solved efficiently by employing a regularization algorithm.
Abstract: Recently, learning-based hashing techniques have attracted broad research interest because they can support efficient storage and retrieval for high-dimensional data such as images, videos, documents, etc. However, a major difficulty of learning to hash lies in handling the discrete constraints imposed on the pursued hash codes, which typically makes hash optimization very challenging (NP-hard in general). In this work, we propose a new supervised hashing framework, where the learning objective is to generate the optimal binary hash codes for linear classification. By introducing an auxiliary variable, we reformulate the objective such that it can be solved efficiently by employing a regularization algorithm. One of the key steps in this algorithm is to solve a regularization sub-problem associated with the NP-hard binary optimization. We show that the sub-problem admits an analytical solution via cyclic coordinate descent. As such, a high-quality discrete solution can eventually be obtained efficiently, enabling the method to tackle massive datasets. We evaluate the proposed approach, dubbed Supervised Discrete Hashing (SDH), on four large image datasets and demonstrate its superiority to state-of-the-art hashing methods in large-scale image retrieval.
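
The cyclic coordinate descent step has a closed form worth sketching. Assuming a plain squared-loss formulation min_B ||Y - BW||^2 over B in {-1, +1}^(n x L) (the full SDH objective also ties B to a kernel-based hash function, omitted here; all names are illustrative), each bit column can be updated analytically with the others held fixed:

```python
import numpy as np

def dcc_update_codes(B, Y, W):
    """Sketch of cyclic coordinate descent over bits, in the spirit of SDH:
    minimize ||Y - B @ W||_F^2 over B in {-1, +1}, one bit column at a time.

    Y: (n, C) label matrix, B: (n, L) codes, W: (L, C) classifier weights.
    """
    n, L = B.shape
    for l in range(L):
        v = W[l]                                  # l-th row of W, shape (C,)
        rest = np.delete(np.arange(L), l)
        residual = Y - B[:, rest] @ W[rest]       # Y - B'W' with bit l removed
        b = np.sign(residual @ v)                 # closed-form bit update
        b[b == 0] = 1
        B[:, l] = b
    return B
```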

923 citations


Cites background or methods from "Iterative Quantization: A Procruste..."

  • ...The proposed method is compared against several state-of-the-art supervised hashing methods including BRE [13], MLH [24], SSH [34], CCA-ITQ [10], KSH [20], FastHash [17] and unsupervised methods including PCA-ITQ [10], AGH [21] and IMH [29] with t-SNE [33]....

    [...]

  • ...Iterative Quantization (ITQ) [10] is an effective approach to decrease the quantization error by applying an orthogonal rotation to projected training data....

    [...]

  • ...The representative algorithms in this category include unsupervised PCA Hashing [34], Iterative Quantization (ITQ) [10], Isotropic Hashing [12], etc....

    [...]

  • ...As an extension of linear hash functions, a variety of algorithms have been proposed to generate nonlinear hash functions in a kernel space, including Binary Reconstructive Embedding (BRE) [13], Random Maximum Margin Hashing (RMMH) [11], Kernel-Based Supervised Hashing (KSH) [20], the kernel variant of ITQ [10], etc....

    [...]

  • ...Hashing has attracted considerable attention of researchers in computer vision, machine learning, information retrieval and related areas [8,10,16,20,21,31,34,38,40]....

    [...]

Proceedings ArticleDOI
07 Dec 2015
TL;DR: This paper uses Convolutional Neural Networks to learn discriminant patch representations and in particular train a Siamese network with pairs of (non-)corresponding patches to develop 128-D descriptors whose euclidean distances reflect patch similarity and can be used as a drop-in replacement for any task involving SIFT.
Abstract: Deep learning has revolutionized image-level tasks such as classification, but patch-level tasks, such as correspondence, still rely on hand-crafted features, e.g. SIFT. In this paper we use Convolutional Neural Networks (CNNs) to learn discriminant patch representations and in particular train a Siamese network with pairs of (non-)corresponding patches. We deal with the large number of potential pairs with the combination of a stochastic sampling of the training set and an aggressive mining strategy biased towards patches that are hard to classify. By using the L2 distance during both training and testing we develop 128-D descriptors whose Euclidean distances reflect patch similarity, and which can be used as a drop-in replacement for any task involving SIFT. We demonstrate consistent performance gains over the state of the art, and generalize well against scaling and rotation, perspective transformation, non-rigid deformation, and illumination changes. Our descriptors are efficient to compute and amenable to modern GPUs, and are publicly available.
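
As a sketch of the kind of pairwise objective such a Siamese network is trained with, here is a standard contrastive loss over L2 distances in PyTorch (the paper's exact loss and its aggressive hard-negative mining are not reproduced here):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(desc_a, desc_b, is_match, margin=1.0):
    """Sketch of a contrastive loss for Siamese patch descriptors.

    desc_a, desc_b : (batch, 128) descriptors from the shared network.
    is_match       : (batch,) float tensor, 1 for corresponding patches.
    Matching pairs are pulled together in L2 distance; non-matching
    pairs are pushed apart up to the margin.
    """
    d = F.pairwise_distance(desc_a, desc_b)          # Euclidean distance
    loss_pos = is_match * d.pow(2)
    loss_neg = (1 - is_match) * F.relu(margin - d).pow(2)
    return (loss_pos + loss_neg).mean()
```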

848 citations


Cites methods from "Iterative Quantization: A Procruste..."

  • ...This line of work includes unsupervised techniques based on hashing as well as supervised approaches using Linear Discriminant Analysis [3, 9, 24], boosting [29], and convex optimization [23]....

    [...]

Journal ArticleDOI
TL;DR: In this paper, a comprehensive survey of learning to hash algorithms is presented, categorizing them by how they preserve similarities into pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, and quantization, and discussing their relations.
Abstract: Nearest neighbor search is the problem of finding the data points in a database whose distances to a query point are the smallest. Learning to hash is one of the major solutions to this problem and has been widely studied recently. In this paper, we present a comprehensive survey of learning to hash algorithms, categorize them by how they preserve similarities into pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, and quantization, and discuss their relations. We separate quantization from pairwise similarity preserving because its objective function is very different, even though quantization, as we show, can be derived from preserving pairwise similarities. In addition, we present evaluation protocols and a general performance analysis, and point out that the quantization algorithms perform superiorly in terms of search accuracy, search time cost, and space cost. Finally, we introduce a few emerging topics.
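
The practical appeal of binary codes in this search problem is that Hamming ranking reduces to XOR plus popcount. A minimal NumPy sketch over byte-packed codes (helper name illustrative):

```python
import numpy as np

def hamming_rank(query_code, db_codes, k=10):
    """Sketch of Hamming ranking over packed binary codes.

    query_code : (n_bytes,) uint8 array for one query.
    db_codes   : (n_items, n_bytes) uint8 array for the database.
    Returns the indices of the k nearest items by Hamming distance.
    """
    xor = np.bitwise_xor(db_codes, query_code)         # differing bits
    dist = np.unpackbits(xor, axis=1).sum(axis=1)      # popcount per item
    return np.argsort(dist)[:k]
```

Codes in {-1, +1}, such as those produced by ITQ, can be packed for this with np.packbits((B > 0).astype(np.uint8), axis=1).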

838 citations

References
Proceedings ArticleDOI
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei
20 Jun 2009
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Abstract: The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.

49,639 citations


"Iterative Quantization: A Procruste..." refers methods in this paper

  • ...Section 4 describes the supervised version of our method based on CCA....

    [...]

Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
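
As a concrete illustration of the matching step described above, here is a minimal OpenCV sketch using a 2-NN search with Lowe's ratio test (the Hough-transform clustering and least-squares pose verification stages are omitted):

```python
import cv2

def sift_match(img1, img2, ratio=0.75):
    """Sketch: SIFT keypoint matching with Lowe's ratio test.

    Detects keypoints, matches descriptors with a 2-NN brute-force
    search, and keeps matches whose best distance is clearly smaller
    than the second best.
    """
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < ratio * n.distance]
    return kp1, kp2, good
```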

46,906 citations


"Iterative Quantization: A Procruste..." refers methods in this paper

  • ...SIFT [26] and color descriptors [35] are extracted from 24×24 patches every 6 pixels at 5 scales (by resizing the image by a factor of √2)....

    [...]

Dissertation
01 Jan 2009
TL;DR: In this dissertation, the author describes how to train a multi-layer generative model of natural images using a dataset of millions of tiny colour images.
Abstract: In this work we describe how to train a multi-layer generative model of natural images. We use a dataset of millions of tiny colour images, described in the next section. This has been attempted by several groups but without success. The models on which we focus are RBMs (Restricted Boltzmann Machines) and DBNs (Deep Belief Networks). These models learn interesting-looking filters, which we show are more useful to a classifier than the raw pixels. We train the classifier on a labeled subset that we have collected and call the CIFAR-10 dataset.

15,005 citations

Journal Article
TL;DR: LIBLINEAR is an open source library for large-scale linear classification that supports logistic regression and linear support vector machines and provides easy-to-use command-line tools and library calls for users and developers.
Abstract: LIBLINEAR is an open source library for large-scale linear classification. It supports logistic regression and linear support vector machines. We provide easy-to-use command-line tools and library calls for users and developers. Comprehensive documents are available for both beginners and advanced users. Experiments demonstrate that LIBLINEAR is very efficient on large sparse data sets.
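
For readers reproducing linear-SVM training like the classeme setup quoted below, scikit-learn's LinearSVC is backed by LIBLINEAR, so a short sketch suffices (synthetic data; all parameters illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic stand-in for real image features and class labels.
X, y = make_classification(n_samples=1000, n_features=50, n_classes=3,
                           n_informative=10, random_state=0)
clf = LinearSVC(C=1.0)    # L2-regularized linear SVM, LIBLINEAR backend
clf.fit(X, y)
print(clf.score(X, y))    # training accuracy
```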

7,848 citations


"Iterative Quantization: A Procruste..." refers methods in this paper

  • ...To learn the classemes, we randomly pick 950 classes from ILSVRC2010 and train LIBLINEAR SVM classifiers [9] on them....

    [...]

Journal ArticleDOI
TL;DR: The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
Abstract: In this paper, we propose a computational model of the recognition of real world scenes that bypasses the segmentation and the processing of individual objects or regions. The procedure is based on a very low dimensional representation of the scene, that we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. Then, we show that these dimensions may be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected close together. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
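
A rough, GIST-style sketch of the idea (this is not Oliva and Torralba's reference implementation; the frequencies, grid size, and pooling below are illustrative): filter the image with a bank of Gabor filters over several orientations and scales, then average the filter energy over a coarse spatial grid to get a holistic descriptor.

```python
import numpy as np
from skimage.filters import gabor

def gist_like(image, n_orientations=8, n_scales=4, grid=4):
    """Rough GIST-style holistic descriptor sketch.

    image : 2D grayscale float array. Pools Gabor filter energy over a
    grid x grid spatial layout for several orientations and scales.
    """
    feats = []
    for s in range(n_scales):
        frequency = 0.25 / (2 ** s)              # coarser with each scale
        for o in range(n_orientations):
            theta = np.pi * o / n_orientations
            real, imag = gabor(image, frequency=frequency, theta=theta)
            energy = np.hypot(real, imag)        # per-pixel filter energy
            # pool over a coarse grid (the holistic "spatial envelope")
            for row in np.array_split(energy, grid, axis=0):
                for cell in np.array_split(row, grid, axis=1):
                    feats.append(cell.mean())
    return np.asarray(feats)   # 8 * 4 * 16 = 512 dims with these defaults
```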

6,882 citations


"Iterative Quantization: A Procruste..." refers methods in this paper

  • ...We represent them with grayscale GIST descriptors [11] computed at 8 orientations and 4 different scales, resulting in 320-dimensional feature vectors....

    [...]