Aggregating local descriptors into a compact image representation
Herve Jegou, Matthijs Douze, Cordelia Schmid, Patrick Pérez
- pp 3304-3311
TL;DR: This work proposes a simple yet efficient way of aggregating local image descriptors into a vector of limited dimension, which can be viewed as a simplification of the Fisher kernel representation, and shows how to jointly optimize the dimension reduction and the indexing algorithm.
Abstract:
We address the problem of image search on a very large scale, where three constraints have to be considered jointly: the accuracy of the search, its efficiency, and the memory usage of the representation. We first propose a simple yet efficient way of aggregating local image descriptors into a vector of limited dimension, which can be viewed as a simplification of the Fisher kernel representation. We then show how to jointly optimize the dimension reduction and the indexing algorithm, so that it best preserves the quality of vector comparison. The evaluation shows that our approach significantly outperforms the state of the art: the search accuracy is comparable to the bag-of-features approach for an image representation that fits in 20 bytes. Searching a 10 million image dataset takes about 50 ms.
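The abstract does not spell out the aggregation step, so the following is only a minimal sketch of one way such an aggregation can work. It assumes the scheme commonly known as VLAD: each local descriptor is assigned to its nearest centroid of a small precomputed codebook, the residuals are summed per centroid, and the concatenated result is L2-normalized. The function name `vlad`, the toy codebook, and the toy descriptors are hypothetical. The dimension reduction and indexing stages mentioned in the abstract are not shown.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalized vector of length k * d.

    Assumption: residual accumulation to the nearest centroid, then global
    L2 normalization. This is an illustrative sketch, not the authors' code.
    """
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Index of the nearest centroid by squared Euclidean distance.
        i = min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])),
        )
        # Accumulate the residual between the descriptor and its centroid.
        for t in range(d):
            acc[i][t] += x[t] - centroids[i][t]
    # Concatenate the per-centroid sums and L2-normalize the result.
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy data (hypothetical): a 2-word codebook in 2 dimensions.
toy_centroids = [[0.0, 0.0], [10.0, 10.0]]
toy_descriptors = [[1.0, 0.0], [0.0, 2.0], [9.0, 10.0]]
toy_vector = vlad(toy_descriptors, toy_centroids)
```

The output length is the codebook size times the descriptor dimension, independent of how many local descriptors the image contains, which is what makes the vector a candidate for later compression. For scale, at the 20 bytes per image quoted in the abstract, 10 million images would occupy 200 MB.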
Citations
Proceedings Article
Going deeper with convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
TL;DR: Inception is a deep convolutional neural network architecture that achieves a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Posted Content
Group Normalization
Yuxin Wu, Kaiming He
TL;DR: Group Normalization can outperform its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks.
Proceedings Article
NetVLAD: CNN Architecture for Weakly Supervised Place Recognition
TL;DR: A convolutional neural network architecture that is trainable in an end-to-end manner directly for the place recognition task and an efficient training procedure which can be applied on very large-scale weakly labelled tasks are developed.
Journal Article
Iterative Quantization: A Procrustean Approach to Learning Binary Codes for Large-Scale Image Retrieval
TL;DR: This paper addresses the problem of learning similarity-preserving binary codes for efficient similarity search in large-scale image collections by proposing a simple and efficient alternating minimization algorithm, dubbed iterative quantization (ITQ), and demonstrating an application of ITQ to learning binary attributes or "classemes" on the ImageNet data set.
Journal Article
Aggregating Local Image Descriptors into Compact Codes
TL;DR: This paper first presents and evaluates different ways of aggregating local image descriptors into a vector and shows that the Fisher kernel achieves better performance than the reference bag-of-visual words approach for any given vector dimension.
References
Journal Article
Distinctive Image Features from Scale-Invariant Keypoints
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Book
Pattern Recognition and Machine Learning
TL;DR: This book covers probability distributions, linear models for regression, linear models for classification, neural networks, graphical models, mixture models and EM, sampling methods, continuous latent variables, and sequential data.
Journal Article
Pattern Recognition and Machine Learning
TL;DR: This book covers a broad range of topics for regular factorial designs and presents all of the material in a very mathematical fashion; it will surely become an invaluable resource for researchers and graduate students doing research in the design of factorial experiments.
Journal Article
A performance evaluation of local descriptors
TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best. Moments and steerable filters show the best performance among the low-dimensional descriptors.
Proceedings Article
Video Google: a text retrieval approach to object matching in videos
TL;DR: An approach to object and scene retrieval that searches for and localizes all occurrences of a user-outlined object in a video. The object is represented by a set of viewpoint-invariant region descriptors, so that recognition can proceed successfully despite changes in viewpoint, illumination, and partial occlusion.