Open Access | Posted Content

Deep image retrieval: a survey

TLDR
In this article, the authors organize and review recent content-based image retrieval (CBIR) work built on deep learning algorithms and techniques, drawing on insights from recent papers.
Abstract
In recent years, a vast amount of visual content has been generated and shared in various fields, such as social media platforms, medical imaging, and robotics. This abundance of content creation and sharing has introduced new challenges. In particular, searching databases for similar content, i.e., content-based image retrieval (CBIR), is a long-established research area, and more efficient and accurate methods are needed for real-time retrieval. Artificial intelligence has made progress in CBIR and has significantly facilitated the process of intelligent search. In this survey, we organize and review recent CBIR works that are developed based on deep learning algorithms and techniques, including insights and techniques from recent papers. We identify and present the benchmarks and evaluation methods commonly used in the field. We collect common challenges and propose promising future directions. More specifically, we focus on image retrieval with deep learning and organize the state-of-the-art methods according to the types of deep network structure, deep features, feature enhancement methods, and network fine-tuning strategies. Our survey considers a wide variety of recent methods, aiming to promote a global view of the field of instance-based CBIR.

Citations
Posted Content

DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features

TL;DR: This paper proposes a Deep Orthogonal Local and Global (DOLG) information fusion framework for end-to-end image retrieval. It first attentively extracts representative local information with multi-atrous convolutions and self-attention; the orthogonal components are then concatenated with the global image representation as a complement, and aggregation is performed to generate the final representation.
Posted Content

Telling the What while Pointing to the Where: Multimodal Queries for Image Retrieval

TL;DR: In this article, the authors propose an image retrieval setup with a new form of multimodal queries, where the user simultaneously uses both spoken natural language (the what) and mouse traces over an empty canvas (the where) to express the characteristics of the desired target image.
Journal Article

Graph-based reasoning attention pooling with curriculum design for content-based image retrieval

TL;DR: A novel Graph-based Reasoning Attention Pooling with Curriculum Design (GRAP-CD) is proposed to improve the network capability through training modification and trainable pooling to achieve better local minima.
Journal Article

Learning Efficient Hash Codes for Fast Graph-Based Data Similarity Retrieval

TL;DR: Wang et al. introduce an efficient hash model with graph neural networks (HGNN) for a newly designed task, namely fast graph-based data retrieval.
Posted Content

Differentially Private Supervised Manifold Learning with Applications like Private Image Retrieval.

TL;DR: In this paper, a differentially private method for supervised manifold learning is proposed, which can generate fine-tuned manifolds for a target use case such as content-based image retrieval.
References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; the resulting residual networks won 1st place in the ILSVRC 2015 classification task.
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: The authors achieve state-of-the-art performance with a deep convolutional neural network that consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of convolutional network depth on accuracy in the large-scale image recognition setting, using an architecture with very small convolution filters, and shows that a significant improvement on prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Journal Article

Distinctive Image Features from Scale-Invariant Keypoints

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Proceedings Article

Going deeper with convolutions

TL;DR: Inception is a deep convolutional neural network architecture that achieved the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).