Proceedings ArticleDOI

Bi-Modal Content Based Image Retrieval using Multi-class Cycle-GAN

01 Dec 2018 · pp 1-7
TL;DR: This work proposes a novel bimodal query based retrieval framework, which can take inputs from both sketch and image domains, and aims at reducing the domain gap by learning a mapping function using Generative Adversarial Networks and supervised deep domain adaptation techniques.
Abstract: Content Based Image Retrieval (CBIR) systems retrieve relevant images from a database based on the content of the query. Most CBIR systems take a query image as input and retrieve similar images from a gallery, based on the global features (such as texture, shape, and color) extracted from an image. There are several ways of querying an image database for retrieval purposes, such as text, image, and sketch. However, traditional methodologies support only one of these domains at a time. There is a need to bridge the gap between different domains (sketch and image) to enable a Multi-Modal CBIR system. In this work, we propose a novel bimodal query based retrieval framework, which can take inputs from both sketch and image domains. The proposed framework aims at reducing the domain gap by learning a mapping function using Generative Adversarial Networks (GANs) and supervised deep domain adaptation techniques. Extensive experimentation and comparison with several baselines on two popular sketch datasets (Sketchy and TU-Berlin) show the effectiveness of our proposed framework.
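
For illustration, a minimal sketch of how such a bimodal retrieval pipeline can be organized: two domain-specific encoders map sketches and images into a shared embedding space, and retrieval is nearest-neighbour search by cosine similarity. The random-projection "encoders" below are placeholders for the GAN-and-domain-adaptation mapping the paper learns; this is an assumption-laden sketch, not the authors' implementation.

```python
import numpy as np

def encode(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Stand-in encoder: projects an input into the shared embedding
    space and L2-normalizes it. In the paper this role is played by
    the GAN-based mapping plus supervised deep domain adaptation."""
    z = W @ x
    return z / (np.linalg.norm(z) + 1e-8)

rng = np.random.default_rng(0)
dim_in, dim_emb, gallery_size = 512, 128, 1000

# Hypothetical per-domain encoders (random projections as placeholders).
W_sketch = rng.standard_normal((dim_emb, dim_in))
W_image = rng.standard_normal((dim_emb, dim_in))

# Gallery images are embedded once, offline.
gallery = np.stack([encode(rng.standard_normal(dim_in), W_image)
                    for _ in range(gallery_size)])

def retrieve(query: np.ndarray, domain: str, k: int = 5) -> np.ndarray:
    """Embed a query from either domain and return the indices of the
    top-k gallery images by cosine similarity."""
    W = W_sketch if domain == "sketch" else W_image
    q = encode(query, W)
    scores = gallery @ q  # cosine similarity: all vectors are unit norm
    return np.argsort(-scores)[:k]

print(retrieve(rng.standard_normal(dim_in), domain="sketch"))
```

Because both domains land in one embedding space, the same gallery index serves sketch and image queries alike, which is the point of the bimodal design.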
Citations
Proceedings ArticleDOI
01 Jun 2020
TL;DR: This manuscript studies the effectiveness of recently implemented work in SBIR, discusses the research trend, and highlights the open research issues in this field.
Abstract: Retrieving images using a sketch as the query is termed Sketch Based Image Retrieval (SBIR); its adoption is growing, motivated by the near-natural form of free-hand drawing recognition in modern tablets, mobile devices, and other interactive computing devices. Various works have been carried out to improve the SBIR process, especially from the accuracy viewpoint. However, there has been no significant discussion of the effectiveness of this recently implemented work. This manuscript studies the effectiveness of the recent work in SBIR, discusses the research trend, and highlights the open research issues in the field. By highlighting the limitations as well as the research trends in SBIR, the study helps readers frame a better decision process before implementing SBIR.

Cites background from "Bi-Modal Content Based Image Retrie..."

  • ...[29] have presented a learning model with multi-tasking, while Pahariya [30] has developed a content-based image retrieval process in which a bimodal query system is designed....


Patent
10 May 2019
TL;DR: In this paper, an image recognition method based on domain conversion and a generation model is presented, which builds source-to-target and target-to-source conversion models, a binary discriminator, and a generator, and trains a classification model with K + 1 categories, where K is the number of classes.
Abstract: The invention discloses an image recognition method based on domain conversion and a generation model. The method comprises the following steps: step 1, construct a conversion model from the source domain to the target domain; step 2, construct a conversion model from the target domain to the source domain; step 3, construct a binary classification discrimination model; step 4, construct a generator; step 5, construct a classification model with K + 1 classification categories, where K is the number of categories; step 6, obtain a classification model based on the source domain and the target domain according to steps 1-5; and step 7, obtain the classification result for the image to be classified using the model from step 6.
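
A hedged sketch of the (K + 1)-way classification head described in step 5, where the extra class index stands for generated (synthetic) samples; the MLP backbone and sizes here are placeholders, not the patent's architecture:

```python
import torch
import torch.nn as nn

K = 10  # number of real categories; class index K stands for "generated"

# Minimal (K + 1)-way classifier head; the backbone is a placeholder MLP.
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 256),
    nn.ReLU(),
    nn.Linear(256, K + 1),
)

x = torch.randn(4, 3, 32, 32)       # batch of (possibly domain-converted) images
logits = classifier(x)              # shape: (4, K + 1)
probs = torch.softmax(logits, dim=1)
p_synthetic = probs[:, K]           # probability mass on the "generated" class
pred = logits[:, :K].argmax(dim=1)  # class decision over the K real labels
print(p_synthetic, pred)
```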
References
Proceedings Article
03 Dec 2012
TL;DR: A deep convolutional neural network consisting of five convolutional layers, some followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art performance on ImageNet, as discussed by the authors.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
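
A compact PyTorch approximation of the topology described above (five convolutional layers with interleaved max-pooling, three fully-connected layers with dropout, and a 1000-way output); layer sizes follow a common PyTorch variant of the architecture, so this sketch is illustrative rather than the authors' code:

```python
import torch
import torch.nn as nn

# Five convolutional layers, some followed by max-pooling.
features = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
)
# Three fully-connected layers; dropout is the regularizer the abstract mentions.
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(0.5),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),  # 1000-way output; softmax is applied inside the loss
)

x = torch.randn(1, 3, 224, 224)
print(classifier(features(x)).shape)  # torch.Size([1, 1000])
```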

73,978 citations

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
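
The core design principle, replacing large filters with stacks of 3x3 convolutions, can be sketched as follows; channel widths mirror the early VGG-16 stages, but the fragment is illustrative:

```python
import torch
import torch.nn as nn

def vgg_block(in_ch: int, out_ch: int, n_convs: int) -> nn.Sequential:
    """A VGG-style block: n_convs stacked 3x3 convolutions followed by
    2x2 max-pooling. Two stacked 3x3 convs cover a 5x5 receptive field
    with fewer parameters and an extra non-linearity."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, padding=1),
                   nn.ReLU()]
    layers.append(nn.MaxPool2d(2, stride=2))
    return nn.Sequential(*layers)

# First two stages of a VGG-16-like network; depth grows by stacking blocks.
stem = nn.Sequential(vgg_block(3, 64, 2), vgg_block(64, 128, 2))
print(stem(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 128, 56, 56])
```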

55,235 citations


"Bi-Modal Content Based Image Retrie..." refers methods in this paper

  • ...In both ablation studies, similarity is computed with the shared feature representation in the first case and with the feature vector from the second-to-last layer of VGG-Net [28] in the second case....


  • ...• Ablation Study 2: The generated multi-class cycle GAN output is considered perfect and is classified using VGG-Net [28] without using SDDA....

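The excerpts above compute similarity on features from the second-to-last layer of VGG-Net. A hedged sketch of extracting such penultimate-layer (4096-d) features with a stock torchvision VGG-16; the paper's fine-tuned backbone and weights may differ:

```python
import torch
import torchvision.models as models

# Standard VGG-16; in practice one would load pretrained weights
# (e.g. weights=models.VGG16_Weights.DEFAULT). The paper's fine-tuned
# backbone may differ from this stock model.
vgg = models.vgg16(weights=None).eval()

# Drop the final 1000-way linear layer so the forward pass stops at the
# second-to-last layer, yielding a 4096-d feature vector per image.
vgg.classifier = vgg.classifier[:-1]

with torch.no_grad():
    feats = vgg(torch.randn(2, 3, 224, 224))  # shape: (2, 4096)

# Cosine similarity between the two feature vectors, as used for retrieval.
sim = torch.nn.functional.cosine_similarity(feats[0], feats[1], dim=0)
print(feats.shape, sim.item())
```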


Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14), as discussed by the authors.
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
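
A simplified sketch of the multi-scale Inception module underlying this design, with parallel 1x1, 3x3, and 5x5 branches concatenated along channels; branch widths are borrowed from one GoogLeNet stage, and the fragment is illustrative:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """A simplified Inception module: parallel 1x1, 3x3, and 5x5
    convolutions plus a pooled branch, concatenated along channels.
    The 1x1 convolutions reduce dimensionality so width can grow
    without blowing up the computational budget."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, 96, 1), nn.ReLU(),
                                nn.Conv2d(96, 128, 3, padding=1))
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1), nn.ReLU(),
                                nn.Conv2d(16, 32, 5, padding=2))
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32, 1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x),
                          self.b3(x), self.b4(x)], dim=1)

block = InceptionBlock(192)
print(block(torch.randn(1, 192, 28, 28)).shape)  # torch.Size([1, 256, 28, 28])
```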

40,257 citations

Journal ArticleDOI
08 Dec 2014
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to ½ everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
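
The two-player game described above is conventionally written as the minimax objective below; at the optimum, G recovers the training data distribution and D(x) = 1/2 everywhere, as the abstract states.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] +
  \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
```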

38,211 citations


"Bi-Modal Content Based Image Retrie..." refers background or methods in this paper

  • ...The motivation for using M-cycle GAN over the traditional GAN [12] derives from the advantage of transferring sketches to images (because it learns to transfer in both directions simultaneously)....


  • ...p_model(y = N+1|s) is used to represent the synthetic output, corresponding to the original discriminator in [12]....


  • ...In this work, we use Generative Adversarial Networks (GANs) [10], [12] to learn an embedding space for features corresponding to different domains....

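In the excerpt above, the discriminator is read as an (N + 1)-way classifier whose extra class marks synthetic samples. A hedged reconstruction in standard semi-supervised GAN notation (the paper's exact formulation may differ), with l_k(s) denoting the classifier logits:

```latex
p_{\text{model}}(y = N{+}1 \mid s)
  = \frac{\exp\big(l_{N+1}(s)\big)}{\sum_{k=1}^{N+1} \exp\big(l_k(s)\big)}
```

Under this reading, p_model(y = N+1 | s) plays the role of 1 - D(s) in the original two-class GAN discriminator of [12].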