Proceedings ArticleDOI

Bi-Modal Content Based Image Retrieval using Multi-class Cycle-GAN

01 Dec 2018 · pp 1-7
TL;DR: This work proposes a novel bimodal query based retrieval framework, which can take inputs from both sketch and image domains, and aims at reducing the domain gap by learning a mapping function using Generative Adversarial Networks and supervised deep domain adaptation techniques.
Abstract: Content Based Image Retrieval (CBIR) systems retrieve relevant images from a database based on the content of the query. Most CBIR systems take a query image as input and retrieve similar images from a gallery, based on the global features (such as texture, shape, and color) extracted from an image. There are several ways of querying an image database for retrieval purposes, such as text, image, and sketch. However, traditional methodologies support only one of these domains at a time. There is a need to bridge the gap between different domains (sketch and image) to enable a Multi-Modal CBIR system. In this work, we propose a novel bimodal query based retrieval framework, which can take inputs from both sketch and image domains. The proposed framework aims at reducing the domain gap by learning a mapping function using Generative Adversarial Networks (GANs) and supervised deep domain adaptation techniques. Extensive experimentation and comparison with several baselines on two popular sketch datasets (Sketchy and TU-Berlin) show the effectiveness of our proposed framework.
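
For illustration, a minimal sketch of how such a bimodal retrieval pipeline can be organized: two domain-specific encoders map sketches and images into a shared embedding space, and retrieval is nearest-neighbour search by cosine similarity. The random-projection "encoders" below are placeholders for the GAN-and-domain-adaptation mapping the paper learns; this is an assumption-laden sketch, not the authors' implementation.

```python
import numpy as np

def encode(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Stand-in encoder: projects an input into the shared embedding
    space and L2-normalizes it. In the paper this role is played by
    the GAN-based mapping plus supervised deep domain adaptation."""
    z = W @ x
    return z / (np.linalg.norm(z) + 1e-8)

rng = np.random.default_rng(0)
dim_in, dim_emb, gallery_size = 512, 128, 1000

# Hypothetical per-domain encoders (random projections as placeholders).
W_sketch = rng.standard_normal((dim_emb, dim_in))
W_image = rng.standard_normal((dim_emb, dim_in))

# Gallery images are embedded once, offline.
gallery = np.stack([encode(rng.standard_normal(dim_in), W_image)
                    for _ in range(gallery_size)])

def retrieve(query: np.ndarray, domain: str, k: int = 5) -> np.ndarray:
    """Embed a query from either domain and return the indices of the
    top-k gallery images by cosine similarity."""
    W = W_sketch if domain == "sketch" else W_image
    q = encode(query, W)
    scores = gallery @ q  # cosine similarity: all vectors are unit norm
    return np.argsort(-scores)[:k]

print(retrieve(rng.standard_normal(dim_in), domain="sketch"))
```

Because both domains land in one embedding space, the same gallery index serves sketch and image queries alike, which is the point of the bimodal design.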
Citations
Proceedings ArticleDOI
01 Jun 2020
TL;DR: This manuscript studies the effectiveness of recently implemented work in SBIR, discusses the research trend, and highlights the open research issues in this field.
Abstract: Retrieving images using a sketch as the query is termed Sketch Based Image Retrieval (SBIR); its adoption is growing, motivated by the near-natural form of free-hand drawing recognition in modern tablets, mobile devices, and other interactive computing devices. Various works have been carried out to improve the SBIR process, especially from the accuracy viewpoint. However, there has been no significant discussion of the effectiveness of this recently implemented work. This manuscript studies the effectiveness of the recent work in SBIR, discusses the research trend, and highlights the open research issues in the field. By highlighting the limitations as well as the research trends in SBIR, the study helps readers frame a better decision process before implementing SBIR.

Cites background from "Bi-Modal Content Based Image Retrie..."

  • ...[29] have presented a learning model with multi-tasking, while Pahariya [30] has developed a content-based image retrieval process in which a bimodal query system is designed....


Patent
10 May 2019
TL;DR: In this paper, an image recognition method based on domain conversion and a generation model is presented, which builds source-to-target and target-to-source conversion models, a binary discriminator, and a generator, and trains a classification model with K + 1 categories, where K is the number of classes.
Abstract: The invention discloses an image recognition method based on domain conversion and a generation model. The method comprises the following steps: step 1, construct a conversion model from the source domain to the target domain; step 2, construct a conversion model from the target domain to the source domain; step 3, construct a binary classification discrimination model; step 4, construct a generator; step 5, construct a classification model with K + 1 classification categories, where K is the number of categories; step 6, obtain a classification model based on the source domain and the target domain according to steps 1-5; and step 7, obtain the classification result for the image to be classified using the model from step 6.
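
A hedged sketch of the (K + 1)-way classification head described in step 5, where the extra class index stands for generated (synthetic) samples; the MLP backbone and sizes here are placeholders, not the patent's architecture:

```python
import torch
import torch.nn as nn

K = 10  # number of real categories; class index K stands for "generated"

# Minimal (K + 1)-way classifier head; the backbone is a placeholder MLP.
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 256),
    nn.ReLU(),
    nn.Linear(256, K + 1),
)

x = torch.randn(4, 3, 32, 32)       # batch of (possibly domain-converted) images
logits = classifier(x)              # shape: (4, K + 1)
probs = torch.softmax(logits, dim=1)
p_synthetic = probs[:, K]           # probability mass on the "generated" class
pred = logits[:, :K].argmax(dim=1)  # class decision over the K real labels
print(p_synthetic, pred)
```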
References
Proceedings Article
03 Dec 2012
TL;DR: A deep convolutional neural network consisting of five convolutional layers, some followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art performance on ImageNet, as discussed by the authors.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
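
A compact PyTorch approximation of the topology described above (five convolutional layers with interleaved max-pooling, three fully-connected layers with dropout, and a 1000-way output); layer sizes follow a common PyTorch variant of the architecture, so this sketch is illustrative rather than the authors' code:

```python
import torch
import torch.nn as nn

# Five convolutional layers, some followed by max-pooling.
features = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
)
# Three fully-connected layers; dropout is the regularizer the abstract mentions.
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(0.5),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),  # 1000-way output; softmax is applied inside the loss
)

x = torch.randn(1, 3, 224, 224)
print(classifier(features(x)).shape)  # torch.Size([1, 1000])
```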

73,978 citations

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
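
The core design principle, replacing large filters with stacks of 3x3 convolutions, can be sketched as follows; channel widths mirror the early VGG-16 stages, but the fragment is illustrative:

```python
import torch
import torch.nn as nn

def vgg_block(in_ch: int, out_ch: int, n_convs: int) -> nn.Sequential:
    """A VGG-style block: n_convs stacked 3x3 convolutions followed by
    2x2 max-pooling. Two stacked 3x3 convs cover a 5x5 receptive field
    with fewer parameters and an extra non-linearity."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, padding=1),
                   nn.ReLU()]
    layers.append(nn.MaxPool2d(2, stride=2))
    return nn.Sequential(*layers)

# First two stages of a VGG-16-like network; depth grows by stacking blocks.
stem = nn.Sequential(vgg_block(3, 64, 2), vgg_block(64, 128, 2))
print(stem(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 128, 56, 56])
```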

55,235 citations


"Bi-Modal Content Based Image Retrie..." refers methods in this paper

  • ...In both ablation studies, similarity is computed with the shared feature representation in the first case and with the feature vector from the second-to-last layer of VGG-Net [28] in the second case....


  • ...• Ablation Study 2: The generated multi-class cycle GAN output is considered perfect and is classified using VGG-Net [28] without using SDDA....

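The excerpts above compute similarity on features from the second-to-last layer of VGG-Net. A hedged sketch of extracting such penultimate-layer (4096-d) features with a stock torchvision VGG-16; the paper's fine-tuned backbone and weights may differ:

```python
import torch
import torchvision.models as models

# Standard VGG-16; in practice one would load pretrained weights
# (e.g. weights=models.VGG16_Weights.DEFAULT). The paper's fine-tuned
# backbone may differ from this stock model.
vgg = models.vgg16(weights=None).eval()

# Drop the final 1000-way linear layer so the forward pass stops at the
# second-to-last layer, yielding a 4096-d feature vector per image.
vgg.classifier = vgg.classifier[:-1]

with torch.no_grad():
    feats = vgg(torch.randn(2, 3, 224, 224))  # shape: (2, 4096)

# Cosine similarity between the two feature vectors, as used for retrieval.
sim = torch.nn.functional.cosine_similarity(feats[0], feats[1], dim=0)
print(feats.shape, sim.item())
```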


Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14), as discussed by the authors.
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
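
A simplified sketch of the multi-scale Inception module underlying this design, with parallel 1x1, 3x3, and 5x5 branches concatenated along channels; branch widths are borrowed from one GoogLeNet stage, and the fragment is illustrative:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """A simplified Inception module: parallel 1x1, 3x3, and 5x5
    convolutions plus a pooled branch, concatenated along channels.
    The 1x1 convolutions reduce dimensionality so width can grow
    without blowing up the computational budget."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, 96, 1), nn.ReLU(),
                                nn.Conv2d(96, 128, 3, padding=1))
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1), nn.ReLU(),
                                nn.Conv2d(16, 32, 5, padding=2))
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32, 1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x),
                          self.b3(x), self.b4(x)], dim=1)

block = InceptionBlock(192)
print(block(torch.randn(1, 192, 28, 28)).shape)  # torch.Size([1, 256, 28, 28])
```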

40,257 citations

Journal ArticleDOI
08 Dec 2014
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to ½ everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
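
The two-player game described above is conventionally written as the minimax objective below; at the optimum, G recovers the training data distribution and D(x) = 1/2 everywhere, as the abstract states.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] +
  \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
```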

38,211 citations


"Bi-Modal Content Based Image Retrie..." refers background or methods in this paper

  • ...The motivation for using M-cycle GAN over the traditional GAN [12] derives from the advantage of transferring sketches to images (because it learns to transfer in both directions simultaneously)....


  • ...p_model(y = N+1|s) is used to represent the synthetic output, corresponding to the original discriminator in [12]....


  • ...In this work, we use Generative Adversarial Networks (GANs) [10], [12] to learn an embedding space for features corresponding to different domains....

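In the excerpt above, the discriminator is read as an (N + 1)-way classifier whose extra class marks synthetic samples. A hedged reconstruction in standard semi-supervised GAN notation (the paper's exact formulation may differ), with l_k(s) denoting the classifier logits:

```latex
p_{\text{model}}(y = N{+}1 \mid s)
  = \frac{\exp\big(l_{N+1}(s)\big)}{\sum_{k=1}^{N+1} \exp\big(l_k(s)\big)}
```

Under this reading, p_model(y = N+1 | s) plays the role of 1 - D(s) in the original two-class GAN discriminator of [12].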