Semantic Autoencoder for Zero-Shot Learning

doi:10.1109/CVPR.2017.473

Open Access Proceedings Article

- pp 4447-4456

TLDR

This paper proposes a Semantic AutoEncoder (SAE) in which the encoder projects a visual feature vector into the semantic space, as in existing ZSL models, while the decoder adds a constraint: the projection (code) must be able to reconstruct the original visual feature.

Abstract:

Existing zero-shot learning (ZSL) models typically learn a projection function from a feature space to a semantic embedding space (e.g. attribute space). However, such a projection function is only concerned with predicting the training seen class semantic representation (e.g. attribute prediction) or classification. When applied to test data, which in the context of ZSL contains different (unseen) classes without training data, a ZSL model typically suffers from the projection domain shift problem. In this work, we present a novel solution to ZSL based on learning a Semantic AutoEncoder (SAE). Taking the encoder-decoder paradigm, an encoder aims to project a visual feature vector into the semantic space as in the existing ZSL models. However, the decoder exerts an additional constraint, that is, the projection/code must be able to reconstruct the original visual feature. We show that with this additional reconstruction constraint, the learned projection function from the seen classes is able to generalise better to the new unseen classes. Importantly, the encoder and decoder are linear and symmetric, which enables us to develop an extremely efficient learning algorithm. Extensive experiments on six benchmark datasets demonstrate that the proposed SAE significantly outperforms the existing ZSL models with the additional benefit of lower computational cost. Furthermore, when the SAE is applied to the supervised clustering problem, it also beats the state-of-the-art.
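
The abstract does not spell out the objective, so the following is only a minimal sketch of one way a linear, symmetric encoder-decoder could be fitted. It assumes a tied-weight formulation, min over W of ||X - W^T S||^2 + lam * ||W X - S||^2, where X holds visual features, S holds semantic vectors, the encoder is W and the decoder is W^T. Setting the gradient to zero gives a Sylvester equation, which is solved here with a Kronecker-product linear system. The names `fit_sae` and `predict`, the value `lam=0.2`, the nearest-prototype cosine classifier, and the toy data are all illustrative assumptions, not the authors' code.

```python
import numpy as np


def fit_sae(X, S, lam=0.2):
    """Fit a tied-weight linear autoencoder (illustrative sketch).

    X: (d, N) visual features, S: (k, N) semantic vectors of seen classes.
    Returns W with shape (k, d): the encoder is W, the decoder is W.T.
    """
    d, k = X.shape[0], S.shape[0]
    # Setting the gradient of ||X - W.T S||^2 + lam * ||W X - S||^2 to zero
    # gives the Sylvester equation A W + W B = C.
    A = S @ S.T
    B = lam * (X @ X.T)
    C = (1.0 + lam) * (S @ X.T)
    # Solve via column-major vectorisation:
    # (I_d kron A + B.T kron I_k) vec(W) = vec(C). Suitable for small k * d only.
    K = np.kron(np.eye(d), A) + np.kron(B.T, np.eye(k))
    w = np.linalg.solve(K, C.reshape(-1, order="F"))
    return w.reshape((k, d), order="F")


def predict(W, X_test, prototypes):
    """Encode test features and pick the nearest class prototype (cosine)."""
    S_hat = W @ X_test  # (k, M) predicted semantic codes
    S_hat = S_hat / np.linalg.norm(S_hat, axis=0, keepdims=True)
    P = prototypes / np.linalg.norm(prototypes, axis=0, keepdims=True)
    return np.argmax(P.T @ S_hat, axis=0)  # best prototype index per sample


# Toy data (hypothetical): features are an orthonormal linear map of semantics.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 4)))  # d = 8, k = 4
S_seen = rng.standard_normal((4, 60))
W = fit_sae(Q @ S_seen, S_seen)

unseen_prototypes = rng.standard_normal((4, 3))  # three unseen classes
labels = predict(W, Q @ unseen_prototypes, unseen_prototypes)
```

The Kronecker system has k*d unknowns and is only practical for toy sizes; for realistic feature dimensions a dedicated Sylvester solver such as `scipy.linalg.solve_sylvester` would be the more usual choice.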

Citations

Proceedings Article

Learning to Compare: Relation Network for Few-Shot Learning

Flood Sung, +5 more

TL;DR: A conceptually simple, flexible, and general framework for few-shot learning, where a classifier must learn to recognise new classes given only a few examples from each, which is easily extended to zero-shot learning.

Posted Content

Zero-Shot Learning - A Comprehensive Evaluation of the Good, the Bad and the Ugly

Yongqin Xian, +3 more

- 03 Jul 2017 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work proposes a new zero-shot learning dataset, Animals with Attributes 2 (AWA2), which is made publicly available both in terms of image features and the images themselves, and compares and analyzes a significant number of state-of-the-art methods in depth.

Journal Article

Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly

Yongqin Xian, +3 more

- 01 Sep 2019 -

IEEE Transactions on Pattern Analysis an...

TL;DR: The Animals with Attributes 2 (AWA2) dataset is a new dataset for zero-shot learning, which is publicly available both in terms of image features and the images themselves.

Proceedings Article

Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs

Xiaolong Wang, +2 more

TL;DR: Given a learned knowledge graph (KG) and a semantic embedding for each node (representing a visual category), a graph convolutional network (GCN) is used to predict the visual classifiers of unseen categories; the approach is robust to noise in the KG.

Proceedings Article

Generalized Zero-Shot Learning via Synthesized Examples

Vinay Kumar Verma, +3 more

TL;DR: This work presents a generative framework for generalized zero-shot learning, where the training and test classes are not necessarily disjoint; the framework can generate novel exemplars from seen/unseen classes, given their respective class attributes.

References

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art ImageNet classification performance.

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Proceedings Article

ImageNet: A large-scale hierarchical image database

Jia Deng, +5 more

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.

Proceedings Article

Going deeper with convolutions

Christian Szegedy, +8 more

TL;DR: Inception is a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

Journal Article

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, +11 more

- 01 Dec 2015 -

International Journal of Computer Vision

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images, which has been run annually from 2010 to present, attracting participation from more than fifty institutions.

DeViSE: A Deep Visual-Semantic Embedding Model

Andrea Frome, +6 more

Describing objects by their attributes

Ali Farhadi, +3 more

Related Papers (5)

An embarrassingly simple approach to zero-shot learning

Deep Residual Learning for Image Recognition

Attribute-Based Classification for Zero-Shot Visual Object Categorization

DeViSE: A Deep Visual-Semantic Embedding Model

Describing objects by their attributes