Open Access · Posted Content

Disentangling Factors of Variation via Generative Entangling

TLDR
This work proposes a novel model family based on the spike-and-slab restricted Boltzmann machine, which is generalized to include higher-order interactions among multiple latent variables, and applies it to the task of facial expression classification.
Abstract
Here we propose a novel model family with the objective of learning to disentangle the factors of variation in data. Our approach is based on the spike-and-slab restricted Boltzmann machine, which we generalize to include higher-order interactions among multiple latent variables. Seen from a generative perspective, the multiplicative interactions emulate the entangling of factors of variation; inference in the model can be seen as disentangling these generative factors. Unlike previous attempts at disentangling latent factors, the proposed model is trained using no supervised information regarding the latent factors. We apply our model to the task of facial expression classification.
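The multiplicative "entangling" interactions described in the abstract can be illustrated with a minimal sketch of a factored three-way energy term, a common way to parameterize higher-order interactions in Boltzmann-machine variants. Everything here — the variable names, the dimensions, and the factored parameterization itself — is an illustrative assumption, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: visible units, two groups of binary latent
# "spike" variables, and F factors for the factored three-way tensor.
nv, nh, ng, nf = 6, 4, 3, 5

# The full three-way tensor W[i, j, k] is approximated as
# sum_f Wv[i, f] * Wh[j, f] * Wg[k, f], so the energy couples all
# three groups multiplicatively through shared factors.
Wv = rng.normal(size=(nv, nf))
Wh = rng.normal(size=(nh, nf))
Wg = rng.normal(size=(ng, nf))

def three_way_energy(v, h, g):
    """Multiplicative energy term: -sum_f (v @ Wv)_f (h @ Wh)_f (g @ Wg)_f."""
    return -np.sum((v @ Wv) * (h @ Wh) * (g @ Wg))

v = rng.normal(size=nv)                           # real-valued visible vector
h = rng.integers(0, 2, size=nh).astype(float)     # binary latent group 1
g = rng.integers(0, 2, size=ng).astype(float)     # binary latent group 2

print(three_way_energy(v, h, g))
```

Because each latent group enters the energy only through a product with the others, generating data "entangles" the groups, while inference over `h` and `g` given `v` plays the role of disentangling them — the generative/inference duality the abstract describes.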


Citations
Proceedings Article

beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework

TL;DR: In this article, a modification of the variational autoencoder (VAE) framework is proposed to learn interpretable factorised latent representations from raw image data in a completely unsupervised manner.
Posted Content

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

TL;DR: InfoGAN as mentioned in this paper is a generative adversarial network that maximizes the mutual information between a small subset of the latent variables and the observation, which can be interpreted as a variation of the Wake-Sleep algorithm.
Proceedings Article

InfoGAN: interpretable representation learning by information maximizing generative adversarial nets

TL;DR: InfoGAN as mentioned in this paper is an information-theoretic extension to the GAN that is able to learn disentangled representations in a completely unsupervised manner, and it also discovers visual concepts that include hair styles, presence of eyeglasses, and emotions on the CelebA face dataset.
Posted Content

A Style-Based Generator Architecture for Generative Adversarial Networks

TL;DR: This article proposes an alternative generator architecture for GANs, borrowing from the style transfer literature, which leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images.
Proceedings Article

Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations

TL;DR: The authors show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data, and suggest that future work on disentanglement learning should be explicit about the role of inductive bias and (implicit) supervision.
References
Proceedings Article (DOI)

A unified architecture for natural language processing: deep neural networks with multitask learning

TL;DR: This work describes a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense using a language model.
Proceedings Article

Deep Boltzmann machines

TL;DR: A new learning algorithm for Boltzmann machines with many layers of hidden variables, made more efficient by a layer-by-layer "pre-training" phase that allows variational inference to be initialized with a single bottom-up pass.
Proceedings Article

An analysis of single-layer networks in unsupervised feature learning

TL;DR: In this paper, the authors show that the number of hidden nodes in the model may be more important to achieving high performance than the learning algorithm or the depth of the model; they apply several off-the-shelf feature learning algorithms (sparse auto-encoders, sparse RBMs, K-means clustering, and Gaussian mixtures) to the CIFAR, NORB, and STL datasets using only single-layer networks.
Proceedings Article (DOI)

Evaluation of local spatio-temporal features for action recognition

TL;DR: It is demonstrated that regular sampling of space-time features consistently outperforms all tested space-time interest point detectors for human actions in realistic settings, and that the ranking of the majority of methods is consistent across different datasets.
Book Chapter (DOI)

Transforming auto-encoders

TL;DR: It is argued that neural networks can be used to learn features that output a whole vector of instantiation parameters and this is a much more promising way of dealing with variations in position, orientation, scale and lighting than the methods currently employed in the neural networks community.