Showing papers on "Autoencoder published in 2016"

PDF

Open Access

Proceedings Article•DOI•

Generating Sentences from a Continuous Space

[...]

Samuel R. Bowman¹, Luke Vilnis², Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio - Show less +2 more•Institutions (2)

Stanford University¹, University of Massachusetts Amherst²

01 Jan 2016

TL;DR: This work introduces and study an RNN-based variational autoencoder generative model that incorporates distributed latent representations of entire sentences that allows it to explicitly model holistic properties of sentences such as style, topic, and high-level syntactic features.

...read moreread less

Abstract: The standard recurrent neural network language model (RNNLM) generates sentences one word at a time and does not work from an explicit global sentence representation. In this work, we introduce and study an RNN-based variational autoencoder generative model that incorporates distributed latent representations of entire sentences. This factorization allows it to explicitly model holistic properties of sentences such as style, topic, and high-level syntactic features. Samples from the prior over these sentence representations remarkably produce diverse and well-formed sentences through simple deterministic decoding. By examining paths through this latent space, we are able to generate coherent novel sentences that interpolate between known sentences. We present techniques for solving the difficult learning problem presented by this model, demonstrate its effectiveness in imputing missing words, explore many interesting properties of the model's latent sentence space, and present negative results on the use of the model in language modeling.

...read moreread less

1,690 citations

Proceedings Article•

Autoencoding beyond pixels using a learned similarity metric

[...]

Anders Boesen Lindbo Larsen¹, Søren Kaae Sønderby², Hugo Larochelle³, Ole Winther²•Institutions (3)

Technical University of Denmark¹, University of Copenhagen², Twitter³

19 Jun 2016

TL;DR: In this article, an autoencoder that leverages learned representations to better measure similarities in data space is presented, which can use learned feature representations in the GAN discriminator as basis for the VAE reconstruction objective.

...read moreread less

Abstract: We present an autoencoder that leverages learned representations to better measure similarities in data space. By combining a variational autoencoder (VAE) with a generative adversarial network (GAN) we can use learned feature representations in the GAN discriminator as basis for the VAE reconstruction objective. Thereby, we replace element-wise errors with feature-wise errors to better capture the data distribution while offering invariance towards e.g. translation. We apply our method to images of faces and show that it outperforms VAEs with element-wise similarity measures in terms of visual fidelity. Moreover, we show that the method learns an embedding in which high-level abstract visual features (e.g. wearing glasses) can be modified using simple arithmetic.

...read moreread less

1,683 citations

Proceedings Article•

Conditional image generation with PixelCNN decoders

[...]

Aaron van den Oord¹, Nal Kalchbrenner¹, Oriol Vinyals¹, Lasse Espeholt¹, Alex Graves¹, Koray Kavukcuoglu¹ - Show less +2 more•Institutions (1)

Google¹

05 Dec 2016

TL;DR: The gated convolutional layers in the proposed model improve the log-likelihood of PixelCNN to match the state-of-the-art performance of PixelRNN on ImageNet, with greatly reduced computational cost.

...read moreread less

Abstract: This work explores conditional image generation with a new image density model based on the PixelCNN architecture. The model can be conditioned on any vector, including descriptive labels or tags, or latent embeddings created by other networks. When conditioned on class labels from the ImageNet database, the model is able to generate diverse, realistic scenes representing distinct animals, objects, landscapes and structures. When conditioned on an embedding produced by a convolutional network given a single image of an unseen face, it generates a variety of new portraits of the same person with different facial expressions, poses and lighting conditions. We also show that conditional PixelCNN can serve as a powerful decoder in an image autoencoder. Additionally, the gated convolutional layers in the proposed model improve the log-likelihood of PixelCNN to match the state-of-the-art performance of PixelRNN on ImageNet, with greatly reduced computational cost.

...read moreread less

1,275 citations

Posted Content•

Conditional Image Generation with PixelCNN Decoders

[...]

Aaron van den Oord¹, Nal Kalchbrenner¹, Oriol Vinyals¹, Lasse Espeholt¹, Alex Graves¹, Koray Kavukcuoglu¹ - Show less +2 more•Institutions (1)

Google¹

16 Jun 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, a new image density model based on the PixelCNN architecture is proposed for conditional image generation, which can be conditioned on any vector, including descriptive labels or tags, or latent embeddings created by other networks.

...read moreread less

1,259 citations

Book Chapter•DOI•

Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue

[...]

Ravi Garg¹, B. G. Vijay Kumar¹, Gustavo Carneiro¹, Ian Reid¹•Institutions (1)

University of Adelaide¹

08 Oct 2016

TL;DR: This work proposes a unsupervised framework to learn a deep convolutional neural network for single view depth prediction, without requiring a pre-training stage or annotated ground-truth depths, and shows that this network trained on less than half of the KITTI dataset gives comparable performance to that of the state-of-the-art supervised methods for singleView depth estimation.

...read moreread less

Abstract: A significant weakness of most current deep Convolutional Neural Networks is the need to train them using vast amounts of manually labelled data. In this work we propose a unsupervised framework to learn a deep convolutional neural network for single view depth prediction, without requiring a pre-training stage or annotated ground-truth depths. We achieve this by training the network in a manner analogous to an autoencoder. At training time we consider a pair of images, source and target, with small, known camera motion between the two such as a stereo pair. We train the convolutional encoder for the task of predicting the depth map for the source image. To do so, we explicitly generate an inverse warp of the target image using the predicted depth and known inter-view displacement, to reconstruct the source image; the photometric error in the reconstruction is the reconstruction loss for the encoder. The acquisition of this training data is considerably simpler than for equivalent systems, requiring no manual annotation, nor calibration of depth sensor to camera. We show that our network trained on less than half of the KITTI dataset gives comparable performance to that of the state-of-the-art supervised methods for single view depth estimation.

...read moreread less

1,238 citations

Journal Article•DOI•

Extreme Learning Machine for Multilayer Perceptron

[...]

Jiexiong Tang¹, Chenwei Deng¹, Guang-Bin Huang²•Institutions (2)

Beijing Institute of Technology¹, Nanyang Technological University²

01 Apr 2016-IEEE Transactions on Neural Networks

TL;DR: Extensive experiments on various widely used classification data sets show that the proposed algorithm achieves better and faster convergence than the existing state-of-the-art hierarchical learning methods, and multiple applications in computer vision further confirm the generality and capability of the proposed learning scheme.

...read moreread less

Abstract: Extreme learning machine (ELM) is an emerging learning algorithm for the generalized single hidden layer feedforward neural networks, of which the hidden node parameters are randomly generated and the output weights are analytically computed. However, due to its shallow architecture, feature learning using ELM may not be effective for natural signals (e.g., images/videos), even with a large number of hidden nodes. To address this issue, in this paper, a new ELM-based hierarchical learning framework is proposed for multilayer perceptron. The proposed architecture is divided into two main components: 1) self-taught feature extraction followed by supervised feature classification and 2) they are bridged by random initialized hidden weights. The novelties of this paper are as follows: 1) unsupervised multilayer encoding is conducted for feature extraction, and an ELM-based sparse autoencoder is developed via $\ell _{1}$ constraint. By doing so, it achieves more compact and meaningful feature representations than the original ELM; 2) by exploiting the advantages of ELM random feature mapping, the hierarchically encoded outputs are randomly projected before final decision making, which leads to a better generalization with faster learning speed; and 3) unlike the greedy layerwise training of deep learning (DL), the hidden layers of the proposed framework are trained in a forward manner. Once the previous layer is established, the weights of the current layer are fixed without fine-tuning. Therefore, it has much better learning efficiency than the DL. Extensive experiments on various widely used classification data sets show that the proposed algorithm achieves better and faster convergence than the existing state-of-the-art hierarchical learning methods. Furthermore, multiple applications in computer vision further confirm the generality and capability of the proposed learning scheme.

...read moreread less

1,166 citations

Proceedings Article•

Improved Variational Inference with Inverse Autoregressive Flow

[...]

Durk P. Kingma, Tim Salimans¹, Rafal Jozefowicz², Xi Chen³, Ilya Sutskever², Max Welling⁴ - Show less +2 more•Institutions (4)

OpenAI¹, Google², University of California, Berkeley³, University of Amsterdam⁴

01 Jan 2016

TL;DR: A new type of normalizing flow, inverse autoregressive flow (IAF), is proposed that, in contrast to earlier published flows, scales well to high-dimensional latent spaces and significantly improves upon diagonal Gaussian approximate posteriors.

...read moreread less

Abstract: The framework of normalizing flows provides a general strategy for flexible variational inference of posteriors over latent variables. We propose a new type of normalizing flow, inverse autoregressive flow (IAF), that, in contrast to earlier published flows, scales well to high-dimensional latent spaces. The proposed flow consists of a chain of invertible transformations, where each transformation is based on an autoregressive neural network. In experiments, we show that IAF significantly improves upon diagonal Gaussian approximate posteriors. In addition, we demonstrate that a novel type of variational autoencoder, coupled with IAF, is competitive with neural autoregressive models in terms of attained log-likelihood on natural images, while allowing significantly faster synthesis.

...read moreread less

901 citations

Journal Article•DOI•

High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning

[...]

Sarah M. Erfani¹, Sutharshan Rajasegarar¹, Shanika Karunasekera¹, Christopher Leckie¹•Institutions (1)

University of Melbourne¹

01 Oct 2016-Pattern Recognition

TL;DR: A hybrid model where an unsupervised DBN is trained to extract generic underlying features, and a one-class SVM is trained from the features learned by the DBN, which delivers a comparable accuracy with a deep autoencoder and is scalable and computationally efficient.

...read moreread less

876 citations

Posted Content•

Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue

[...]

Ravi Garg, Vijay Kumar Bg, Gustavo Carneiro, Ian Reid

16 Mar 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, an unsupervised framework was proposed to learn a deep CNN for single view depth prediction without requiring a pre-training stage or annotated ground truth depths, by training the network in a manner analogous to an autoencoder.

...read moreread less

Abstract: A significant weakness of most current deep Convolutional Neural Networks is the need to train them using vast amounts of manu- ally labelled data. In this work we propose a unsupervised framework to learn a deep convolutional neural network for single view depth predic- tion, without requiring a pre-training stage or annotated ground truth depths. We achieve this by training the network in a manner analogous to an autoencoder. At training time we consider a pair of images, source and target, with small, known camera motion between the two such as a stereo pair. We train the convolutional encoder for the task of predicting the depth map for the source image. To do so, we explicitly generate an inverse warp of the target image using the predicted depth and known inter-view displacement, to reconstruct the source image; the photomet- ric error in the reconstruction is the reconstruction loss for the encoder. The acquisition of this training data is considerably simpler than for equivalent systems, requiring no manual annotation, nor calibration of depth sensor to camera. We show that our network trained on less than half of the KITTI dataset (without any further augmentation) gives com- parable performance to that of the state of art supervised methods for single view depth estimation.

...read moreread less

830 citations

Proceedings Article•DOI•

Learning Temporal Regularity in Video Sequences

[...]

Mahmudul Hasan¹, Jonghyun Choi, Jan Neumann, Amit K. Roy-Chowdhury¹, Larry S. Davis² - Show less +1 more•Institutions (2)

University of California, Riverside¹, University of Maryland, College Park²

01 Jun 2016

TL;DR: In this article, a generative model for regular motion patterns (termed as regularity) using multiple sources with very limited supervision is proposed, and two methods are built upon the autoencoders for their ability to work with little to no supervision.

...read moreread less

Abstract: Perceiving meaningful activities in a long video sequence is a challenging problem due to ambiguous definition of 'meaningfulness' as well as clutters in the scene. We approach this problem by learning a generative model for regular motion patterns (termed as regularity) using multiple sources with very limited supervision. Specifically, we propose two methods that are built upon the autoencoders for their ability to work with little to no supervision. We first leverage the conventional handcrafted spatio-temporal local features and learn a fully connected autoencoder on them. Second, we build a fully convolutional feed-forward autoencoder to learn both the local features and the classifiers as an end-to-end learning framework. Our model can capture the regularities from multiple datasets. We evaluate our methods in both qualitative and quantitative ways - showing the learned regularity of videos in various aspects and demonstrating competitive performance on anomaly detection datasets as an application.

...read moreread less

769 citations

Proceedings Article•

Generating Images with Perceptual Similarity Metrics based on Deep Networks

[...]

Alexey Dosovitskiy¹, Thomas Brox¹•Institutions (1)

University of Freiburg¹

08 Feb 2016

TL;DR: A class of loss functions, which are called deep perceptual similarity metrics (DeePSiM), are proposed that compute distances between image features extracted by deep neural networks and better reflects perceptually similarity of images and thus leads to better results.

...read moreread less

Abstract: We propose a class of loss functions, which we call deep perceptual similarity metrics (DeePSiM), allowing to generate sharp high resolution images from compressed abstract representations. Instead of computing distances in the image space, we compute distances between image features extracted by deep neural networks. This metric reflects perceptual similarity of images much better and, thus, leads to better results. We demonstrate two examples of use cases of the proposed loss: (1) networks that invert the AlexNet convolutional network; (2) a modified version of a variational autoencoder that generates realistic high-resolution random images.

...read moreread less

Journal Article•DOI•

Stacked Sparse Autoencoder (SSAE) for Nuclei Detection on Breast Cancer Histopathology Images

[...]

Jun Xu¹, Lei Xiang¹, Qingshan Liu¹, Hannah Gilmore², Jianzhong Wu, Jinghai Tang, Anant Madabhushi² - Show less +3 more•Institutions (2)

Nanjing University of Information Science and Technology¹, Case Western Reserve University²

01 Jan 2016-IEEE Transactions on Medical Imaging

TL;DR: A Stacked Sparse Autoencoder, an instance of a deep learning strategy, is presented for efficient nuclei detection on high-resolution histopathological images of breast cancer and out-performed nine other state of the art nuclear detection strategies.

...read moreread less

Abstract: Automated nuclear detection is a critical step for a number of computer assisted pathology related image analysis algorithms such as for automated grading of breast cancer tissue specimens. The Nottingham Histologic Score system is highly correlated with the shape and appearance of breast cancer nuclei in histopathological images. However, automated nucleus detection is complicated by 1) the large number of nuclei and the size of high resolution digitized pathology images, and 2) the variability in size, shape, appearance, and texture of the individual nuclei. Recently there has been interest in the application of “Deep Learning” strategies for classification and analysis of big image data. Histopathology, given its size and complexity, represents an excellent use case for application of deep learning strategies. In this paper, a Stacked Sparse Autoencoder (SSAE), an instance of a deep learning strategy, is presented for efficient nuclei detection on high-resolution histopathological images of breast cancer. The SSAE learns high-level features from just pixel intensities alone in order to identify distinguishing features of nuclei. A sliding window operation is applied to each image in order to represent image patches via high-level features obtained via the auto-encoder, which are then subsequently fed to a classifier which categorizes each image patch as nuclear or non-nuclear. Across a cohort of 500 histopathological images (2200 $\times$ 2200) and approximately 3500 manually segmented individual nuclei serving as the groundtruth, SSAE was shown to have an improved F-measure 84.49% and an average area under Precision-Recall curve (AveP) 78.83%. The SSAE approach also out-performed nine other state of the art nuclear detection strategies.

...read moreread less

Posted Content•

Generating Images with Perceptual Similarity Metrics based on Deep Networks

[...]

Alexey Dosovitskiy¹, Thomas Brox¹•Institutions (1)

University of Freiburg¹

08 Feb 2016-arXiv: Learning

TL;DR: Deep perceptual similarity metrics (DeePSiM) as mentioned in this paper is proposed to mitigate the over-smoothed results of image-generating machine learning models by computing distances between image features extracted by deep neural networks.

...read moreread less

Abstract: Image-generating machine learning models are typically trained with loss functions based on distance in the image space. This often leads to over-smoothed results. We propose a class of loss functions, which we call deep perceptual similarity metrics (DeePSiM), that mitigate this problem. Instead of computing distances in the image space, we compute distances between image features extracted by deep neural networks. This metric better reflects perceptually similarity of images and thus leads to better results. We show three applications: autoencoder training, a modification of a variational autoencoder, and inversion of deep convolutional networks. In all cases, the generated images look sharp and resemble natural images.

...read moreread less

Journal Article•DOI•

Auto-encoder based dimensionality reduction

[...]

Yasi Wang¹, Hongxun Yao¹, Sicheng Zhao¹•Institutions (1)

Harbin Institute of Technology¹

05 Apr 2016-Neurocomputing

TL;DR: The results show that auto-encoder can indeed learn something different from other methods, and its possible relation with the intrinsic dimensionality of input data.

...read moreread less

Journal Article•DOI•

A sparse auto-encoder-based deep neural network approach for induction motor faults classification

[...]

Wenjun Sun¹, Siyu Shao¹, Rui Zhao², Ruqiang Yan¹, Ruqiang Yan³, Xingwu Zhang³, Xuefeng Chen³ - Show less +3 more•Institutions (3)

Southeast University¹, Nanyang Technological University², Xi'an Jiaotong University³

01 Jul 2016-Measurement

TL;DR: Compared with traditional neural network, the SAE-based DNN can achieve superior performance for feature learning and classification in the field of induction motor fault diagnosis.

...read moreread less

Proceedings Article•

Importance Weighted Autoencoders

[...]

Yuri Burda¹, Roger Grosse¹, Ruslan Salakhutdinov¹•Institutions (1)

University of Toronto¹

01 Jan 2016

TL;DR: The importance weighted autoencoder (IWAE) as mentioned in this paper uses a strictly tighter log-likelihood lower bound derived from importance weighting to model complex posteriors which do not fit the VAE modeling assumptions.

...read moreread less

Abstract: The variational autoencoder (VAE; Kingma, Welling (2014)) is a recently proposed generative model pairing a top-down generative network with a bottom-up recognition network which approximates posterior inference. It typically makes strong assumptions about posterior inference, for instance that the posterior distribution is approximately factorial, and that its parameters can be approximated with nonlinear regression from the observations. As we show empirically, the VAE objective can lead to overly simplified representations which fail to use the network's entire modeling capacity. We present the importance weighted autoencoder (IWAE), a generative model with the same architecture as the VAE, but which uses a strictly tighter log-likelihood lower bound derived from importance weighting. In the IWAE, the recognition network uses multiple samples to approximate the posterior, giving it increased flexibility to model complex posteriors which do not fit the VAE modeling assumptions. We show empirically that IWAEs learn richer latent space representations than VAEs, leading to improved test log-likelihood on density estimation benchmarks.

...read moreread less

Journal Article•DOI•

A deep learning framework for character motion synthesis and editing

[...]

Daniel Holden¹, Jun Saito, Taku Komura¹•Institutions (1)

University of Edinburgh¹

11 Jul 2016

TL;DR: A framework to synthesize character movements based on high level parameters, such that the produced movements respect the manifold of human motion, trained on a large motion capture dataset, can produce smooth, high quality motion sequences without any manual pre-processing of the training data.

...read moreread less

Abstract: We present a framework to synthesize character movements based on high level parameters, such that the produced movements respect the manifold of human motion, trained on a large motion capture dataset. The learned motion manifold, which is represented by the hidden units of a convolutional autoencoder, represents motion data in sparse components which can be combined to produce a wide range of complex movements. To map from high level parameters to the motion manifold, we stack a deep feedforward neural network on top of the trained autoencoder. This network is trained to produce realistic motion sequences from parameters such as a curve over the terrain that the character should follow, or a target location for punching and kicking. The feedforward control network and the motion manifold are trained independently, allowing the user to easily switch between feedforward networks according to the desired interface, without re-training the motion manifold. Once motion is generated it can be edited by performing optimization in the space of the motion manifold. This allows for imposing kinematic constraints, or transforming the style of the motion, while ensuring the edited motion remains natural. As a result, the system can produce smooth, high quality motion sequences without any manual pre-processing of the training data.

...read moreread less

Posted Content•

Learning a Predictable and Generative Vector Representation for Objects

[...]

Rohit Girdhar¹, David F. Fouhey¹, Mikel Rodriguez², Abhinav Gupta¹•Institutions (2)

Carnegie Mellon University¹, Mitre Corporation²

29 Mar 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: A novel architecture, called the TL-embedding network, is proposed, to learn an embedding space with generative and predictable properties, which enables tackling a number of tasks including voxel prediction from 2D images and 3D model retrieval.

...read moreread less

Abstract: What is a good vector representation of an object? We believe that it should be generative in 3D, in the sense that it can produce new 3D objects; as well as be predictable from 2D, in the sense that it can be perceived from 2D images. We propose a novel architecture, called the TL-embedding network, to learn an embedding space with these properties. The network consists of two components: (a) an autoencoder that ensures the representation is generative; and (b) a convolutional network that ensures the representation is predictable. This enables tackling a number of tasks including voxel prediction from 2D images and 3D model retrieval. Extensive experimental analysis demonstrates the usefulness and versatility of this embedding.

...read moreread less

Proceedings Article•

Multi-task Sequence to Sequence Learning

[...]

Minh-Thang Luong¹, Quoc V. Le¹, Ilya Sutskever¹, Oriol Vinyals¹, Lukasz Kaiser¹ - Show less +1 more•Institutions (1)

Google¹

01 Jan 2016

TL;DR: The results show that training on a small amount of parsing and image caption data can improve the translation quality between English and German by up to 1.5 BLEU points over strong single-task baselines on the WMT benchmarks, and reveal interesting properties of the two unsupervised learning objectives, autoencoder and skip-thought, in the MTL context.

...read moreread less

Abstract: Sequence to sequence learning has recently emerged as a new paradigm in supervised learning. To date, most of its applications focused on only one task and not much work explored this framework for multiple tasks. This paper examines three multi-task learning (MTL) settings for sequence to sequence models: (a) the oneto-many setting - where the encoder is shared between several tasks such as machine translation and syntactic parsing, (b) the many-to-one setting - useful when only the decoder can be shared, as in the case of translation and image caption generation, and (c) the many-to-many setting - where multiple encoders and decoders are shared, which is the case with unsupervised objectives and translation. Our results show that training on a small amount of parsing and image caption data can improve the translation quality between English and German by up to 1.5 BLEU points over strong single-task baselines on the WMT benchmarks. Furthermore, we have established a new state-of-the-art result in constituent parsing with 93.0 F1. Lastly, we reveal interesting properties of the two unsupervised learning objectives, autoencoder and skip-thought, in the MTL context: autoencoder helps less in terms of perplexities but more on BLEU scores compared to skip-thought.

...read moreread less

Posted Content•

Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders

[...]

Nat Dilokthanakul, Pedro A. M. Mediano, Marta Garnelo, Matthew C. H. Lee, Hugh Salimbeni, Kai Arulkumaran, Murray Shanahan - Show less +3 more

08 Nov 2016-arXiv: Learning

TL;DR: It is shown that a heuristic called minimum information constraint that has been shown to mitigate this effect in VAEs can also be applied to improve unsupervised clustering performance with this variant of the variational autoencoder model with a Gaussian mixture as a prior distribution.

...read moreread less

Abstract: We study a variant of the variational autoencoder model (VAE) with a Gaussian mixture as a prior distribution, with the goal of performing unsupervised clustering through deep generative models. We observe that the known problem of over-regularisation that has been shown to arise in regular VAEs also manifests itself in our model and leads to cluster degeneracy. We show that a heuristic called minimum information constraint that has been shown to mitigate this effect in VAEs can also be applied to improve unsupervised clustering performance with our model. Furthermore we analyse the effect of this heuristic and provide an intuition of the various processes with the help of visualizations. Finally, we demonstrate the performance of our model on synthetic data, MNIST and SVHN, showing that the obtained clusters are distinct, interpretable and result in achieving competitive performance on unsupervised clustering to the state-of-the-art results.

...read moreread less

Posted Content•

Variational Autoencoder for Deep Learning of Images, Labels and Captions

[...]

Yunchen Pu¹, Zhe Gan¹, Ricardo Henao¹, Xin Yuan², Chunyuan Li¹, Andrew Stevens¹, Lawrence Carin¹ - Show less +3 more•Institutions (2)

Duke University¹, Bell Labs²

28 Sep 2016-arXiv: Machine Learning

TL;DR: A novel variational autoencoder is developed to model images, as well as associated labels or captions, and a new semi-supervised setting is manifested for CNN learning with images; the framework even allows unsupervised CNN learning, based on images alone.

...read moreread less

Abstract: A novel variational autoencoder is developed to model images, as well as associated labels or captions. The Deep Generative Deconvolutional Network (DGDN) is used as a decoder of the latent image features, and a deep Convolutional Neural Network (CNN) is used as an image encoder; the CNN is used to approximate a distribution for the latent DGDN features/code. The latent code is also linked to generative models for labels (Bayesian support vector machine) or captions (recurrent neural network). When predicting a label/caption for a new image at test, averaging is performed across the distribution of latent codes; this is computationally efficient as a consequence of the learned CNN-based encoder. Since the framework is capable of modeling the image in the presence/absence of associated labels/captions, a new semi-supervised setting is manifested for CNN learning with images; the framework even allows unsupervised CNN learning, based on images alone.

...read moreread less

Posted Content•

End-to-end Optimized Image Compression

[...]

Johannes Ballé¹, Valero Laparra², Eero P. Simoncelli•Institutions (2)

New York University¹, University of Valencia²

05 Nov 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, a nonlinear analysis transformation, a uniform quantizer, and a non-linear synthesis transformation are used to optimize the entire model for rate-distortion performance over a database of training images.

...read moreread less

Abstract: We describe an image compression method, consisting of a nonlinear analysis transformation, a uniform quantizer, and a nonlinear synthesis transformation. The transforms are constructed in three successive stages of convolutional linear filters and nonlinear activation functions. Unlike most convolutional neural networks, the joint nonlinearity is chosen to implement a form of local gain control, inspired by those used to model biological neurons. Using a variant of stochastic gradient descent, we jointly optimize the entire model for rate-distortion performance over a database of training images, introducing a continuous proxy for the discontinuous loss function arising from the quantizer. Under certain conditions, the relaxed loss function may be interpreted as the log likelihood of a generative model, as implemented by a variational autoencoder. Unlike these models, however, the compression model must operate at any given point along the rate-distortion curve, as specified by a trade-off parameter. Across an independent set of test images, we find that the optimized method generally exhibits better rate-distortion performance than the standard JPEG and JPEG 2000 compression methods. More importantly, we observe a dramatic improvement in visual quality for all images at all bit rates, which is supported by objective quality estimates using MS-SSIM.

...read moreread less

Book Chapter•DOI•

Learning a Predictable and Generative Vector Representation for Objects

[...]

Rohit Girdhar¹, David F. Fouhey¹, Mikel Rodriguez², Abhinav Gupta¹•Institutions (2)

Carnegie Mellon University¹, Mitre Corporation²

08 Oct 2016

TL;DR: The TL-embedding network as discussed by the authors uses an autoencoder to ensure the representation is generative and a convolutional network to ensure it is predictable, which can be used for voxel prediction from 2D images and 3D model retrieval.

...read moreread less

Book Chapter•DOI•

An Uncertain Future: Forecasting from Static Images Using Variational Autoencoders

[...]

Jacob J. Walker¹, Carl Doersch¹, Abhinav Gupta¹, Martial Hebert¹•Institutions (1)

Carnegie Mellon University¹

08 Oct 2016

TL;DR: In this article, a conditional variational autoencoder is proposed to predict the dense trajectory of pixels in a scene, where it will travel, and how it will deform over the course of one second.

...read moreread less

Abstract: In a given scene, humans can easily predict a set of immediate future events that might happen. However, pixel-level anticipation in computer vision is difficult because machine learning struggles with the ambiguity in predicting the future. In this paper, we focus on predicting the dense trajectory of pixels in a scene—what will move in the scene, where it will travel, and how it will deform over the course of one second. We propose a conditional variational autoencoder as a solution to this problem. In this framework, direct inference from the image shapes the distribution of possible trajectories while latent variables encode information that is not available in the image. We show that our method predicts events in a variety of scenes and can produce multiple different predictions for an ambiguous future. We also find that our method learns a representation that is applicable to semantic vision tasks.

...read moreread less

Proceedings Article•DOI•

Deep Learning for solar power forecasting — An approach using AutoEncoder and LSTM Neural Networks

[...]

Andre Gensler¹, Janosch Henze¹, Bernhard Sick¹, Nils Raabe•Institutions (1)

University of Kassel¹

01 Oct 2016

TL;DR: This work uses different Deep Learning and Artificial Neural Network algorithms, such as Deep Belief Networks, AutoEncoder, and LSTM, to show their forecast strength compared to a standard MLP and a physical forecasting model in the forecasting the energy output of 21 solar power plants.

...read moreread less

Abstract: Power forecasting of renewable energy power plants is a very active research field, as reliable information about the future power generation allow for a safe operation of the power grid and helps to minimize the operational costs of these energy sources. Deep Learning algorithms have shown to be very powerful in forecasting tasks, such as economic time series or speech recognition. Up to now, Deep Learning algorithms have only been applied sparsely for forecasting renewable energy power plants. By using different Deep Learning and Artificial Neural Network algorithms, such as Deep Belief Networks, AutoEncoder, and LSTM, we introduce these powerful algorithms in the field of renewable energy power forecasting. In our experiments, we used combinations of these algorithms to show their forecast strength compared to a standard MLP and a physical forecasting model in the forecasting the energy output of 21 solar power plants. Our results using Deep Learning algorithms show a superior forecasting performance compared to Artificial Neural Networks as well as other reference models such as physical models.

...read moreread less

Proceedings Article•DOI•

Deep spatial autoencoders for visuomotor learning

[...]

Chelsea Finn¹, Xin Yu Tan¹, Yan Duan¹, Trevor Darrell¹, Sergey Levine¹, Pieter Abbeel¹ - Show less +2 more•Institutions (1)

University of California, Berkeley¹

16 May 2016

TL;DR: This work presents an approach that automates state-space construction by learning a state representation directly from camera images by using a deep spatial autoencoder to acquire a set of feature points that describe the environment for the current task, such as the positions of objects.

...read moreread less

Abstract: Reinforcement learning provides a powerful and flexible framework for automated acquisition of robotic motion skills. However, applying reinforcement learning requires a sufficiently detailed representation of the state, including the configuration of task-relevant objects. We present an approach that automates state-space construction by learning a state representation directly from camera images. Our method uses a deep spatial autoencoder to acquire a set of feature points that describe the environment for the current task, such as the positions of objects, and then learns a motion skill with these feature points using an efficient reinforcement learning method based on local linear models. The resulting controller reacts continuously to the learned feature points, allowing the robot to dynamically manipulate objects in the world with closed-loop control. We demonstrate our method with a PR2 robot on tasks that include pushing a free-standing toy block, picking up a bag of rice using a spatula, and hanging a loop of rope on a hook at various positions. In each task, our method automatically learns to track task-relevant objects and manipulate their configuration with the robot's arm.

...read moreread less

Posted Content•

Generative and Discriminative Voxel Modeling with Convolutional Neural Networks

[...]

Andrew Brock, Theodore Lim, J. M. Ritchie, Nicholas John Weston

15 Aug 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: Methods for training voxel-based variational autoencoders, a user interface for exploring the latent space learned by the autoencoder, and a deep convolutional neural network architecture for object classification are presented.

...read moreread less

Abstract: When working with three-dimensional data, choice of representation is key. We explore voxel-based models, and present evidence for the viability of voxellated representations in applications including shape modeling and object classification. Our key contributions are methods for training voxel-based variational autoencoders, a user interface for exploring the latent space learned by the autoencoder, and a deep convolutional neural network architecture for object classification. We address challenges unique to voxel-based representations, and empirically evaluate our models on the ModelNet benchmark, where we demonstrate a 51.5% relative improvement in the state of the art for object classification.

...read moreread less

Proceedings Article•

End-to-end optimized image compression

[...]

Johannes Ballé¹, Valero Laparra², Eero P. Simoncelli•Institutions (2)

New York University¹, University of Valencia²

05 Nov 2016

TL;DR: In this paper, a nonlinear analysis transformation, a uniform quantizer, and a non-linear synthesis transformation are used to optimize the entire model for rate-distortion performance over a database of training images.

...read moreread less

Proceedings Article•

Ladder Variational Autoencoders

[...]

Casper Kaae Sønderby¹, Tapani Raiko², Lars Maaløe, Søren Kaae Sønderby¹, Ole Winther - Show less +1 more•Institutions (2)

University of Copenhagen¹, Aalto University²

06 Feb 2016

TL;DR: This article proposed a new inference model, the Ladder Variational Autoencoder, that recursively corrects the generative distribution by a data dependent approximate likelihood in a process similar to the recently proposed Ladder Network.

...read moreread less

Abstract: Variational autoencoders are powerful models for unsupervised learning. However deep models with several layers of dependent stochastic variables are difficult to train which limits the improvements obtained using these highly expressive models. We propose a new inference model, the Ladder Variational Autoencoder, that recursively corrects the generative distribution by a data dependent approximate likelihood in a process resembling the recently proposed Ladder Network. We show that this model provides state of the art predictive log-likelihood and tighter log-likelihood lower bound compared to the purely bottom-up inference in layered Variational Autoencoders and other generative models. We provide a detailed analysis of the learned hierarchical latent representation and show that our new inference model is qualitatively different and utilizes a deeper more distributed hierarchy of latent variables. Finally, we observe that batch-normalization and deterministic warm-up (gradually turning on the KL-term) are crucial for training variational models with many stochastic layers.

...read moreread less

Posted Content•

Variational Lossy Autoencoder

[...]

Xi Chen¹, Diederik P. Kingma², Tim Salimans², Yan Duan¹, Prafulla Dhariwal², John Schulman¹, Ilya Sutskever³, Pieter Abbeel¹ - Show less +4 more•Institutions (3)

University of California, Berkeley¹, OpenAI², Google³

08 Nov 2016-arXiv: Learning

TL;DR: Li et al. as mentioned in this paper combine VAE with neural autoregressive models such as RNN, MADE and PixelRNN/CNN to learn a global representation for 2D images that describes only global structure and discards information about detailed texture.

...read moreread less

Abstract: Representation learning seeks to expose certain aspects of observed data in a learned representation that's amenable to downstream tasks like classification. For instance, a good representation for 2D images might be one that describes only global structure and discards information about detailed texture. In this paper, we present a simple but principled method to learn such global representations by combining Variational Autoencoder (VAE) with neural autoregressive models such as RNN, MADE and PixelRNN/CNN. Our proposed VAE model allows us to have control over what the global latent code can learn and , by designing the architecture accordingly, we can force the global latent code to discard irrelevant information such as texture in 2D images, and hence the VAE only "autoencodes" data in a lossy fashion. In addition, by leveraging autoregressive models as both prior distribution $p(z)$ and decoding distribution $p(x|z)$, we can greatly improve generative modeling performance of VAEs, achieving new state-of-the-art results on MNIST, OMNIGLOT and Caltech-101 Silhouettes density estimation tasks.

...read moreread less

Collapse