Showing papers on "Autoencoder published in 2015"

PDF

Open Access

Journal Article•DOI•

Traffic Flow Prediction With Big Data: A Deep Learning Approach

[...]

Yisheng Lv, Yanjie Duan, Wenwen Kang, Zhengxi Li¹, Fei-Yue Wang - Show less +1 more•Institutions (1)

01 Apr 2015-IEEE Transactions on Intelligent Transportation Systems

TL;DR: A novel deep-learning-based traffic flow prediction method is proposed, which considers the spatial and temporal correlations inherently and is applied for the first time that a deep architecture model is applied using autoencoders as building blocks to represent traffic flow features for prediction.

...read moreread less

Abstract: Accurate and timely traffic flow information is important for the successful deployment of intelligent transportation systems. Over the last few years, traffic data have been exploding, and we have truly entered the era of big data for transportation. Existing traffic flow prediction methods mainly use shallow traffic prediction models and are still unsatisfying for many real-world applications. This situation inspires us to rethink the traffic flow prediction problem based on deep architecture models with big traffic data. In this paper, a novel deep-learning-based traffic flow prediction method is proposed, which considers the spatial and temporal correlations inherently. A stacked autoencoder model is used to learn generic traffic flow features, and it is trained in a greedy layerwise fashion. To the best of our knowledge, this is the first time that a deep architecture model is applied using autoencoders as building blocks to represent traffic flow features for prediction. Moreover, experiments demonstrate that the proposed method for traffic flow prediction has superior performance.

...read moreread less

2,306 citations

Proceedings Article•DOI•

AutoRec: Autoencoders Meet Collaborative Filtering

[...]

Suvash Sedhain¹, Aditya Krishna Menon¹, Scott Sanner¹, Lexing Xie¹•Institutions (1)

Australian National University¹

18 May 2015

TL;DR: Empirically, AutoRec's compact and efficiently trainable model outperforms state-of-the-art CF techniques (biased matrix factorization, RBM-CF and LLORMA) on the Movielens and Netflix datasets.

...read moreread less

Abstract: This paper proposes AutoRec, a novel autoencoder framework for collaborative filtering (CF). Empirically, AutoRec's compact and efficiently trainable model outperforms state-of-the-art CF techniques (biased matrix factorization, RBM-CF and LLORMA) on the Movielens and Netflix datasets.

...read moreread less

1,015 citations

Posted Content•

A Recurrent Latent Variable Model for Sequential Data

[...]

Junyoung Chung¹, Kyle Kastner¹, Laurent Dinh¹, Kratarth Goel¹, Aaron Courville¹, Yoshua Bengio¹ - Show less +2 more•Institutions (1)

Université de Montréal¹

07 Jun 2015-arXiv: Learning

TL;DR: In this article, the authors explore the use of latent random variables into the dynamic hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder.

...read moreread less

Abstract: In this paper, we explore the inclusion of latent random variables into the dynamic hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder. We argue that through the use of high-level latent random variables, the variational RNN (VRNN)1 can model the kind of variability observed in highly structured sequential data such as natural speech. We empirically evaluate the proposed model against related sequential models on four speech datasets and one handwriting dataset. Our results show the important roles that latent random variables can play in the RNN dynamic hidden state.

...read moreread less

812 citations

Posted Content•

Importance Weighted Autoencoders

[...]

Yuri Burda¹, Roger Grosse¹, Ruslan Salakhutdinov¹•Institutions (1)

University of Toronto¹

01 Sep 2015-arXiv: Learning

TL;DR: The importance weighted autoencoder (IWAE), a generative model with the same architecture as the VAE, but which uses a strictly tighter log-likelihood lower bound derived from importance weighting, shows empirically that IWAEs learn richer latent space representations than VAEs, leading to improved test log- likelihood on density estimation benchmarks.

...read moreread less

Abstract: The variational autoencoder (VAE; Kingma, Welling (2014)) is a recently proposed generative model pairing a top-down generative network with a bottom-up recognition network which approximates posterior inference. It typically makes strong assumptions about posterior inference, for instance that the posterior distribution is approximately factorial, and that its parameters can be approximated with nonlinear regression from the observations. As we show empirically, the VAE objective can lead to overly simplified representations which fail to use the network's entire modeling capacity. We present the importance weighted autoencoder (IWAE), a generative model with the same architecture as the VAE, but which uses a strictly tighter log-likelihood lower bound derived from importance weighting. In the IWAE, the recognition network uses multiple samples to approximate the posterior, giving it increased flexibility to model complex posteriors which do not fit the VAE modeling assumptions. We show empirically that IWAEs learn richer latent space representations than VAEs, leading to improved test log-likelihood on density estimation benchmarks.

...read moreread less

793 citations

Posted Content•

Semi-supervised Sequence Learning

[...]

Andrew M. Dai¹, Quoc V. Le¹•Institutions (1)

Google¹

04 Nov 2015-arXiv: Learning

TL;DR: This paper used unlabeled data to improve sequence learning with recurrent networks, which can be used as a "pre-training" step for a later supervised sequence learning algorithm, so that the parameters obtained from the unsupervised step can be a starting point for other supervised training models.

...read moreread less

Abstract: We present two approaches that use unlabeled data to improve sequence learning with recurrent networks. The first approach is to predict what comes next in a sequence, which is a conventional language model in natural language processing. The second approach is to use a sequence autoencoder, which reads the input sequence into a vector and predicts the input sequence again. These two algorithms can be used as a "pretraining" step for a later supervised sequence learning algorithm. In other words, the parameters obtained from the unsupervised step can be used as a starting point for other supervised training models. In our experiments, we find that long short term memory recurrent networks after being pretrained with the two approaches are more stable and generalize better. With pretraining, we are able to train long short term memory recurrent networks up to a few hundred timesteps, thereby achieving strong performance in many text classification tasks, such as IMDB, DBpedia and 20 Newsgroups.

...read moreread less

711 citations

Proceedings Article•

Semi-supervised Sequence Learning

[...]

Andrew M. Dai¹, Quoc V. Le¹•Institutions (1)

Google¹

07 Dec 2015

TL;DR: Two approaches to use unlabeled data to improve Sequence Learning with recurrent networks are presented and it is found that long short term memory recurrent networks after pretrained with the two approaches become more stable to train and generalize better.

...read moreread less

Abstract: We present two approaches to use unlabeled data to improve Sequence Learning with recurrent networks. The first approach is to predict what comes next in a sequence, which is a language model in NLP. The second approach is to use a sequence autoencoder, which reads the input sequence into a vector and predicts the input sequence again. These two algorithms can be used as a "pretraining" algorithm for a later supervised sequence learning algorithm. In other words, the parameters obtained from the pretraining step can then be used as a starting point for other supervised training models. In our experiments, we find that long short term memory recurrent networks after pretrained with the two approaches become more stable to train and generalize better. With pretraining, we were able to achieve strong performance in many classification tasks, such as text classification with IMDB, DBpedia or image recognition in CIFAR-10.

...read moreread less

688 citations

Posted Content•

LLNet: A Deep Autoencoder Approach to Natural Low-light Image Enhancement

[...]

Kin Gwn Lore¹, Adedotun Akintayo¹, Soumik Sarkar¹•Institutions (1)

Iowa State University¹

12 Nov 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: It is shown that a variant of the stacked-sparse denoising autoencoder can learn from synthetically darkened and noise-added training examples to adaptively enhance images taken from natural low-light environment and/or are hardware-degraded.

...read moreread less

Abstract: In surveillance, monitoring and tactical reconnaissance, gathering the right visual information from a dynamic environment and accurately processing such data are essential ingredients to making informed decisions which determines the success of an operation. Camera sensors are often cost-limited in ability to clearly capture objects without defects from images or videos taken in a poorly-lit environment. The goal in many applications is to enhance the brightness, contrast and reduce noise content of such images in an on-board real-time manner. We propose a deep autoencoder-based approach to identify signal features from low-light images handcrafting and adaptively brighten images without over-amplifying the lighter parts in images (i.e., without saturation of image pixels) in high dynamic range. We show that a variant of the recently proposed stacked-sparse denoising autoencoder can learn to adaptively enhance and denoise from synthetically darkened and noisy training examples. The network can then be successfully applied to naturally low-light environment and/or hardware degraded images. Results show significant credibility of deep learning based approaches both visually and by quantitative comparison with various popular enhancing, state-of-the-art denoising and hybrid enhancing-denoising techniques.

...read moreread less

662 citations

Variational autoencoder based anomaly detection using reconstruction probability

[...]

An J

01 Jan 2015

572 citations

Proceedings Article•

A recurrent latent variable model for sequential data

[...]

Junyoung Chung¹, Kyle Kastner¹, Laurent Dinh¹, Kratarth Goel¹, Aaron Courville¹, Yoshua Bengio¹ - Show less +2 more•Institutions (1)

Université de Montréal¹

07 Dec 2015

TL;DR: It is argued that through the use of high-level latent random variables, the variational RNN (VRNN)1 can model the kind of variability observed in highly structured sequential data such as natural speech.

...read moreread less

Abstract: In this paper, we explore the inclusion of latent random variables into the hidden state of a recurrent neural network (RNN) by combining the elements of the variational autoencoder. We argue that through the use of high-level latent random variables, the variational RNN (VRNN)1 can model the kind of variability observed in highly structured sequential data such as natural speech. We empirically evaluate the proposed model against other related sequential models on four speech datasets and one handwriting dataset. Our results show the important roles that latent random variables can play in the RNN dynamics.

...read moreread less

539 citations

Posted Content•

A Hierarchical Neural Autoencoder for Paragraphs and Documents

[...]

Jiwei Li, Minh-Thang Luong, Dan Jurafsky

02 Jun 2015-arXiv: Computation and Language

TL;DR: This paper proposed a hierarchical LSTM auto-encoder to preserve and reconstruct multi-sentence paragraphs, and evaluated the reconstructed paragraph using standard metrics like ROUGE and Entity Grid, showing that neural models can encode texts in a way that preserve syntactic, semantic, and discourse coherence.

...read moreread less

Abstract: Natural language generation of coherent long texts like paragraphs or longer documents is a challenging problem for recurrent networks models. In this paper, we explore an important step toward this generation task: training an LSTM (Long-short term memory) auto-encoder to preserve and reconstruct multi-sentence paragraphs. We introduce an LSTM model that hierarchically builds an embedding for a paragraph from embeddings for sentences and words, then decodes this embedding to reconstruct the original paragraph. We evaluate the reconstructed paragraph using standard metrics like ROUGE and Entity Grid, showing that neural models are able to encode texts in a way that preserve syntactic, semantic, and discourse coherence. While only a first step toward generating coherent text units from neural models, our work has the potential to significantly impact natural language generation and summarization\footnote{Code for the three models described in this paper can be found at this http URL .

...read moreread less

515 citations

Journal Article•DOI•

Multimodal Deep Autoencoder for Human Pose Recovery

[...]

Chaoqun Hong¹, Jun Yu², Jian Wan², Dacheng Tao³, Meng Wang⁴ - Show less +1 more•Institutions (4)

Xiamen University of Technology¹, Hangzhou Dianzi University², University of Technology, Sydney³, Hefei University of Technology⁴

07 Oct 2015-IEEE Transactions on Image Processing

TL;DR: A novel pose recovery method using non-linear mapping with multi-layered deep neural network and back-propagation deep learning to obtain a unified feature description by standard eigen-decomposition of the hypergraph Laplacian matrix.

...read moreread less

Abstract: Video-based human pose recovery is usually conducted by retrieving relevant poses using image features. In the retrieving process, the mapping between 2D images and 3D poses is assumed to be linear in most of the traditional methods. However, their relationships are inherently non-linear, which limits recovery performance of these methods. In this paper, we propose a novel pose recovery method using non-linear mapping with multi-layered deep neural network. It is based on feature extraction with multimodal fusion and back-propagation deep learning. In multimodal fusion, we construct hypergraph Laplacian with low-rank representation. In this way, we obtain a unified feature description by standard eigen-decomposition of the hypergraph Laplacian matrix. In back-propagation deep learning, we learn a non-linear mapping from 2D images to 3D poses with parameter fine-tuning. The experimental results on three data sets show that the recovery error has been reduced by 20%–25%, which demonstrates the effectiveness of the proposed method.

...read moreread less

Posted Content•

MADE: Masked Autoencoder for Distribution Estimation

[...]

Mathieu Germain¹, Karol Gregor², Iain Murray³, Hugo Larochelle¹•Institutions (3)

Université de Sherbrooke¹, Google², University of Edinburgh³

12 Feb 2015-arXiv: Learning

TL;DR: In this article, the autoencoder outputs are interpreted as a set of conditional probabilities, and their product, the full joint probability, and a single network can decompose the joint probability in multiple different orderings.

...read moreread less

Abstract: There has been a lot of recent interest in designing neural network models to estimate a distribution from a set of examples. We introduce a simple modification for autoencoder neural networks that yields powerful generative models. Our method masks the autoencoder's parameters to respect autoregressive constraints: each input is reconstructed only from previous inputs in a given ordering. Constrained this way, the autoencoder outputs can be interpreted as a set of conditional probabilities, and their product, the full joint probability. We can also train a single network that can decompose the joint probability in multiple different orderings. Our simple framework can be applied to multiple architectures, including deep ones. Vectorized implementations, such as on GPUs, are simple and fast. Experiments demonstrate that this approach is competitive with state-of-the-art tractable distribution estimators. At test time, the method is significantly faster and scales better than other autoregressive estimators.

...read moreread less

Proceedings Article•DOI•

Domain Generalization for Object Recognition with Multi-task Autoencoders

[...]

Muhammad Ghifary¹, W. Bastiaan Kleijn¹, Mengjie Zhang¹, David Balduzzi¹•Institutions (1)

Victoria University of Wellington¹

07 Dec 2015

TL;DR: In this article, a multi-task autoencoder (MTAE) is proposed to transform the original image into analogs in multiple related domains, which are then used as inputs to a classifier.

...read moreread less

Abstract: The problem of domain generalization is to take knowledge acquired from a number of related domains, where training data is available, and to then successfully apply it to previously unseen domains. We propose a new feature learning algorithm, Multi-Task Autoencoder (MTAE), that provides good generalization performance for cross-domain object recognition. The algorithm extends the standard denoising autoencoder framework by substituting artificially induced corruption with naturally occurring inter-domain variability in the appearance of objects. Instead of reconstructing images from noisy versions, MTAE learns to transform the original image into analogs in multiple related domains. It thereby learns features that are robust to variations across domains. The learnt features are then used as inputs to a classifier. We evaluated the performance of the algorithm on benchmark image recognition datasets, where the task is to learn features from multiple datasets and to then predict the image label from unseen datasets. We found that (denoising) MTAE outperforms alternative autoencoder-based models as well as the current state-of-the-art algorithms for domain generalization.

...read moreread less

Posted Content•

Variable Rate Image Compression with Recurrent Neural Networks

[...]

George Toderici¹, Sean M. O'Malley¹, Sung Jin Hwang¹, Damien Vincent¹, David Minnen¹, Shumeet Baluja¹, Michele Covell¹, Rahul Sukthankar¹ - Show less +4 more•Institutions (1)

Google¹

19 Nov 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: A general framework for variable-rate image compression and a novel architecture based on convolutional and deconvolutional LSTM recurrent networks are proposed, which provide better visual quality than (headerless) JPEG, JPEG2000 and WebP, with a storage size reduced by 10% or more.

...read moreread less

Abstract: A large fraction of Internet traffic is now driven by requests from mobile devices with relatively small screens and often stringent bandwidth requirements. Due to these factors, it has become the norm for modern graphics-heavy websites to transmit low-resolution, low-bytecount image previews (thumbnails) as part of the initial page load process to improve apparent page responsiveness. Increasing thumbnail compression beyond the capabilities of existing codecs is therefore a current research focus, as any byte savings will significantly enhance the experience of mobile device users. Toward this end, we propose a general framework for variable-rate image compression and a novel architecture based on convolutional and deconvolutional LSTM recurrent networks. Our models address the main issues that have prevented autoencoder neural networks from competing with existing image compression algorithms: (1) our networks only need to be trained once (not per-image), regardless of input image dimensions and the desired compression rate; (2) our networks are progressive, meaning that the more bits are sent, the more accurate the image reconstruction; and (3) the proposed architecture is at least as efficient as a standard purpose-trained autoencoder for a given number of bits. On a large-scale benchmark of 32$\times$32 thumbnails, our LSTM-based approaches provide better visual quality than (headerless) JPEG, JPEG2000 and WebP, with a storage size that is reduced by 10% or more.

...read moreread less

Proceedings Article•DOI•

A Hierarchical Neural Autoencoder for Paragraphs and Documents

[...]

Jiwei Li¹, Thang Luong¹, Dan Jurafsky¹•Institutions (1)

Stanford University¹

02 Jun 2015

TL;DR: This paper introduces an LSTM model that hierarchically builds an embedding for a paragraph from embeddings for sentences and words, then decodes this embedding to reconstruct the original paragraph and evaluates the reconstructed paragraph using standard metrics to show that neural models are able to encode texts in a way that preserve syntactic, semantic, and discourse coherence.

...read moreread less

Abstract: Natural language generation of coherent long texts like paragraphs or longer documents is a challenging problem for recurrent networks models. In this paper, we explore an important step toward this generation task: training an LSTM (Longshort term memory) auto-encoder to preserve and reconstruct multi-sentence paragraphs. We introduce an LSTM model that hierarchically builds an embedding for a paragraph from embeddings for sentences and words, then decodes this embedding to reconstruct the original paragraph. We evaluate the reconstructed paragraph using standard metrics like ROUGE and Entity Grid, showing that neural models are able to encode texts in a way that preserve syntactic, semantic, and discourse coherence. While only a first step toward generating coherent text units from neural models, our work has the potential to significantly impact natural language generation and summarization1.

...read moreread less

Posted Content•

Autoencoding beyond pixels using a learned similarity metric

[...]

Anders Boesen Lindbo Larsen¹, Søren Kaae Sønderby², Hugo Larochelle³, Ole Winther²•Institutions (3)

Technical University of Denmark¹, University of Copenhagen², Twitter³

31 Dec 2015-arXiv: Learning

TL;DR: An autoencoder that leverages learned representations to better measure similarities in data space is presented and it is shown that the method learns an embedding in which high-level abstract visual features (e.g. wearing glasses) can be modified using simple arithmetic.

...read moreread less

Abstract: We present an autoencoder that leverages learned representations to better measure similarities in data space. By combining a variational autoencoder with a generative adversarial network we can use learned feature representations in the GAN discriminator as basis for the VAE reconstruction objective. Thereby, we replace element-wise errors with feature-wise errors to better capture the data distribution while offering invariance towards e.g. translation. We apply our method to images of faces and show that it outperforms VAEs with element-wise similarity measures in terms of visual fidelity. Moreover, we show that the method learns an embedding in which high-level abstract visual features (e.g. wearing glasses) can be modified using simple arithmetic.

...read moreread less

Journal Article•DOI•

Unsupervised Spectral–Spatial Feature Learning With Stacked Sparse Autoencoder for Hyperspectral Imagery Classification

[...]

Chao Tao¹, Hongbo Pan¹, Yansheng Li², Zhengrou Zou¹•Institutions (2)

Central South University¹, Wuhan University²

12 Oct 2015-IEEE Geoscience and Remote Sensing Letters

TL;DR: This letter proposes to adaptively learn a suitable feature representation from unlabeled data by learning a feature mapping function based on stacked sparse autoencoder that embeds the learned spectral-spatial feature into a linear support vector machine for classification.

...read moreread less

Abstract: In this letter, different from traditional methods using original spectral features or handcraft spectral–spatial features, we propose to adaptively learn a suitable feature representation from unlabeled data. This is achieved by learning a feature mapping function based on stacked sparse autoencoder. Considering that hyperspectral imagery (HSI) is intrinsically defined in both the spectral and spatial domains, we further establish two variants of feature learning procedures for sparse spectral feature learning and multiscale spatial feature learning. Finally, we embed the learned spectral–spatial feature into a linear support vector machine for classification. Experiments on two hyperspectral images indicate the following: 1) the learned spectral–spatial feature representation is more discriminative for HSI classification compared to previously hand-engineered spectral–spatial features, especially when the training data are limited and 2) the learned features appear not to be specific to a particular image but general in that they are applicable to multiple related images (e.g., images acquired by the same sensor but varying with location or time).

...read moreread less

Posted Content•

Adversarial Autoencoders

[...]

Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, Brendan J. Frey - Show less +1 more

18 Nov 2015-arXiv: Learning

TL;DR: The adversarial autoencoder (AAE) as discussed by the authors uses the generative adversarial networks (GAN) to perform variational inference by matching the aggregated posterior of the hidden code vector of the autoencoders with an arbitrary prior distribution, which ensures that generating from any part of prior space results in meaningful samples.

...read moreread less

Abstract: In this paper, we propose the "adversarial autoencoder" (AAE), which is a probabilistic autoencoder that uses the recently proposed generative adversarial networks (GAN) to perform variational inference by matching the aggregated posterior of the hidden code vector of the autoencoder with an arbitrary prior distribution. Matching the aggregated posterior to the prior ensures that generating from any part of prior space results in meaningful samples. As a result, the decoder of the adversarial autoencoder learns a deep generative model that maps the imposed prior to the data distribution. We show how the adversarial autoencoder can be used in applications such as semi-supervised classification, disentangling style and content of images, unsupervised clustering, dimensionality reduction and data visualization. We performed experiments on MNIST, Street View House Numbers and Toronto Face datasets and show that adversarial autoencoders achieve competitive results in generative modeling and semi-supervised classification tasks.

...read moreread less

Proceedings Article•DOI•

Predicting Short-Term Traffic Flow by Long Short-Term Memory Recurrent Neural Network

[...]

Yongxue Tian¹, Li Pan¹•Institutions (1)

Shanghai Jiao Tong University¹

01 Dec 2015

TL;DR: A model called Long Short-Term Memory Recurrent Neural Network (LSTM RNN) is proposed in this paper, which takes advantages of the three multiplicative units in the memory block to determine the optimal time lags dynamically.

...read moreread less

Abstract: Intelligent Transportation System (ITS) is a significant part of smart city, and short-term traffic flow prediction plays an important role in intelligent transportation management and route guidance. A number of models and algorithms based on time series prediction and machine learning were applied to short-term traffic flow prediction and achieved good results. However, most of the models require the length of the input historical data to be predefined and static, which cannot automatically determine the optimal time lags. To overcome this shortage, a model called Long Short-Term Memory Recurrent Neural Network (LSTM RNN) is proposed in this paper, which takes advantages of the three multiplicative units in the memory block to determine the optimal time lags dynamically. The dataset from Caltrans Performance Measurement System (PeMS) is used for building the model and comparing LSTM RNN with several well-known models, such as random walk(RW), support vector machine(SVM), single layer feed forward neural network(FFNN) and stacked autoencoder(SAE). The results show that the proposed prediction model achieves higher accuracy and generalizes well.

...read moreread less

Posted Content•

Spatio-temporal video autoencoder with differentiable memory

[...]

Viorica Patraucean, Ankur Handa, Roberto Cipolla

19 Nov 2015-arXiv: Learning

TL;DR: One direct application of the proposed framework in weakly-supervised semantic segmentation of videos through label propagation using optical flow is presented, using as temporal decoder a robust optical flow prediction module together with an image sampler serving as built-in feedback loop.

...read moreread less

Abstract: We describe a new spatio-temporal video autoencoder, based on a classic spatial image autoencoder and a novel nested temporal autoencoder. The temporal encoder is represented by a differentiable visual memory composed of convolutional long short-term memory (LSTM) cells that integrate changes over time. Here we target motion changes and use as temporal decoder a robust optical flow prediction module together with an image sampler serving as built-in feedback loop. The architecture is end-to-end differentiable. At each time step, the system receives as input a video frame, predicts the optical flow based on the current observation and the LSTM memory state as a dense transformation map, and applies it to the current frame to generate the next frame. By minimising the reconstruction error between the predicted next frame and the corresponding ground truth next frame, we train the whole system to extract features useful for motion estimation without any supervision effort. We present one direct application of the proposed framework in weakly-supervised semantic segmentation of videos through label propagation using optical flow.

...read moreread less

Proceedings Article•DOI•

Learning Discriminative Reconstructions for Unsupervised Outlier Removal

[...]

Yan Xia¹, Xudong Cao², Fang Wen², Gang Hua², Jian Sun² - Show less +1 more•Institutions (2)

University of Science and Technology of China¹, Microsoft²

07 Dec 2015

TL;DR: This work gradually inject discriminative information in the learning process of an autoencoder to make the inliers and the outliers more separable when data are reconstructed from low-dimensional representations.

...read moreread less

Abstract: We study the problem of automatically removing outliers from noisy data, with application for removing outlier images from an image collection. We address this problem by utilizing the reconstruction errors of an autoencoder. We observe that when data are reconstructed from low-dimensional representations, the inliers and the outliers can be well separated according to their reconstruction errors. Based on this basic observation, we gradually inject discriminative information in the learning process of an autoencoder to make the inliers and the outliers more separable. Experiments on a variety of image datasets validate our approach.

...read moreread less

Journal Article•DOI•

Effective and Efficient Midlevel Visual Elements-Oriented Land-Use Classification Using VHR Remote Sensing Images

[...]

Gong Cheng¹, Junwei Han¹, Lei Guo¹, Zhenbao Liu¹, Shuhui Bu¹, Jinchang Ren - Show less +2 more•Institutions (1)

Northwestern Polytechnical University¹

20 Feb 2015-IEEE Transactions on Geoscience and Remote Sensing

TL;DR: Comprehensive evaluations on a publicly available 21-class VHR landuse data set and comparisons with state-of-the-art approaches demonstrate the effectiveness and superiority of the partlets-based land-use classification method.

...read moreread less

Abstract: Land-use classification using remote sensing images covers a wide range of applications. With more detailed spatial and textural information provided in very high resolution (VHR) remote sensing images, a greater range of objects and spatial patterns can be observed than ever before. This offers us a new opportunity for advancing the performance of land-use classification. In this paper, we first introduce an effective midlevel visual elements-oriented land-use classification method based on “partlets,” which are a library of pretrained part detectors used for midlevel visual elements discovery. Taking advantage of midlevel visual elements rather than low-level image features, a partlets-based method represents images by computing their responses to a large number of part detectors. As the number of part detectors grows, a main obstacle to the broader application of this method is its computational cost. To address this problem, we next propose a novel framework to train coarse-to-fine shared intermediate representations, which are termed “sparselets,” from a large number of pretrained part detectors. This is achieved by building a single-hidden-layer autoencoder and a single-hidden-layer neural network with an $L0$ -norm sparsity constraint, respectively. Comprehensive evaluations on a publicly available 21-class VHR land-use data set and comparisons with state-of-the-art approaches demonstrate the effectiveness and superiority of this paper.

...read moreread less

Proceedings Article•

Supervised representation learning: transfer learning with deep autoencoders

[...]

Fuzhen Zhuang¹, Xiaohu Cheng¹, Ping Luo¹, Sinno Jialin Pan², Qing He¹ - Show less +1 more•Institutions (2)

Chinese Academy of Sciences¹, Nanyang Technological University²

25 Jul 2015

TL;DR: This paper proposes a supervised representation learning method based on deep autoencoders for transfer learning that consists of an embedding layer and a label encoding layer that minimize the difference between domains explicitly and encode label information in learning the representation.

...read moreread less

Abstract: Transfer learning has attracted a lot of attention in the past decade. One crucial research issue in transfer learning is how to find a good representation for instances of different domains such that the divergence between domains can be reduced with the new representation. Recently, deep learning has been proposed to learn more robust or higherlevel features for transfer learning. However, to the best of our knowledge, most of the previous approaches neither minimize the difference between domains explicitly nor encode label information in learning the representation. In this paper, we propose a supervised representation learning method based on deep autoencoders for transfer learning. The proposed deep autoencoder consists of two encoding layers: an embedding layer and a label encoding layer. In the embedding layer, the distance in distributions of the embedded instances between the source and target domains is minimized in terms of KL-Divergence. In the label encoding layer, label information of the source domain is encoded using a softmax regression model. Extensive experiments conducted on three real-world image datasets demonstrate the effectiveness of our proposed method compared with several state-of-the-art baseline methods.

...read moreread less

Proceedings Article•

MADE: Masked Autoencoder for Distribution Estimation

[...]

Mathieu Germain¹, Karol Gregor², Iain Murray³, Hugo Larochelle¹•Institutions (3)

Université de Sherbrooke¹, Google², University of Edinburgh³

06 Jul 2015

TL;DR: This work introduces a simple modification for autoencoder neural networks that yields powerful generative models and proves that this approach is competitive with state-of-the-art tractable distribution estimators.

...read moreread less

Posted Content•

Generating Sentences from a Continuous Space

[...]

Samuel R. Bowman¹, Luke Vilnis², Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio - Show less +2 more•Institutions (2)

Stanford University¹, University of Massachusetts Amherst²

19 Nov 2015-arXiv: Learning

TL;DR: This article proposed an RNN-based variational autoencoder generative model that incorporates distributed latent representations of entire sentences to explicitly model holistic properties of sentences such as style, topic, and high-level syntactic features.

...read moreread less

Abstract: The standard recurrent neural network language model (RNNLM) generates sentences one word at a time and does not work from an explicit global sentence representation. In this work, we introduce and study an RNN-based variational autoencoder generative model that incorporates distributed latent representations of entire sentences. This factorization allows it to explicitly model holistic properties of sentences such as style, topic, and high-level syntactic features. Samples from the prior over these sentence representations remarkably produce diverse and well-formed sentences through simple deterministic decoding. By examining paths through this latent space, we are able to generate coherent novel sentences that interpolate between known sentences. We present techniques for solving the difficult learning problem presented by this model, demonstrate its effectiveness in imputing missing words, explore many interesting properties of the model's latent sentence space, and present negative results on the use of the model in language modeling.

...read moreread less

Journal Article•DOI•

High-Resolution SAR Image Classification via Deep Convolutional Autoencoders

[...]

Jie Geng¹, Jianchao Fan, Hongyu Wang¹, Xiaorui Ma¹, Baoming Li, Fuliang Chen - Show less +2 more•Institutions (1)

Dalian University of Technology¹

01 Oct 2015-IEEE Geoscience and Remote Sensing Letters

TL;DR: The experiments of TerraSAR-X image demonstrate that the DCAE network can extract efficient features and perform better classification result compared with some related algorithms.

...read moreread less

Abstract: Synthetic aperture radar (SAR) image classification is a hot topic in the interpretation of SAR images. However, the absence of effective feature representation and the presence of speckle noise in SAR images make classification difficult to handle. In order to overcome these problems, a deep convolutional autoencoder (DCAE) is proposed to extract features and conduct classification automatically. The deep network is composed of eight layers: a convolutional layer to extract texture features, a scale transformation layer to aggregate neighbor information, four layers based on sparse autoencoders to optimize features and classify, and last two layers for postprocessing. Compared with hand-crafted features, the DCAE network provides an automatic method to learn discriminative features from the image. A series of filters is designed as convolutional units to comprise the gray-level cooccurrence matrix and Gabor features together. Scale transformation is conducted to reduce the influence of the noise, which integrates the correlated neighbor pixels. Sparse autoencoders seek better representation of features to match the classifier, since training labels are added to fine-tune the parameters of the networks. Morphological smoothing removes the isolated points of the classification map. The whole network is designed ingeniously, and each part has a contribution to the classification accuracy. The experiments of TerraSAR-X image demonstrate that the DCAE network can extract efficient features and perform better classification result compared with some related algorithms.

...read moreread less

Posted Content•

The Variational Fair Autoencoder

[...]

Christos Louizos¹, Kevin Swersky², Yujia Li², Max Welling¹, Max Welling³, Max Welling⁴, Richard S. Zemel², Richard S. Zemel⁴ - Show less +4 more•Institutions (4)

University of Amsterdam¹, University of Toronto², University of California, Irvine³, Canadian Institute for Advanced Research⁴

03 Nov 2015-arXiv: Machine Learning

TL;DR: This model is based on a variational autoencoding architecture with priors that encourage independence between sensitive and latent factors of variation that is more effective than previous work in removing unwanted sources of variation while maintaining informative latent representations.

...read moreread less

Abstract: We investigate the problem of learning representations that are invariant to certain nuisance or sensitive factors of variation in the data while retaining as much of the remaining information as possible. Our model is based on a variational autoencoding architecture with priors that encourage independence between sensitive and latent factors of variation. Any subsequent processing, such as classification, can then be performed on this purged latent representation. To remove any remaining dependencies we incorporate an additional penalty term based on the "Maximum Mean Discrepancy" (MMD) measure. We discuss how these architectures can be efficiently trained on data and show in experiments that this method is more effective than previous work in removing unwanted sources of variation while maintaining informative latent representations.

...read moreread less

Proceedings Article•

Winner-take-all autoencoders

[...]

Alireza Makhzani¹, Brendan J. Frey¹•Institutions (1)

University of Toronto¹

07 Dec 2015

TL;DR: It is shown that winner-take-all autoencoders can be used to to learn deep sparse representations from the MNIST, CIFAR-10, ImageNet, Street View House Numbers and Toronto Face datasets, and achieve competitive classification performance.

...read moreread less

Abstract: In this paper, we propose a winner-take-all method for learning hierarchical sparse representations in an unsupervised fashion. We first introduce fully-connected winner-take-all autoencoders which use mini-batch statistics to directly enforce a lifetime sparsity in the activations of the hidden units. We then propose the convolutional winner-take-all autoencoder which combines the benefits of convolutional architectures and autoencoders for learning shift-invariant sparse representations. We describe a way to train convolutional autoencoders layer by layer, where in addition to lifetime sparsity, a spatial sparsity within each feature map is achieved using winner-take-all activation functions. We will show that winner-take-all autoencoders can be used to to learn deep sparse representations from the MNIST, CIFAR-10, ImageNet, Street View House Numbers and Toronto Face datasets, and achieve competitive classification performance.

...read moreread less

Proceedings Article•DOI•

A novel approach for automatic acoustic novelty detection using a denoising autoencoder with bidirectional LSTM neural networks

[...]

Erik Marchi¹, Fabio Vesperini², Florian Eyben¹, Stefano Squartini², Björn Schuller³ - Show less +1 more•Institutions (3)

Technische Universität München¹, Marche Polytechnic University², University of Passau³

19 Apr 2015

TL;DR: This paper presents a novel unsupervised approach based on a denoising autoencoder which significantly outperforms existing methods by achieving up to 93.4% F-Measure.

...read moreread less

Abstract: Acoustic novelty detection aims at identifying abnormal/novel acoustic signals which differ from the reference/normal data that the system was trained with. In this paper we present a novel unsupervised approach based on a denoising autoencoder. In our approach auditory spectral features are processed by a denoising autoencoder with bidirectional Long Short-Term Memory recurrent neural networks. We use the reconstruction error between the input and the output of the autoencoder as activation signal to detect novel events. The autoencoder is trained on a public database which contains recordings of typical in-home situations such as talking, watching television, playing and eating. The evaluation was performed on more than 260 different abnormal events. We compare results with state-of-theart methods and we conclude that our novel approach significantly outperforms existing methods by achieving up to 93.4% F-Measure.

...read moreread less

Journal Article•DOI•

Single Sample Face Recognition via Learning Deep Supervised Autoencoders

[...]

Shenghua Gao¹, Yingliang Zhang¹, Kui Jia², Jiwen Lu³•Institutions (3)

ShanghaiTech University¹, University of Macau², Agency for Science, Technology and Research³

16 Jun 2015-IEEE Transactions on Information Forensics and Security

TL;DR: Results demonstrate that the stacked supervised autoencoders-based face representation significantly outperforms the commonly used image representations in single sample per person face recognition, and it achieves higher recognition accuracy compared with other deep learning models, including the deep Lambertian network.

...read moreread less

Abstract: This paper targets learning robust image representation for single training sample per person face recognition. Motivated by the success of deep learning in image representation, we propose a supervised autoencoder, which is a new type of building block for deep architectures. There are two features distinct our supervised autoencoder from standard autoencoder. First, we enforce the faces with variants to be mapped with the canonical face of the person, for example, frontal face with neutral expression and normal illumination; Second, we enforce features corresponding to the same person to be similar. As a result, our supervised autoencoder extracts the features which are robust to variances in illumination, expression, occlusion, and pose, and facilitates the face recognition. We stack such supervised autoencoders to get the deep architecture and use it for extracting features in image representation. Experimental results on the AR, Extended Yale B, CMU-PIE, and Multi-PIE data sets demonstrate that by coupling with the commonly used sparse representation-based classification, our stacked supervised autoencoders-based face representation significantly outperforms the commonly used image representations in single sample per person face recognition, and it achieves higher recognition accuracy compared with other deep learning models, including the deep Lambertian network, in spite of much less training data and without any domain information. Moreover, supervised autoencoder can also be used for face verification, which further demonstrates its effectiveness for face representation.

...read moreread less

Collapse