Showing papers on "Autoencoder published in 2017"


Proceedings Article
24 Apr 2017
TL;DR: In this article, a modification of the variational autoencoder (VAE) framework is proposed to learn interpretable factorised latent representations from raw image data in a completely unsupervised manner.
Abstract: Learning an interpretable factorised representation of the independent data generative factors of the world without supervision is an important precursor for the development of artificial intelligence that is able to learn and reason in the same way that humans do. We introduce beta-VAE, a new state-of-the-art framework for automated discovery of interpretable factorised latent representations from raw image data in a completely unsupervised manner. Our approach is a modification of the variational autoencoder (VAE) framework. We introduce an adjustable hyperparameter beta that balances latent channel capacity and independence constraints with reconstruction accuracy. We demonstrate that beta-VAE with appropriately tuned beta > 1 qualitatively outperforms VAE (beta = 1), as well as state-of-the-art unsupervised (InfoGAN) and semi-supervised (DC-IGN) approaches to disentangled factor learning on a variety of datasets (CelebA, faces and chairs). Furthermore, we devise a protocol to quantitatively compare the degree of disentanglement learnt by different models, and show that our approach also significantly outperforms all baselines quantitatively. Unlike InfoGAN, beta-VAE is stable to train, makes few assumptions about the data and relies on tuning a single hyperparameter, which can be directly optimised through a hyperparameter search using weakly labelled data or through heuristic visual inspection for purely unsupervised data.
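
The single knob the abstract describes is easy to state in code. A minimal sketch of the beta-VAE objective, assuming a diagonal-Gaussian posterior and a Bernoulli decoder (beta = 4.0 here is purely illustrative, not the paper's tuned value):

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """beta-VAE objective: reconstruction term plus beta-weighted KL.

    With beta = 1 this reduces to the standard VAE ELBO; beta > 1
    strengthens the pressure toward a factorised (disentangled) latent.
    x and x_recon are assumed to lie in [0, 1] (Bernoulli decoder).
    """
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian posterior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```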

3,670 citations


Proceedings Article
02 Nov 2017
TL;DR: The Vector Quantised-Variational AutoEncoder (VQ-VAE) as discussed by the authors is a generative model that learns a discrete latent representation by using vector quantization.
Abstract: Learning useful representations without supervision remains a key challenge in machine learning. In this paper, we propose a simple yet powerful generative model that learns such discrete representations. Our model, the Vector Quantised-Variational AutoEncoder (VQ-VAE), differs from VAEs in two key ways: the encoder network outputs discrete, rather than continuous, codes; and the prior is learnt rather than static. In order to learn a discrete latent representation, we incorporate ideas from vector quantisation (VQ). Using the VQ method allows the model to circumvent issues of ``posterior collapse'' (where the latents are ignored when they are paired with a powerful autoregressive decoder) typically observed in the VAE framework. Pairing these representations with an autoregressive prior, the model can generate high quality images, videos, and speech as well as doing high quality speaker conversion and unsupervised learning of phonemes, providing further evidence of the utility of the learnt representations.
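
The two ingredients the abstract names, nearest-neighbour quantisation against a learnt codebook and a gradient path around the non-differentiable lookup, can be sketched as follows; the straight-through copy and the 0.25 commitment weight are the commonly cited choices, stated here as assumptions:

```python
import torch

def vector_quantise(z_e, codebook, commitment_cost=0.25):
    """Nearest-neighbour lookup plus straight-through gradient (VQ-VAE style).

    z_e: (batch, dim) encoder outputs; codebook: (K, dim) embedding table.
    """
    dist = torch.cdist(z_e, codebook)          # (batch, K) pairwise distances
    idx = dist.argmin(dim=1)                   # discrete latent codes
    z_q = codebook[idx]                        # quantised vectors

    # Codebook and commitment losses (stop-gradients via .detach()).
    vq_loss = ((z_q - z_e.detach()) ** 2).mean() \
        + commitment_cost * ((z_e - z_q.detach()) ** 2).mean()

    # Straight-through estimator: copy decoder gradients to the encoder.
    z_q = z_e + (z_q - z_e).detach()
    return z_q, idx, vq_loss
```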

1,963 citations


Journal ArticleDOI
TL;DR: In this article, an end-to-end reconstruction task was proposed to jointly optimize transmitter and receiver components in a single process, which can be extended to networks of multiple transmitters and receivers.
Abstract: We present and discuss several novel applications of deep learning for the physical layer. By interpreting a communications system as an autoencoder, we develop a fundamental new way to think about communications system design as an end-to-end reconstruction task that seeks to jointly optimize transmitter and receiver components in a single process. We show how this idea can be extended to networks of multiple transmitters and receivers and present the concept of radio transformer networks as a means to incorporate expert domain knowledge in the machine learning model. Lastly, we demonstrate the application of convolutional neural networks on raw IQ samples for modulation classification which achieves competitive accuracy with respect to traditional schemes relying on expert features. This paper is concluded with a discussion of open challenges and areas for future investigation.
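
As a rough illustration of the "communications system as an autoencoder" idea, a sketch with an additive white Gaussian noise channel between a learned transmitter and receiver; the layer sizes, the message count M, the number of channel uses n, and the SNR are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class ChannelAutoencoder(nn.Module):
    """Transmitter (encoder) -> AWGN channel -> receiver (decoder).

    Maps one of M messages to n complex channel uses (2n reals) and is
    trained end-to-end with cross-entropy on the decoded message.
    """
    def __init__(self, M=16, n=7, snr_db=7.0):
        super().__init__()
        self.tx = nn.Sequential(nn.Linear(M, 32), nn.ReLU(), nn.Linear(32, 2 * n))
        self.rx = nn.Sequential(nn.Linear(2 * n, 32), nn.ReLU(), nn.Linear(32, M))
        self.noise_std = 10 ** (-snr_db / 20.0)  # unit signal power assumed

    def forward(self, one_hot):
        x = self.tx(one_hot)
        # Average-power normalisation, then additive white Gaussian noise.
        x = x / x.pow(2).mean(dim=1, keepdim=True).sqrt()
        y = x + self.noise_std * torch.randn_like(x)
        return self.rx(y)  # logits over the M messages
```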

1,879 citations


Journal ArticleDOI
TL;DR: This work combines the autoencoder, deconvolution network, and shortcut connections into the residual encoder–decoder convolutional neural network (RED-CNN) for low-dose CT imaging and achieves a competitive performance relative to state-of-the-art methods in both simulated and clinical cases.
Abstract: Given the potential risk of X-ray radiation to the patient, low-dose CT has attracted a considerable interest in the medical imaging field. Currently, the main stream low-dose CT methods include vendor-specific sinogram domain filtration and iterative reconstruction algorithms, but they need to access raw data, whose formats are not transparent to most users. Due to the difficulty of modeling the statistical characteristics in the image domain, the existing methods for directly processing reconstructed images cannot eliminate image noise very well while keeping structural details. Inspired by the idea of deep learning, here we combine the autoencoder, deconvolution network, and shortcut connections into the residual encoder–decoder convolutional neural network (RED-CNN) for low-dose CT imaging. After patch-based training, the proposed RED-CNN achieves a competitive performance relative to the state-of-the-art methods in both simulated and clinical cases. In particular, our method has been favorably evaluated in terms of noise suppression, structural preservation, and lesion detection.
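
A toy version of the residual encoder-decoder pattern the abstract describes: conv layers encode, deconv layers decode, and shortcut additions carry encoder features (and the input itself) back in. The real RED-CNN is deeper and patch-trained; the kernel and channel sizes below are illustrative:

```python
import torch.nn as nn

class REDCNNSketch(nn.Module):
    """Minimal residual encoder-decoder for image denoising (a sketch)."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc1 = nn.Conv2d(1, ch, 5, padding=2)
        self.enc2 = nn.Conv2d(ch, ch, 5, padding=2)
        self.dec1 = nn.ConvTranspose2d(ch, ch, 5, padding=2)
        self.dec2 = nn.ConvTranspose2d(ch, 1, 5, padding=2)
        self.relu = nn.ReLU()

    def forward(self, x):
        e1 = self.relu(self.enc1(x))
        e2 = self.relu(self.enc2(e1))
        d1 = self.relu(self.dec1(e2) + e1)   # shortcut from the encoder
        return self.relu(self.dec2(d1) + x)  # residual back to the input
```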

1,161 citations


Proceedings ArticleDOI
04 Aug 2017
TL;DR: Novel extensions to deep autoencoders are demonstrated which not only maintain a deep autoencoder's ability to discover high quality, non-linear features but can also eliminate outliers and noise without access to any clean training data.
Abstract: Deep autoencoders, and other deep neural networks, have demonstrated their effectiveness in discovering non-linear features across many problem domains. However, in many real-world problems, large outliers and pervasive noise are commonplace, and one may not have access to clean training data as required by standard deep denoising autoencoders. Herein, we demonstrate novel extensions to deep autoencoders which not only maintain a deep autoencoder's ability to discover high quality, non-linear features but can also eliminate outliers and noise without access to any clean training data. Our model is inspired by Robust Principal Component Analysis, and we split the input data X into two parts, $X = L_{D} + S$, where $L_{D}$ can be effectively reconstructed by a deep autoencoder and $S$ contains the outliers and noise in the original data X. Since such splitting increases the robustness of standard deep autoencoders, we name our model a "Robust Deep Autoencoder (RDA)". Further, we present generalizations of our results to grouped sparsity norms which allow one to distinguish random anomalies from other types of structured corruptions, such as a collection of features being corrupted across many instances or a collection of instances having more corruptions than their fellows. Such "Group Robust Deep Autoencoders (GRDA)" give rise to novel anomaly detection approaches whose superior performance we demonstrate on a selection of benchmark problems.
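
The split $X = L_{D} + S$ suggests an alternating scheme like the sketch below, where `train_autoencoder` is a hypothetical helper that (re)fits the deep autoencoder on its argument and returns the reconstruction; the l1 soft-thresholding step is one standard choice for isolating sparse outliers, stated here as an assumption rather than the authors' exact update:

```python
import numpy as np

def shrink(x, tau):
    """Soft-thresholding: the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def robust_split(X, train_autoencoder, lam=0.1, n_outer=20):
    """Alternating sketch of the RDA split X = L_D + S."""
    S = np.zeros_like(X)
    for _ in range(n_outer):
        L = X - S                       # part the autoencoder should explain
        L_recon = train_autoencoder(L)  # fit/refit the deep autoencoder on L
        S = shrink(X - L_recon, lam)    # keep only large residuals as outliers
    return X - S, S
```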

1,030 citations


Journal ArticleDOI
10 Apr 2017-Sensors
TL;DR: In this paper, Wang et al. proposed a convolutional neural network (CNN)-based method that learns traffic as images and predicts large-scale, network-wide traffic speed with high accuracy.
Abstract: This paper proposes a convolutional neural network (CNN)-based method that learns traffic as images and predicts large-scale, network-wide traffic speed with a high accuracy. Spatiotemporal traffic dynamics are converted to images describing the time and space relations of traffic flow via a two-dimensional time-space matrix. A CNN is applied to the image following two consecutive steps: abstract traffic feature extraction and network-wide traffic speed prediction. The effectiveness of the proposed method is evaluated by taking two real-world transportation networks, the second ring road and north-east transportation network in Beijing, as examples, and comparing the method with four prevailing algorithms, namely, ordinary least squares, k-nearest neighbors, artificial neural network, and random forest, and three deep learning architectures, namely, stacked autoencoder, recurrent neural network, and long-short-term memory network. The results show that the proposed method outperforms other algorithms by an average accuracy improvement of 42.91% within an acceptable execution time. The CNN can train the model in a reasonable time and, thus, is suitable for large-scale transportation networks.
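
The "traffic as images" conversion is essentially a sliding window over a time-space speed matrix; a small sketch (the 60-step window is an assumed value, not taken from the paper):

```python
import numpy as np

def speeds_to_images(speed, window=60):
    """Stack a (T, N) speed matrix into CNN inputs of shape (1, window, N).

    speed[t, i] is the speed on road segment i at time t, so each sample
    is a time-space "image" whose rows are time and columns are space.
    """
    T, N = speed.shape
    samples = [speed[t:t + window] for t in range(T - window)]
    return np.stack(samples)[:, None, :, :]  # (samples, channel, time, space)
```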

894 citations


Journal ArticleDOI
TL;DR: A 3D convolutional neural network framework is proposed for accurate HSI classification, which is lighter, less likely to over-fit, and easier to train, and requires fewer parameters than other deep learning-based methods.
Abstract: Recent research has shown that using spectral–spatial information can considerably improve the performance of hyperspectral image (HSI) classification. HSI data is typically presented in the format of 3D cubes. Thus, 3D spatial filtering naturally offers a simple and effective method for simultaneously extracting the spectral–spatial features within such images. In this paper, a 3D convolutional neural network (3D-CNN) framework is proposed for accurate HSI classification. The proposed method views the HSI cube data altogether without relying on any preprocessing or post-processing, extracting the deep spectral–spatial-combined features effectively. In addition, it requires fewer parameters than other deep learning-based methods. Thus, the model is lighter, less likely to over-fit, and easier to train. For comparison and validation, we test the proposed method along with three other deep learning-based HSI classification methods—namely, stacked autoencoder (SAE), deep belief network (DBN), and 2D-CNN-based methods—on three real-world HSI datasets captured by different sensors. Experimental results demonstrate that our 3D-CNN-based method outperforms these state-of-the-art methods and sets a new record.
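
A minimal sketch of the 3D spectral-spatial filtering idea: the kernels convolve jointly over the spectral axis and the two spatial axes. Kernel sizes, channel counts, the 103-band input and the 9x9 patch size are illustrative assumptions, not the paper's architecture:

```python
import torch.nn as nn

class HSI3DCNNSketch(nn.Module):
    """Small 3D-CNN for spectral-spatial HSI classification (a sketch).

    Input: (batch, 1, bands, height, width) cubes.
    """
    def __init__(self, n_bands=103, n_classes=9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=(7, 3, 3)), nn.ReLU(),
            nn.Conv3d(8, 16, kernel_size=(7, 3, 3)), nn.ReLU(),
        )
        # For 9x9 spatial patches: spatial 9 -> 7 -> 5; spectral bands -> bands - 12.
        self.classifier = nn.Linear(16 * (n_bands - 12) * 5 * 5, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```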

835 citations


Journal ArticleDOI
TL;DR: In this paper, a deep autoencoder-based approach is proposed to identify signal features from low-light images and adaptively brighten images without over-amplifying/saturating the lighter parts in images with high dynamic range.

772 citations


Proceedings ArticleDOI
27 Feb 2017
TL;DR: In this article, a conditional adversarial autoencoder (CAAE) is proposed to learn a face manifold, traversing on which smooth age progression and regression can be realized simultaneously.
Abstract: If I provide you a face image of mine (without telling you the actual age when I took the picture) and a large amount of face images that I crawled (containing labeled faces of different ages but not necessarily paired), can you show me what I would look like when I am 80 or what I was like when I was 5? The answer is probably a No. Most existing face aging works attempt to learn the transformation between age groups and thus would require the paired samples as well as the labeled query image. In this paper, we look at the problem from a generative modeling perspective such that no paired samples are required. In addition, given an unlabeled image, the generative model can directly produce the image with the desired age attribute. We propose a conditional adversarial autoencoder (CAAE) that learns a face manifold, traversing on which smooth age progression and regression can be realized simultaneously. In CAAE, the face is first mapped to a latent vector through a convolutional encoder, and then the vector is projected to the face manifold conditional on age through a deconvolutional generator. The latent vector preserves personalized face features (i.e., personality) and the age condition controls progression vs. regression. Two adversarial networks are imposed on the encoder and generator, respectively, forcing the model to generate more photo-realistic faces. Experimental results demonstrate the appealing performance and flexibility of the proposed framework by comparing with the state-of-the-art and ground truth.

766 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: In this paper, an encoder aims to project a visual feature vector into the semantic space as in the existing ZSL models, but the decoder exerts an additional constraint, that the projection/code must be able to reconstruct the original visual feature.
Abstract: Existing zero-shot learning (ZSL) models typically learn a projection function from a feature space to a semantic embedding space (e.g. attribute space). However, such a projection function is only concerned with predicting the training seen class semantic representation (e.g. attribute prediction) or classification. When applied to test data, which in the context of ZSL contains different (unseen) classes without training data, a ZSL model typically suffers from the projection domain shift problem. In this work, we present a novel solution to ZSL based on learning a Semantic AutoEncoder (SAE). Taking the encoder-decoder paradigm, an encoder aims to project a visual feature vector into the semantic space as in the existing ZSL models. However, the decoder exerts an additional constraint, that is, the projection/code must be able to reconstruct the original visual feature. We show that with this additional reconstruction constraint, the learned projection function from the seen classes is able to generalise better to the new unseen classes. Importantly, the encoder and decoder are linear and symmetric, which enables us to develop an extremely efficient learning algorithm. Extensive experiments on six benchmark datasets demonstrate that the proposed SAE significantly outperforms the existing ZSL models, with the additional benefit of lower computational cost. Furthermore, when the SAE is applied to the supervised clustering problem, it also beats the state-of-the-art.
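
Because the encoder and decoder are linear, symmetric and tied, the objective ||X - W^T S||^2 + lam ||W X - S||^2 admits a closed-form solution via a Sylvester equation. A sketch under that reading of the abstract (the weight `lam` is an assumed value):

```python
import numpy as np
from scipy.linalg import solve_sylvester

def train_sae(X, S, lam=0.2):
    """Closed-form Semantic AutoEncoder (a sketch of the linear SAE idea).

    X: (d, n) visual features; S: (k, n) semantic vectors. Setting the
    gradient of the tied objective to zero yields the Sylvester equation
        (S S^T) W + W (lam X X^T) = (1 + lam) S X^T.
    """
    A = S @ S.T
    B = lam * (X @ X.T)
    C = (1.0 + lam) * (S @ X.T)
    return solve_sylvester(A, B, C)  # W: (k, d); encoder W, decoder W.T
```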

Journal ArticleDOI
TL;DR: Experimental results demonstrated that the proposed SAE-DBN approach can effectively identify the machine running conditions and significantly outperform other fusion methods.
Abstract: To assess health conditions of rotating machinery efficiently, multiple accelerometers are mounted on different locations to acquire a variety of possible fault signals. The statistical features are extracted from these signals to identify the running status of a machine. However, the acquired vibration signals are different due to sensor arrangement and environmental interference, which may lead to different diagnostic results. In order to improve fault diagnosis reliability, a new multisensor data fusion technique is proposed. First, time-domain and frequency-domain features are extracted from the different sensor signals, and then these features are input into multiple two-layer sparse autoencoder (SAE) neural networks for feature fusion. Finally, fused feature vectors can be regarded as the machine health indicators, and be used to train a deep belief network (DBN) for further classification. To verify the effectiveness of the proposed SAE-DBN scheme, bearing fault experiments were conducted on a bearing test platform, and vibration data sets under different running speeds were collected for algorithm validation. For comparison, different feature fusion methods were also applied to multisensor fusion in the experiments. Experimental results demonstrated that the proposed approach can effectively identify the machine running conditions and significantly outperform other fusion methods.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: Split-brain autoencoders, as discussed in this paper, add a split to the network, resulting in two disjoint sub-networks; each sub-network is trained to perform a difficult task: predicting one subset of the data channels from another.
Abstract: We propose split-brain autoencoders, a straightforward modification of the traditional autoencoder architecture, for unsupervised representation learning. The method adds a split to the network, resulting in two disjoint sub-networks. Each sub-network is trained to perform a difficult task – predicting one subset of the data channels from another. Together, the sub-networks extract features from the entire input signal. By forcing the network to solve cross-channel prediction tasks, we induce a representation within the network which transfers well to other, unseen tasks. This method achieves state-of-the-art performance on several large-scale transfer learning benchmarks.
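
The cross-channel prediction task is easy to sketch for a Lab-style input, with one sub-network per channel subset; the tiny conv branches below stand in for the paper's full architectures and are assumptions:

```python
import torch.nn as nn

def conv_branch(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, c_out, 3, padding=1))

class SplitBrainSketch(nn.Module):
    """Each half of the 'split brain' predicts the channels it cannot see."""
    def __init__(self):
        super().__init__()
        self.l_to_ab = conv_branch(1, 2)  # L channel -> ab channels
        self.ab_to_l = conv_branch(2, 1)  # ab channels -> L channel

    def loss(self, img_lab):
        L, ab = img_lab[:, :1], img_lab[:, 1:]
        return (((self.l_to_ab(L) - ab) ** 2).mean()
                + ((self.ab_to_l(ab) - L) ** 2).mean())
```

At transfer time, the features of both branches are concatenated so that the representation covers the entire input signal.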

Journal ArticleDOI
TL;DR: A systematic analysis of the use of deep learning networks for stock market analysis and prediction using five-minute intraday data from the Korean KOSPI stock market as input data to examine the effects of three unsupervised feature extraction methods.
Abstract: Deep learning networks are applied to stock market analysis and prediction. A comprehensive analysis with different data representation methods is offered. Five-minute intraday data from the Korean KOSPI stock market is used. The network applied to residuals of the autoregressive model improves prediction. Covariance estimation for market structure analysis is improved with the network. We offer a systematic analysis of the use of deep learning networks for stock market analysis and prediction. Its ability to extract features from a large set of raw data without relying on prior knowledge of predictors makes deep learning potentially attractive for stock market prediction at high frequencies. Deep learning algorithms vary considerably in the choice of network structure, activation function, and other model parameters, and their performance is known to depend heavily on the method of data representation. Our study attempts to provide a comprehensive and objective assessment of both the advantages and drawbacks of deep learning algorithms for stock market analysis and prediction. Using high-frequency intraday stock returns as input data, we examine the effects of three unsupervised feature extraction methods (principal component analysis, autoencoder, and the restricted Boltzmann machine) on the network's overall ability to predict future market behavior. Empirical results suggest that deep neural networks can extract additional information from the residuals of the autoregressive model and improve prediction performance; the same cannot be said when the autoregressive model is applied to the residuals of the network. Covariance estimation is also noticeably improved when the predictive network is applied to covariance-based market structure analysis. Our study offers practical insights and potentially useful directions for further investigation into how deep learning networks can be effectively used for stock market analysis and prediction.

Journal ArticleDOI
TL;DR: This paper presents a comprehensive overview of the emerging studies on DL-based physical layer processing, including leveraging DL to redesign a module of the conventional communication system and replace the communication system with a radically new architecture based on an autoencoder.
Abstract: Machine learning (ML) has been widely applied to the upper layers of wireless communication systems for various purposes, such as the deployment of cognitive radio and communication networks. However, its application to the physical layer is hampered by sophisticated channel environments and the limited learning ability of conventional ML algorithms. Deep learning (DL) has recently been applied in many fields, such as computer vision and natural language processing, given its expressive capacity and convenient optimization capability. The potential application of DL to the physical layer has also been increasingly recognized because of the new features of future communications, such as complex scenarios with unknown channel models, and high-speed and accurate processing requirements; these features challenge conventional communication theories. This paper presents a comprehensive overview of the emerging studies on DL-based physical layer processing, including leveraging DL to redesign a module of the conventional communication system (for modulation recognition, channel decoding, and detection) and to replace the communication system with a radically new architecture based on an autoencoder. These DL-based methods show promising performance improvements but have certain limitations, such as lack of solid analytical tools and use of architectures that are specifically designed for communication and implementation research, thereby motivating future research in this field.

Posted Content
TL;DR: In this paper, the authors show that if the vectors lie near the range of an $L$-Lipschitz generative model, such as a variational autoencoder or generative adversarial network, then roughly $O(k \log L)$ random Gaussian measurements suffice for recovery.
Abstract: The goal of compressed sensing is to estimate a vector from an underdetermined system of noisy linear measurements, by making use of prior knowledge on the structure of vectors in the relevant domain. For almost all results in this literature, the structure is represented by sparsity in a well-chosen basis. We show how to achieve guarantees similar to standard compressed sensing but without employing sparsity at all. Instead, we suppose that vectors lie near the range of a generative model $G: \mathbb{R}^k \to \mathbb{R}^n$. Our main theorem is that, if $G$ is $L$-Lipschitz, then roughly $O(k \log L)$ random Gaussian measurements suffice for an $\ell_2/\ell_2$ recovery guarantee. We demonstrate our results using generative models from published variational autoencoder and generative adversarial networks. Our method can use $5$-$10$x fewer measurements than Lasso for the same accuracy.
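
The recovery procedure this setup suggests is plain gradient descent over the latent input of a fixed generator; a sketch, where the generator `G`, the latent size `k`, and the optimiser settings are assumptions:

```python
import torch

def recover(y, A, G, k=20, steps=1000, lr=0.05):
    """Estimate x ~= G(z*) from measurements y = A x + noise (a sketch).

    Instead of sparsity, the prior is that x lies near the range of a
    trained generator G; we minimise ||A G(z) - y||^2 over z directly.
    A: (m, n) measurement matrix; y: (m,) observations.
    """
    z = torch.zeros(1, k, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((A @ G(z).flatten() - y) ** 2).sum()
        loss.backward()
        opt.step()
    return G(z).detach()
```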

Journal ArticleDOI
TL;DR: A novel deep autoencoder feature learning method is developed to diagnose rotating machinery fault and the results confirm that the proposed method is more effective and robust than other methods.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper addresses the problem of unsupervised video summarization, formulated as selecting a sparse subset of video frames that optimally represent the input video, with a novel generative adversarial framework.
Abstract: This paper addresses the problem of unsupervised video summarization, formulated as selecting a sparse subset of video frames that optimally represent the input video. Our key idea is to learn a deep summarizer network to minimize distance between training videos and a distribution of their summarizations, in an unsupervised way. Such a summarizer can then be applied on a new video for estimating its optimal summarization. For learning, we specify a novel generative adversarial framework, consisting of the summarizer and discriminator. The summarizer is the autoencoder long short-term memory network (LSTM) aimed at, first, selecting video frames, and then decoding the obtained summarization for reconstructing the input video. The discriminator is another LSTM aimed at distinguishing between the original video and its reconstruction from the summarizer. The summarizer LSTM is cast as an adversary of the discriminator, i.e., trained so as to maximally confuse the discriminator. This learning is also regularized for sparsity. Evaluation on four benchmark datasets, consisting of videos showing diverse events in first- and third-person views, demonstrates our competitive performance in comparison to fully supervised state-of-the-art approaches.

Posted Content
TL;DR: A deep AutoEncoder network with state-of-the-art reconstruction quality and generalization ability is introduced with results that outperform existing methods on 3D recognition tasks and enable shape editing via simple algebraic manipulations.
Abstract: Three-dimensional geometric data offer an excellent domain for studying representation learning and generative modeling. In this paper, we look at geometric data represented as point clouds. We introduce a deep AutoEncoder (AE) network with state-of-the-art reconstruction quality and generalization ability. The learned representations outperform existing methods on 3D recognition tasks and enable shape editing via simple algebraic manipulations, such as semantic part editing, shape analogies and shape interpolation, as well as shape completion. We perform a thorough study of different generative models including GANs operating on the raw point clouds, significantly improved GANs trained in the fixed latent space of our AEs, and Gaussian Mixture Models (GMMs). To quantitatively evaluate generative models we introduce measures of sample fidelity and diversity based on matchings between sets of point clouds. Interestingly, our evaluation of generalization, fidelity and diversity reveals that GMMs trained in the latent space of our AEs yield the best results overall.
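
The matching-based measures mentioned for fidelity and diversity build on point-set distances; the symmetric Chamfer distance, the kind of criterion commonly used both for AE reconstruction and for comparing sets of point clouds, can be sketched as:

```python
import torch

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between two point clouds (a sketch).

    p: (n, 3) and q: (m, 3) point sets; each point is matched to its
    nearest neighbour in the other cloud and distances are averaged.
    """
    d = torch.cdist(p, q)  # (n, m) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```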

Journal ArticleDOI
TL;DR: This work developed an advanced AAE model for molecular feature extraction problems, and demonstrated its advantages compared to VAE in terms of adjustability in generating molecular fingerprints; capacity of processing very large molecular data sets; and efficiency in unsupervised pretraining for regression model.
Abstract: Deep generative adversarial networks (GANs) are the emerging technology in drug discovery and biomarker development. In our recent work, we demonstrated a proof-of-concept of implementing deep generative adversarial autoencoder (AAE) to identify new molecular fingerprints with predefined anticancer properties. Another popular generative model is the variational autoencoder (VAE), which is based on deep neural architectures. In this work, we developed an advanced AAE model for molecular feature extraction problems, and demonstrated its advantages compared to VAE in terms of (a) adjustability in generating molecular fingerprints; (b) capacity of processing very large molecular data sets; and (c) efficiency in unsupervised pretraining for regression model. Our results suggest that the proposed AAE model significantly enhances the capacity and efficiency of development of the new molecules with specific anticancer properties using the deep generative models.

Journal ArticleDOI
TL;DR: A novel neural network architecture for encoding and synthesis of 3D shapes, particularly their structures, is introduced and it is demonstrated that without supervision, the network learns meaningful structural hierarchies adhering to perceptual grouping principles, produces compact codes which enable applications such as shape classification and partial matching, and supports shape synthesis and interpolation with significant variations in topology and geometry.
Abstract: We introduce a novel neural network architecture for encoding and synthesis of 3D shapes, particularly their structures. Our key insight is that 3D shapes are effectively characterized by their hierarchical organization of parts, which reflects fundamental intra-shape relationships such as adjacency and symmetry. We develop a recursive neural net (RvNN) based autoencoder to map a flat, unlabeled, arbitrary part layout to a compact code. The code effectively captures hierarchical structures of man-made 3D objects of varying structural complexities despite being fixed-dimensional: an associated decoder maps a code back to a full hierarchy. The learned bidirectional mapping is further tuned using an adversarial setup to yield a generative model of plausible structures, from which novel structures can be sampled. Finally, our structure synthesis framework is augmented by a second trained module that produces fine-grained part geometry, conditioned on global and local structural context, leading to a full generative pipeline for 3D shapes. We demonstrate that without supervision, our network learns meaningful structural hierarchies adhering to perceptual grouping principles, produces compact codes which enable applications such as shape classification and partial matching, and supports shape synthesis and interpolation with significant variations in topology and geometry.

Posted Content
TL;DR: In this paper, a Deep Stochastic IOC RNN Encoder-decoder framework, DESIRE, is proposed to predict future locations of objects in multiple scenes by accounting for the multi-modal nature of the future prediction (i.e., given the same context, the future may vary).
Abstract: We introduce a Deep Stochastic IOC RNN Encoder-decoder framework, DESIRE, for the task of future predictions of multiple interacting agents in dynamic scenes. DESIRE effectively predicts future locations of objects in multiple scenes by 1) accounting for the multi-modal nature of the future prediction (i.e., given the same context, the future may vary), 2) foreseeing the potential future outcomes and making a strategic prediction based on them, and 3) reasoning not only from the past motion history, but also from the scene context as well as the interactions among the agents. DESIRE achieves these in a single end-to-end trainable neural network model, while being computationally efficient. The model first obtains a diverse set of hypothetical future prediction samples employing a conditional variational autoencoder, which are ranked and refined by the following RNN scoring-regression module. Samples are scored by accounting for accumulated future rewards, which enables better long-term strategic decisions similar to IOC frameworks. An RNN scene context fusion module jointly captures past motion histories, the semantic scene context and interactions among multiple agents. A feedback mechanism iterates over the ranking and refinement to further boost the prediction accuracy. We evaluate our model on two publicly available datasets: KITTI and Stanford Drone Dataset. Our experiments show that the proposed model significantly improves the prediction accuracy compared to other baseline methods.

Posted Content
TL;DR: A powerful new WaveNet-style autoencoder model is detailed that conditions an autoregressive decoder on temporal codes learned from the raw audio waveform, and NSynth, a large-scale and high-quality dataset of musical notes that is an order of magnitude larger than comparable public datasets is introduced.
Abstract: Generative models in vision have seen rapid progress due to algorithmic improvements and the availability of high-quality image datasets. In this paper, we offer contributions in both these areas to enable similar progress in audio modeling. First, we detail a powerful new WaveNet-style autoencoder model that conditions an autoregressive decoder on temporal codes learned from the raw audio waveform. Second, we introduce NSynth, a large-scale and high-quality dataset of musical notes that is an order of magnitude larger than comparable public datasets. Using NSynth, we demonstrate improved qualitative and quantitative performance of the WaveNet autoencoder over a well-tuned spectral autoencoder baseline. Finally, we show that the model learns a manifold of embeddings that allows for morphing between instruments, meaningfully interpolating in timbre to create new types of sounds that are realistic and expressive.

Journal ArticleDOI
18 Aug 2017
TL;DR: The model of a quantum autoencoder is introduced to perform similar tasks on quantum data; it is applied to compress ground states of the Hubbard model and molecular Hamiltonians, and an example of a simple programmable circuit that can be trained as an efficient autoencoder is shown.
Abstract: Classical autoencoders are neural networks that can learn efficient low-dimensional representations of data in higher-dimensional space. The task of an autoencoder is, given an input x, to map x to a lower dimensional point y such that x can likely be recovered from y. The structure of the underlying autoencoder network can be chosen to represent the data on a smaller dimension, effectively compressing the input. Inspired by this idea, we introduce the model of a quantum autoencoder to perform similar tasks on quantum data. The quantum autoencoder is trained to compress a particular data set of quantum states, where a classical compression algorithm cannot be employed. The parameters of the quantum autoencoder are trained using classical optimization algorithms. We show an example of a simple programmable circuit that can be trained as an efficient autoencoder. We apply our model in the context of quantum simulation to compress ground states of the Hubbard model and molecular Hamiltonians.

Proceedings ArticleDOI
04 Aug 2017
TL;DR: A Bayesian generative model called collaborative variational autoencoder (CVAE) is proposed that considers both rating and content for recommendation in multimedia scenarios and is able to significantly outperform the state-of-the-art recommendation methods with more robust performance.
Abstract: Modern recommender systems usually employ collaborative filtering with rating information to recommend items to users due to its successful performance. However, because of the drawbacks of collaborative-based methods such as sparsity, cold start, etc., more attention has been drawn to hybrid methods that consider both the rating and content information. Most of the previous works in this area cannot learn a good representation from content for the recommendation task or consider only the text modality of the content, thus their methods are very limited in current multimedia scenarios. This paper proposes a Bayesian generative model called collaborative variational autoencoder (CVAE) that considers both rating and content for recommendation in multimedia scenarios. The model learns deep latent representations from content data in an unsupervised manner and also learns implicit relationships between items and users from both content and rating. Unlike previous works with denoising criteria, the proposed CVAE learns a latent distribution for content in latent space instead of observation space through an inference network and can be easily extended to multimedia modalities other than text. Experiments show that CVAE is able to significantly outperform the state-of-the-art recommendation methods with more robust performance.

Proceedings Article
01 Jan 2017
TL;DR: This paper shows that neural networks can be very competitive with other existing methods for outlier detection; the connectivity architecture of the autoencoder is randomly varied to obtain significantly better performance.
Abstract: In this paper, we introduce autoencoder ensembles for unsupervised outlier detection. One problem with neural networks is that they are sensitive to noise and often require large data sets to work robustly, while increasing data size makes them slow. As a result, there are only a few existing works in the literature on the use of neural networks in outlier detection. This paper shows that neural networks can be very competitive with other existing methods. The basic idea is to randomly vary the connectivity architecture of the autoencoder to obtain significantly better performance. Furthermore, we combine this technique with an adaptive sampling method to make our approach more efficient and effective. Experimental results comparing the proposed approach with state-of-the-art detectors are presented on several benchmark data sets, showing the accuracy of our approach.
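
A sketch of the randomized-connectivity idea: each ensemble member is an autoencoder whose weight matrices are thinned by a fixed random mask, and the outlier score is the median reconstruction error across members. Layer sizes, training length, and the keep rate are assumptions, and the adaptive sampling component is omitted:

```python
import torch
import torch.nn as nn

def outlier_scores(X, n_members=25, keep=0.7):
    """Median per-sample reconstruction error over a randomized AE ensemble.

    X: (n_samples, n_features) float tensor. Each member gets its own
    random binary mask on the weight matrices, re-applied every step so
    the sparse connectivity persists through training.
    """
    errs = []
    for _ in range(n_members):
        ae = nn.Sequential(nn.Linear(X.shape[1], 32), nn.Sigmoid(),
                           nn.Linear(32, X.shape[1]))
        masks = [(torch.rand_like(layer.weight) < keep).float()
                 for layer in (ae[0], ae[2])]
        opt = torch.optim.Adam(ae.parameters(), lr=1e-2)
        for _ in range(200):  # brief training per member
            with torch.no_grad():  # enforce the member's sparse connectivity
                for layer, m in zip((ae[0], ae[2]), masks):
                    layer.weight *= m
            opt.zero_grad()
            loss = ((ae(X) - X) ** 2).mean()
            loss.backward()
            opt.step()
        errs.append(((ae(X) - X) ** 2).sum(dim=1).detach())
    return torch.stack(errs).median(dim=0).values  # higher = more outlying
```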

Proceedings ArticleDOI
23 Oct 2017
TL;DR: A novel model called Spatio-Temporal AutoEncoder (ST AutoEncoder or STAE), which utilizes deep neural networks to learn video representation automatically and extracts features from both spatial and temporal dimensions by performing 3-dimensional convolutions, which enhances the motion feature learning in videos.
Abstract: Anomalous event detection in real-world video scenes is a challenging problem due to the complexity of "anomaly" as well as the cluttered backgrounds, objects and motions in the scenes. Most existing methods use hand-crafted features in local spatial regions to identify anomalies. In this paper, we propose a novel model called Spatio-Temporal AutoEncoder (ST AutoEncoder or STAE), which utilizes deep neural networks to learn video representation automatically and extracts features from both spatial and temporal dimensions by performing 3-dimensional convolutions. In addition to the reconstruction loss used in existing typical autoencoders, we introduce a weight-decreasing prediction loss for generating future frames, which enhances the motion feature learning in videos. Since most anomaly detection datasets are restricted to appearance anomalies or unnatural motion anomalies, we collected a new challenging dataset comprising a set of real-world traffic surveillance videos. Several experiments are performed on both the public benchmarks and our traffic dataset, which show that our proposed method remarkably outperforms the state-of-the-art approaches.
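
A sketch of the two-headed spatio-temporal autoencoder the abstract describes: a shared 3D-conv encoder, one decoder that reconstructs the input clip, and one that predicts the future clip under per-frame weights that decrease over time. The architecture and the linear weight schedule are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class STAESketch(nn.Module):
    """Spatio-temporal autoencoder with reconstruction + prediction heads.

    clip and future: (batch, channels, frames, H, W) tensors of equal shape.
    """
    def __init__(self, c=1):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv3d(c, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv3d(16, 32, 3, padding=1), nn.ReLU())
        self.dec_recon = nn.Conv3d(32, c, 3, padding=1)
        self.dec_pred = nn.Conv3d(32, c, 3, padding=1)

    def loss(self, clip, future):
        h = self.enc(clip)
        recon_loss = ((self.dec_recon(h) - clip) ** 2).mean()
        t = future.shape[2]
        # Weight-decreasing prediction loss: later frames count less.
        w = torch.linspace(1.0, 1.0 / t, t).view(1, 1, t, 1, 1)
        pred_loss = (w * (self.dec_pred(h) - future) ** 2).mean()
        return recon_loss + pred_loss
```

At test time, a high combined reconstruction/prediction error on a clip flags it as anomalous.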
Abstract: Anomalous events detection in real-world video scenes is a challenging problem due to the complexity of "anomaly" as well as the cluttered backgrounds, objects and motions in the scenes. Most existing methods use hand-crafted features in local spatial regions to identify anomalies. In this paper, we propose a novel model called Spatio-Temporal AutoEncoder (ST AutoEncoder or STAE), which utilizes deep neural networks to learn video representation automatically and extracts features from both spatial and temporal dimensions by performing 3-dimensional convolutions. In addition to the reconstruction loss used in existing typical autoencoders, we introduce a weight-decreasing prediction loss for generating future frames, which enhances the motion feature learning in videos. Since most anomaly detection datasets are restricted to appearance anomalies or unnatural motion anomalies, we collected a new challenging dataset comprising a set of real-world traffic surveillance videos. Several experiments are performed on both the public benchmarks and our traffic dataset, which show that our proposed method remarkably outperforms the state-of-the-art approaches.