
Showing papers on "Autoencoder published in 2010"


Proceedings Article
01 Sep 2010
TL;DR: This paper reports the recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms and shows that the learned binary codes produce a log-spectral distortion that is approximately 2 dB lower than a subband vector quantization technique over the entire frequency range of wide-band speech.
Abstract: This paper reports our recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms. The top layer of the generative model learns binary codes that can be used for efficient compression of speech and could also be used for scalable speech recognition or rapid speech content retrieval. Each layer of the generative model is fully connected to the layer below and the weights on these connections are pretrained efficiently by using the contrastive divergence approximation to the log likelihood gradient. After layer-by-layer pre-training we “unroll” the generative model to form a deep auto-encoder, whose parameters are then fine-tuned using back-propagation. To reconstruct the full-length speech spectrogram, individual spectrogram segments predicted by their respective binary codes are combined using an overlap-and-add method. Experimental results on speech spectrogram coding demonstrate that the binary codes produce a log-spectral distortion that is approximately 2 dB lower than a subband vector quantization technique over the entire frequency range of wide-band speech. Index Terms: deep learning, speech feature extraction, neural networks, auto-encoder, binary codes, Boltzmann machine
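A minimal sketch of the general recipe the abstract describes: greedy contrastive-divergence (CD-1) pretraining of a stack of RBMs, then "unrolling" the stack into a deep auto-encoder whose top-layer activations are binarised into codes. The layer sizes, learning rate, and random stand-in patch data are illustrative assumptions, not the paper's configuration, and the back-propagation fine-tuning step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.05):
    """One-step contrastive divergence (CD-1) for a binary-binary RBM."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    a = np.zeros(n_visible)   # visible biases
    b = np.zeros(n_hidden)    # hidden biases
    for _ in range(epochs):
        h_prob = sigmoid(data @ W + b)
        h_samp = (rng.random(h_prob.shape) < h_prob).astype(float)
        v_recon = sigmoid(h_samp @ W.T + a)
        h_recon = sigmoid(v_recon @ W + b)
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
        a += lr * (data - v_recon).mean(axis=0)
        b += lr * (h_prob - h_recon).mean(axis=0)
    return W, a, b

patches = rng.random((512, 256))          # stand-in for spectrogram patches in [0, 1]
layer_sizes = [256, 128, 64, 32]          # top layer gives the (binary) code

# Greedy layer-by-layer pretraining.
weights, x = [], patches
for n_hid in layer_sizes[1:]:
    W, a, b = train_rbm(x, n_hid)
    weights.append((W, a, b))
    x = sigmoid(x @ W + b)                # activations feed the next RBM

# Unroll: encoder uses W, decoder uses W.T (backprop fine-tuning would follow).
def encode(v):
    for W, _, b in weights:
        v = sigmoid(v @ W + b)
    return (v > 0.5).astype(float)        # binarised code at the top layer

def decode(h):
    for W, a, _ in reversed(weights):
        h = sigmoid(h @ W.T + a)
    return h

recon = decode(encode(patches))
print("mean reconstruction error:", np.mean((patches - recon) ** 2))
```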

372 citations


Proceedings ArticleDOI
18 Jul 2010
TL;DR: A framework is proposed for combining the training of deep auto-encoders (for learning compact feature spaces) with recently proposed batch-mode RL algorithms (for learning policies), with an emphasis on data-efficiency and on studying the properties of the feature spaces automatically constructed by the deep auto-encoders.
Abstract: This paper discusses the effectiveness of deep auto-encoder neural networks in visual reinforcement learning (RL) tasks. We propose a framework for combining the training of deep auto-encoders (for learning compact feature spaces) with recently proposed batch-mode RL algorithms (for learning policies). An emphasis is put on the data-efficiency of this combination and on studying the properties of the feature spaces automatically constructed by the deep auto-encoders. These feature spaces are empirically shown to adequately capture the similarities and spatial relations between observations and to allow useful policies to be learned. We propose several methods for improving the topology of the feature spaces by making use of task-dependent information. Finally, we present first results on successfully learning good control policies directly from synthesized and real images.
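A minimal sketch of the two-stage pipeline the abstract outlines: a pretrained encoder maps raw observations to a compact feature space, and a batch-mode algorithm (here plain fitted Q iteration with per-action ridge regression) learns a policy on the stored transitions. The encoder, transition data, and regressor below are illustrative assumptions, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(1)
n, img_dim, n_actions, feat_dim, gamma = 500, 64, 4, 8, 0.95
proj = 0.1 * rng.standard_normal((img_dim, feat_dim))   # stand-in encoder weights

def encode(images):
    """Placeholder for a deep auto-encoder's encoding stage (here a fixed projection)."""
    return np.tanh(images.reshape(len(images), -1) @ proj)

# Fake batch of transitions (s, a, r, s') collected beforehand, as in batch-mode RL.
obs      = rng.random((n, img_dim))
actions  = rng.integers(0, n_actions, size=n)
rewards  = rng.random(n)
next_obs = rng.random((n, img_dim))
s, s_next = encode(obs), encode(next_obs)

# Fitted Q iteration with one ridge regressor per action (closed-form linear fit).
def fit_linear(X, y, lam=1e-2):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

W = np.zeros((n_actions, feat_dim))            # Q(s, a) is approximated by s @ W[a]
for _ in range(20):                            # FQI iterations
    targets = rewards + gamma * (s_next @ W.T).max(axis=1)
    for a in range(n_actions):
        mask = actions == a
        if mask.any():
            W[a] = fit_linear(s[mask], targets[mask])

greedy = (s @ W.T).argmax(axis=1)              # policy induced by the learned Q-function
print("greedy action counts:", np.bincount(greedy, minlength=n_actions))
```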

353 citations


Journal ArticleDOI
TL;DR: This paper presents techniques for image reconstruction and recognition using autoencoders; instead of whole images, image patches are used for training, which leads to much simpler autoencoder architectures and reduced training time.
Abstract: This paper presents techniques for image reconstruction and recognition using autoencoders. Experiments are conducted to compare the performances of three types of autoencoder neural networks based on their efficiency of reconstruction and recognition. Reconstruction error and recognition rate are determined in all three cases using the same architecture configuration and training algorithm. The results obtained with autoencoders are also compared with those obtained using the principal component analysis method. Instead of whole images, image patches are used for training, which leads to much simpler autoencoder architectures and reduced training time.
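A minimal sketch of the kind of comparison the abstract describes: a single-hidden-layer autoencoder trained on flattened image patches, with its reconstruction error compared against PCA using the same number of components. Patch size, hidden size, learning rate, and the random stand-in data are assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(2)
patches = rng.random((1000, 64))          # stand-in for 8x8 image patches, flattened
patches -= patches.mean(axis=0)           # center the data

n_hidden, lr, epochs = 16, 0.1, 200
W1 = 0.1 * rng.standard_normal((64, n_hidden)); b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.standard_normal((n_hidden, 64)); b2 = np.zeros(64)

for _ in range(epochs):                   # plain batch gradient descent on MSE
    h = np.tanh(patches @ W1 + b1)
    out = h @ W2 + b2                     # linear output layer
    err = out - patches
    # Backpropagation through the two layers.
    gW2 = h.T @ err / len(patches);  gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = patches.T @ dh / len(patches);  gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1

ae_mse = np.mean((np.tanh(patches @ W1 + b1) @ W2 + b2 - patches) ** 2)

# PCA baseline with the same number of components.
V = np.linalg.svd(patches, full_matrices=False)[2][:n_hidden].T
pca_mse = np.mean((patches @ V @ V.T - patches) ** 2)

print(f"autoencoder MSE: {ae_mse:.4f}   PCA MSE: {pca_mse:.4f}")
```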

24 citations


Proceedings ArticleDOI
14 Mar 2010
TL;DR: This paper proposes a novel method for modeling the excitation as a low-dimensional set of coefficients, based on a non-linear map learned through an autoencoder, and shows that this model produces speech of higher perceptual quality compared to conventional pulse-excited speech signals at the p ≪ 0.01 significance level.
Abstract: HMM-TTS synthesis is a popular approach toward flexible, low-footprint, data-driven systems that produce highly intelligible speech. In spite of these strengths, speech generated by these systems exhibits some degradation in quality, attributable to an inadequacy in modeling the excitation signal that drives the parametric models of the vocal tract. This paper proposes a novel method for modeling the excitation as a low-dimensional set of coefficients, based on a non-linear map learned through an autoencoder. Through analysis-and-resynthesis experiments, and a formal listening test, we show that this model produces speech of higher perceptual quality than conventional pulse-excited speech signals at the p ≪ 0.01 significance level.
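A minimal analysis-and-resynthesis sketch of the idea, under heavy assumptions: the excitation is taken as an LPC residual, each excitation frame is compressed to a small set of coefficients by an encode/decode pair standing in for the trained autoencoder map, and speech is resynthesized by re-filtering the decoded excitation. The toy filter, frame length, code size, and the linear stand-in map are all assumptions, not the paper's model.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(3)
speech = rng.standard_normal(8000)             # stand-in for one second of speech at 8 kHz
a = np.array([1.0, -0.9])                      # toy LPC (vocal-tract) filter coefficients

residual = lfilter(a, [1.0], speech)           # analysis: inverse filtering gives the excitation
frames = residual[: len(residual) // 80 * 80].reshape(-1, 80)

proj = np.linalg.qr(rng.standard_normal((80, 8)))[0]   # orthonormal stand-in for a trained map

def encode(x):                                  # placeholder for the autoencoder's encoder
    return x @ proj                             # maps an 80-sample frame to 8 coefficients

def decode(z):                                  # placeholder for the autoencoder's decoder
    return z @ proj.T

codes = encode(frames)                          # low-dimensional excitation representation
recon_residual = decode(codes).reshape(-1)
speech_hat = lfilter([1.0], a, recon_residual)  # synthesis: drive the vocal-tract filter

print("excitation coefficients per frame:", codes.shape[1])
print("resynthesized samples:", speech_hat.shape[0])
```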

20 citations


Proceedings Article
01 Jan 2010
TL;DR: It is shown that the efficient autoencoder yields better sparseness and lower reconstruction errors than the batch algorithms on the MNIST benchmark dataset.
Abstract: We introduce an efficient online learning mechanism for non-negative sparse coding in autoencoder neural networks. In this paper we compare the novel method to the batch algorithm non-negative matrix factorization, with and without a sparseness constraint. We show that the efficient autoencoder yields better sparseness and lower reconstruction errors than the batch algorithms on the MNIST benchmark dataset.
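A minimal sketch of online non-negative sparse coding in an autoencoder, not the authors' algorithm: one gradient update per sample on a tied-weight autoencoder, with a sparsity penalty on the hidden activations and non-negativity enforced by projecting the weights back onto the non-negative orthant. The learning rate, penalty weight, hidden size, and random stand-in data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.random((2000, 784))                  # stand-in for MNIST digits scaled to [0, 1]

n_hidden, lr, sparsity = 64, 0.01, 0.1
W = np.abs(0.01 * rng.standard_normal((784, n_hidden)))   # tied, non-negative weights

for x in data:                                  # one online update per sample
    h = np.maximum(0.0, x @ W)                  # non-negative (rectified) encoding
    err = h @ W.T - x                           # reconstruction error with tied weights
    # Gradient of 0.5*||h @ W.T - x||^2 + sparsity*sum(h) w.r.t. the tied weights W.
    grad = (np.outer(x, (err @ W) * (h > 0))    # encoder path
            + np.outer(err, h)                  # decoder path
            + sparsity * np.outer(x, (h > 0).astype(float)))
    W = np.maximum(0.0, W - lr * grad)          # project back onto the non-negative orthant

h_all = np.maximum(0.0, data @ W)
print("fraction of zero activations:", np.mean(h_all == 0))
print("reconstruction MSE:", np.mean((h_all @ W.T - data) ** 2))
```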

19 citations


Book ChapterDOI
15 Sep 2010
TL;DR: This work proposes using a deep bottlenecked neural network for supervised dimension reduction; instead of trying to reproduce the data, the network is trained to perform classification.
Abstract: Deep autoencoder networks have successfully been applied in unsupervised dimension reduction. The autoencoder has a "bottleneck" middle layer of only a few hidden units, which gives a low-dimensional representation for the data when the full network is trained to minimize reconstruction error. We propose using a deep bottlenecked neural network for supervised dimension reduction. Instead of trying to reproduce the data, the network is trained to perform classification. Pre-training with restricted Boltzmann machines is combined with supervised fine-tuning. Fine-tuning with supervised cost functions has been done before, but with cost functions that scale quadratically. Training a bottleneck classifier scales linearly, but still gives results comparable to, or sometimes better than, two earlier supervised methods.
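A minimal sketch of a bottleneck classifier in the spirit of the abstract: a network with a narrow bottleneck layer is trained on class labels with a cross-entropy cost (which scales linearly in the number of samples per pass), and the bottleneck activations are then used as a supervised low-dimensional representation. RBM pre-training is omitted, and the layer sizes, learning rate, and random stand-in data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, n_classes, bottleneck = 600, 20, 3, 2
X = rng.standard_normal((n, d))
y = rng.integers(0, n_classes, size=n)
Y = np.eye(n_classes)[y]                       # one-hot labels

W1 = 0.1 * rng.standard_normal((d, bottleneck)); b1 = np.zeros(bottleneck)
W2 = 0.1 * rng.standard_normal((bottleneck, n_classes)); b2 = np.zeros(n_classes)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr = 0.5
for _ in range(300):                           # supervised training with cross-entropy
    H = np.tanh(X @ W1 + b1)                   # bottleneck layer: the learned 2-D codes
    P = softmax(H @ W2 + b2)
    dZ = (P - Y) / n                           # gradient of mean cross-entropy w.r.t. logits
    gW2, gb2 = H.T @ dZ, dZ.sum(axis=0)
    dH = (dZ @ W2.T) * (1 - H ** 2)
    gW1, gb1 = X.T @ dH, dH.sum(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1

codes = np.tanh(X @ W1 + b1)                   # supervised low-dimensional representation
acc = np.mean(softmax(codes @ W2 + b2).argmax(axis=1) == y)
print("bottleneck dimension:", codes.shape[1], " training accuracy:", round(acc, 3))
```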

7 citations