
Showing papers on "Autoencoder published in 2010"


Proceedings Article
01 Sep 2010
TL;DR: This paper reports the recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms and shows that the learned binary codes produce a log-spectral distortion that is approximately 2 dB lower than a subband vector quantization technique over the entire frequency range of wide-band speech.
Abstract: This paper reports our recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms. The top layer of the generative model learns binary codes that can be used for efficient compression of speech and could also be used for scalable speech recognition or rapid speech content retrieval. Each layer of the generative model is fully connected to the layer below and the weights on these connections are pretrained efficiently by using the contrastive divergence approximation to the log likelihood gradient. After layer-by-layer pre-training we “unroll” the generative model to form a deep auto-encoder, whose parameters are then fine-tuned using back-propagation. To reconstruct the full-length speech spectrogram, individual spectrogram segments predicted by their respective binary codes are combined using an overlap-and-add method. Experimental results on speech spectrogram coding demonstrate that the binary codes produce a log-spectral distortion that is approximately 2 dB lower than a subband vector quantization technique over the entire frequency range of wide-band speech. Index Terms: deep learning, speech feature extraction, neural networks, auto-encoder, binary codes, Boltzmann machine
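A minimal sketch of the general recipe the abstract describes: greedy contrastive-divergence (CD-1) pretraining of a stack of RBMs, then "unrolling" the stack into a deep auto-encoder whose top-layer activations are binarised into codes. The layer sizes, learning rate, and random stand-in patch data are illustrative assumptions, not the paper's configuration, and the back-propagation fine-tuning step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.05):
    """One-step contrastive divergence (CD-1) for a binary-binary RBM."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    a = np.zeros(n_visible)   # visible biases
    b = np.zeros(n_hidden)    # hidden biases
    for _ in range(epochs):
        h_prob = sigmoid(data @ W + b)
        h_samp = (rng.random(h_prob.shape) < h_prob).astype(float)
        v_recon = sigmoid(h_samp @ W.T + a)
        h_recon = sigmoid(v_recon @ W + b)
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
        a += lr * (data - v_recon).mean(axis=0)
        b += lr * (h_prob - h_recon).mean(axis=0)
    return W, a, b

patches = rng.random((512, 256))          # stand-in for spectrogram patches in [0, 1]
layer_sizes = [256, 128, 64, 32]          # top layer gives the (binary) code

# Greedy layer-by-layer pretraining.
weights, x = [], patches
for n_hid in layer_sizes[1:]:
    W, a, b = train_rbm(x, n_hid)
    weights.append((W, a, b))
    x = sigmoid(x @ W + b)                # activations feed the next RBM

# Unroll: encoder uses W, decoder uses W.T (backprop fine-tuning would follow).
def encode(v):
    for W, _, b in weights:
        v = sigmoid(v @ W + b)
    return (v > 0.5).astype(float)        # binarised code at the top layer

def decode(h):
    for W, a, _ in reversed(weights):
        h = sigmoid(h @ W.T + a)
    return h

recon = decode(encode(patches))
print("mean reconstruction error:", np.mean((patches - recon) ** 2))
```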

372 citations


Proceedings ArticleDOI
18 Jul 2010
TL;DR: A framework is proposed for combining the training of deep auto-encoders (for learning compact feature spaces) with recently proposed batch-mode RL algorithms (for learning policies), with an emphasis on data-efficiency and on studying the properties of the feature spaces automatically constructed by the deep auto-encoders.
Abstract: This paper discusses the effectiveness of deep auto-encoder neural networks in visual reinforcement learning (RL) tasks. We propose a framework for combining the training of deep auto-encoders (for learning compact feature spaces) with recently proposed batch-mode RL algorithms (for learning policies). An emphasis is put on the data-efficiency of this combination and on studying the properties of the feature spaces automatically constructed by the deep auto-encoders. These feature spaces are empirically shown to adequately capture the similarities and spatial relations between observations and to allow useful policies to be learned. We propose several methods for improving the topology of the feature spaces by making use of task-dependent information. Finally, we present first results on successfully learning good control policies directly from synthesized and real images.
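A minimal sketch of the two-stage pipeline the abstract outlines: a pretrained encoder maps raw observations to a compact feature space, and a batch-mode algorithm (here plain fitted Q iteration with per-action ridge regression) learns a policy on the stored transitions. The encoder, transition data, and regressor below are illustrative assumptions, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(1)
n, img_dim, n_actions, feat_dim, gamma = 500, 64, 4, 8, 0.95
proj = 0.1 * rng.standard_normal((img_dim, feat_dim))   # stand-in encoder weights

def encode(images):
    """Placeholder for a deep auto-encoder's encoding stage (here a fixed projection)."""
    return np.tanh(images.reshape(len(images), -1) @ proj)

# Fake batch of transitions (s, a, r, s') collected beforehand, as in batch-mode RL.
obs      = rng.random((n, img_dim))
actions  = rng.integers(0, n_actions, size=n)
rewards  = rng.random(n)
next_obs = rng.random((n, img_dim))
s, s_next = encode(obs), encode(next_obs)

# Fitted Q iteration with one ridge regressor per action (closed-form linear fit).
def fit_linear(X, y, lam=1e-2):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

W = np.zeros((n_actions, feat_dim))            # Q(s, a) is approximated by s @ W[a]
for _ in range(20):                            # FQI iterations
    targets = rewards + gamma * (s_next @ W.T).max(axis=1)
    for a in range(n_actions):
        mask = actions == a
        if mask.any():
            W[a] = fit_linear(s[mask], targets[mask])

greedy = (s @ W.T).argmax(axis=1)              # policy induced by the learned Q-function
print("greedy action counts:", np.bincount(greedy, minlength=n_actions))
```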

353 citations


Journal ArticleDOI
TL;DR: This paper presents techniques for image reconstruction and recognition using autoencoders; instead of whole images, image patches are used for training, which leads to much simpler autoencoder architectures and reduced training time.
Abstract: This paper presents techniques for image reconstruction and recognition using autoencoders. Experiments are conducted to compare the performances of three types of autoencoder neural networks based on their efficiency of reconstruction and recognition. Reconstruction error and recognition rate are determined in all three cases using the same architecture configuration and training algorithm. The results obtained with autoencoders are also compared with those obtained using the principal component analysis method. Instead of whole images, image patches are used for training, which leads to much simpler autoencoder architectures and reduced training time.
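A minimal sketch of the kind of comparison the abstract describes: a single-hidden-layer autoencoder trained on flattened image patches, with its reconstruction error compared against PCA using the same number of components. Patch size, hidden size, learning rate, and the random stand-in data are assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(2)
patches = rng.random((1000, 64))          # stand-in for 8x8 image patches, flattened
patches -= patches.mean(axis=0)           # center the data

n_hidden, lr, epochs = 16, 0.1, 200
W1 = 0.1 * rng.standard_normal((64, n_hidden)); b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.standard_normal((n_hidden, 64)); b2 = np.zeros(64)

for _ in range(epochs):                   # plain batch gradient descent on MSE
    h = np.tanh(patches @ W1 + b1)
    out = h @ W2 + b2                     # linear output layer
    err = out - patches
    # Backpropagation through the two layers.
    gW2 = h.T @ err / len(patches);  gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = patches.T @ dh / len(patches);  gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1

ae_mse = np.mean((np.tanh(patches @ W1 + b1) @ W2 + b2 - patches) ** 2)

# PCA baseline with the same number of components.
V = np.linalg.svd(patches, full_matrices=False)[2][:n_hidden].T
pca_mse = np.mean((patches @ V @ V.T - patches) ** 2)

print(f"autoencoder MSE: {ae_mse:.4f}   PCA MSE: {pca_mse:.4f}")
```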

24 citations


Proceedings ArticleDOI
14 Mar 2010
TL;DR: This paper proposes a novel method for modeling the excitation as a low-dimensional set of coefficients, based on a non-linear map learned through an autoencoder, and shows that this model produces speech of higher perceptual quality compared to conventional pulse-excited speech signals at the p ≪ 0.01 significance level.
Abstract: HMM-TTS synthesis is a popular approach toward flexible, low-footprint, data-driven systems that produce highly intelligible speech. In spite of these strengths, speech generated by these systems exhibits some degradation in quality, attributable to an inadequacy in modeling the excitation signal that drives the parametric models of the vocal tract. This paper proposes a novel method for modeling the excitation as a low-dimensional set of coefficients, based on a non-linear map learned through an autoencoder. Through analysis-and-resynthesis experiments, and a formal listening test, we show that this model produces speech of higher perceptual quality than conventional pulse-excited speech signals at the p ≪ 0.01 significance level.
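A minimal analysis-and-resynthesis sketch of the idea, under heavy assumptions: the excitation is taken as an LPC residual, each excitation frame is compressed to a small set of coefficients by an encode/decode pair standing in for the trained autoencoder map, and speech is resynthesized by re-filtering the decoded excitation. The toy filter, frame length, code size, and the linear stand-in map are all assumptions, not the paper's model.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(3)
speech = rng.standard_normal(8000)             # stand-in for one second of speech at 8 kHz
a = np.array([1.0, -0.9])                      # toy LPC (vocal-tract) filter coefficients

residual = lfilter(a, [1.0], speech)           # analysis: inverse filtering gives the excitation
frames = residual[: len(residual) // 80 * 80].reshape(-1, 80)

proj = np.linalg.qr(rng.standard_normal((80, 8)))[0]   # orthonormal stand-in for a trained map

def encode(x):                                  # placeholder for the autoencoder's encoder
    return x @ proj                             # maps an 80-sample frame to 8 coefficients

def decode(z):                                  # placeholder for the autoencoder's decoder
    return z @ proj.T

codes = encode(frames)                          # low-dimensional excitation representation
recon_residual = decode(codes).reshape(-1)
speech_hat = lfilter([1.0], a, recon_residual)  # synthesis: drive the vocal-tract filter

print("excitation coefficients per frame:", codes.shape[1])
print("resynthesized samples:", speech_hat.shape[0])
```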

20 citations


Proceedings Article
01 Jan 2010
TL;DR: It is shown that the efficient autoencoder yields better sparseness and lower reconstruction errors than the batch algorithms on the MNIST benchmark dataset.
Abstract: We introduce an efficient online learning mechanism for non-negative sparse coding in autoencoder neural networks. In this paper we compare the novel method to the batch algorithm non-negative matrix factorization, with and without a sparseness constraint. We show that the efficient autoencoder yields better sparseness and lower reconstruction errors than the batch algorithms on the MNIST benchmark dataset.
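A minimal sketch of online non-negative sparse coding in an autoencoder, not the authors' algorithm: one gradient update per sample on a tied-weight autoencoder, with a sparsity penalty on the hidden activations and non-negativity enforced by projecting the weights back onto the non-negative orthant. The learning rate, penalty weight, hidden size, and random stand-in data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.random((2000, 784))                  # stand-in for MNIST digits scaled to [0, 1]

n_hidden, lr, sparsity = 64, 0.01, 0.1
W = np.abs(0.01 * rng.standard_normal((784, n_hidden)))   # tied, non-negative weights

for x in data:                                  # one online update per sample
    h = np.maximum(0.0, x @ W)                  # non-negative (rectified) encoding
    err = h @ W.T - x                           # reconstruction error with tied weights
    # Gradient of 0.5*||h @ W.T - x||^2 + sparsity*sum(h) w.r.t. the tied weights W.
    grad = (np.outer(x, (err @ W) * (h > 0))    # encoder path
            + np.outer(err, h)                  # decoder path
            + sparsity * np.outer(x, (h > 0).astype(float)))
    W = np.maximum(0.0, W - lr * grad)          # project back onto the non-negative orthant

h_all = np.maximum(0.0, data @ W)
print("fraction of zero activations:", np.mean(h_all == 0))
print("reconstruction MSE:", np.mean((h_all @ W.T - data) ** 2))
```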

19 citations


Book ChapterDOI
15 Sep 2010
TL;DR: This work proposes using a deep bottlenecked neural network for supervised dimension reduction; instead of trying to reproduce the data, the network is trained to perform classification.
Abstract: Deep autoencoder networks have successfully been applied in unsupervised dimension reduction. The autoencoder has a "bottleneck" middle layer of only a few hidden units, which gives a low-dimensional representation for the data when the full network is trained to minimize reconstruction error. We propose using a deep bottlenecked neural network for supervised dimension reduction. Instead of trying to reproduce the data, the network is trained to perform classification. Pre-training with restricted Boltzmann machines is combined with supervised fine-tuning. Fine-tuning with supervised cost functions has been done before, but with cost functions that scale quadratically. Training a bottleneck classifier scales linearly, but still gives results comparable to, or sometimes better than, two earlier supervised methods.
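A minimal sketch of a bottleneck classifier in the spirit of the abstract: a network with a narrow bottleneck layer is trained on class labels with a cross-entropy cost (which scales linearly in the number of samples per pass), and the bottleneck activations are then used as a supervised low-dimensional representation. RBM pre-training is omitted, and the layer sizes, learning rate, and random stand-in data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, n_classes, bottleneck = 600, 20, 3, 2
X = rng.standard_normal((n, d))
y = rng.integers(0, n_classes, size=n)
Y = np.eye(n_classes)[y]                       # one-hot labels

W1 = 0.1 * rng.standard_normal((d, bottleneck)); b1 = np.zeros(bottleneck)
W2 = 0.1 * rng.standard_normal((bottleneck, n_classes)); b2 = np.zeros(n_classes)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr = 0.5
for _ in range(300):                           # supervised training with cross-entropy
    H = np.tanh(X @ W1 + b1)                   # bottleneck layer: the learned 2-D codes
    P = softmax(H @ W2 + b2)
    dZ = (P - Y) / n                           # gradient of mean cross-entropy w.r.t. logits
    gW2, gb2 = H.T @ dZ, dZ.sum(axis=0)
    dH = (dZ @ W2.T) * (1 - H ** 2)
    gW1, gb1 = X.T @ dH, dH.sum(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1

codes = np.tanh(X @ W1 + b1)                   # supervised low-dimensional representation
acc = np.mean(softmax(codes @ W2 + b2).argmax(axis=1) == y)
print("bottleneck dimension:", codes.shape[1], " training accuracy:", round(acc, 3))
```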

7 citations