
Showing papers on "Autoencoder published in 2014"


Proceedings ArticleDOI
02 Dec 2014
TL;DR: It is demonstrated that autoencoders can detect subtle anomalies that linear PCA fails to detect, and that they are useful as nonlinear techniques that avoid the complex computation kernel PCA requires.
Abstract: This paper proposes to use autoencoders with nonlinear dimensionality reduction in the anomaly detection task. The authors apply dimensionality reduction using an autoencoder to both artificial data and real data, and compare it with linear PCA and kernel PCA to clarify its properties. The artificial data is generated from the Lorenz system, and the real data is spacecraft telemetry data. This paper demonstrates that autoencoders are able to detect subtle anomalies that linear PCA fails to detect. Also, autoencoders can increase their accuracy by being extended to denoising autoencoders. Moreover, autoencoders are useful as nonlinear techniques that do not require the complex computation of kernel PCA. Finally, the authors examine the features learned in the hidden layer of autoencoders, and show that autoencoders learn the normal state properly and activate differently on anomalous input.
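The anomaly score the paper relies on is the reconstruction error of the trained model. The NumPy sketch below illustrates that scoring rule only; the closed-form linear "autoencoder" (equivalent to PCA) and the toy low-dimensional data are illustrative stand-ins, not the paper's nonlinear or denoising models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: normal samples lie near a 1-D subspace of 3-D space.
normal = rng.normal(size=(500, 1)) @ np.array([[1.0, 2.0, -1.0]])
normal += 0.05 * rng.normal(size=normal.shape)

# Stand-in for a trained autoencoder: a linear AE with squared loss learns
# the PCA subspace, so the closed form suffices for this illustration.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
W = vt[:1]                                   # 1-D bottleneck

def reconstruct(x):
    return mean + (x - mean) @ W.T @ W

def anomaly_score(x):
    # Reconstruction error: large for points off the learned normal manifold.
    return np.sum((x - reconstruct(x)) ** 2, axis=-1)

test = np.vstack([normal[:5], rng.normal(size=(5, 3)) * 3.0])
print(anomaly_score(test))                   # the last 5 rows score higher
```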

860 citations


Posted Content
TL;DR: The Deep Contractive Network is proposed, a model with a new end-to-end training procedure that includes a smoothness penalty inspired by the contractive autoencoder (CAE); it increases network robustness to adversarial examples without a significant performance penalty.
Abstract: Recent work has shown deep neural networks (DNNs) to be highly susceptible to well-designed, small perturbations at the input layer, so-called adversarial examples. Taking images as an example, such distortions are often imperceptible, but can result in 100% misclassification for a state-of-the-art DNN. We study the structure of adversarial examples and explore network topology, pre-processing and training strategies to improve the robustness of DNNs. We perform various experiments to assess the removability of adversarial examples by corrupting with additional noise and pre-processing with denoising autoencoders (DAEs). We find that DAEs can remove substantial amounts of the adversarial noise. However, when stacking the DAE with the original DNN, the resulting network can again be attacked by new adversarial examples with even smaller distortion. As a solution, we propose the Deep Contractive Network, a model with a new end-to-end training procedure that includes a smoothness penalty inspired by the contractive autoencoder (CAE). This increases the network's robustness to adversarial examples, without a significant performance penalty.

632 citations


Proceedings Article
27 Jul 2014
TL;DR: This work proposes a simple method that first learns a nonlinear embedding of the original graph with a stacked autoencoder and then runs the k-means algorithm on the embedding to obtain the clustering result, significantly outperforming conventional spectral clustering.
Abstract: Recently, deep learning has been successfully adopted in many applications such as speech recognition and image classification. In this work, we explore the possibility of employing deep learning in graph clustering. We propose a simple method that first learns a nonlinear embedding of the original graph with a stacked autoencoder and then runs the k-means algorithm on the embedding to obtain the clustering result. We show that this simple method has a solid theoretical foundation, due to the similarity between autoencoders and spectral clustering in terms of what they actually optimize. We then demonstrate that the proposed method is more efficient and flexible than spectral clustering. First, the computational complexity of the autoencoder is much lower than that of spectral clustering: the former can be linear in the number of nodes of a sparse graph, while the latter is super-quadratic due to the eigenvalue decomposition. Second, when an additional sparsity constraint is imposed, we can simply employ the sparse autoencoder developed in the deep learning literature; in contrast, it is not straightforward to implement a sparse spectral method. Experimental results on various graph datasets show that the proposed method significantly outperforms conventional spectral clustering, which clearly indicates the effectiveness of deep learning in graph clustering.
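A minimal sketch of the embed-then-cluster pipeline, assuming a one-layer sigmoid autoencoder in place of the paper's stacked, sparsity-constrained version; the toy two-clique graph, layer sizes, and learning rate are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: two 10-node cliques joined by a single edge.
A = np.zeros((20, 20))
A[:10, :10] = 1.0; A[10:, 10:] = 1.0; A[0, 10] = A[10, 0] = 1.0
np.fill_diagonal(A, 0.0)
X = A / A.sum(1, keepdims=True)        # row-normalized similarities

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One-layer sigmoid autoencoder trained by plain gradient descent.
W1 = 0.1 * rng.normal(size=(20, 4)); W2 = 0.1 * rng.normal(size=(4, 20))
for _ in range(2000):
    H = sigmoid(X @ W1)                # the embedding
    G = 2.0 * (H @ W2 - X) / len(X)    # gradient of the reconstruction MSE
    gW1 = X.T @ (G @ W2.T * H * (1 - H))
    gW2 = H.T @ G
    W1 -= 0.5 * gW1; W2 -= 0.5 * gW2

# k-means (k=2) on the learned embedding.
H = sigmoid(X @ W1)
c = H[rng.choice(len(H), size=2, replace=False)]
for _ in range(20):
    lab = np.argmin(((H[:, None] - c) ** 2).sum(-1), axis=1)
    c = np.array([H[lab == k].mean(0) if (lab == k).any() else c[k]
                  for k in range(2)])
print(lab)                             # the two cliques separate
```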

596 citations


Proceedings ArticleDOI
03 Nov 2014
TL;DR: The problem of cross-modal retrieval, e.g., using a text query to search for images and vice versa, is considered, and a novel model, the correspondence autoencoder (Corr-AE), is proposed for solving it; the model is constructed by correlating the hidden representations of two uni-modal autoencoders.
Abstract: The problem of cross-modal retrieval, e.g., using a text query to search for images and vice versa, is considered in this paper. A novel model involving a correspondence autoencoder (Corr-AE) is proposed here for solving this problem. The model is constructed by correlating the hidden representations of two uni-modal autoencoders. A novel objective, which minimizes a linear combination of the representation learning error for each modality and the correlation learning error between the hidden representations of the two modalities, is used to train the model as a whole. Minimizing the correlation learning error forces the model to learn hidden representations containing only the information common to the two modalities, while minimizing the representation learning error ensures the hidden representations are good enough to reconstruct the input of each modality. A parameter $\alpha$ is used to balance the representation learning error and the correlation learning error. Based on two different multi-modal autoencoders, Corr-AE is extended to two other correspondence models, called Corr-Cross-AE and Corr-Full-AE. The proposed models are evaluated on three publicly available data sets from real scenes. We demonstrate that the three correspondence autoencoders perform significantly better than three canonical correlation analysis based models and two popular multi-modal deep models on cross-modal retrieval tasks.
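The objective is compact enough to sketch directly. The version below assumes squared error for both the reconstruction and the correlation terms; the exact distance used for the correlation term and the value of $\alpha$ are assumptions.

```python
import numpy as np

def corr_ae_loss(x, x_rec, y, y_rec, hx, hy, alpha=0.2):
    """Corr-AE objective (schematic): a linear combination of each modality's
    reconstruction error and a correlation-learning term that pulls the two
    hidden codes together. Squared distances and alpha are assumptions."""
    rec = np.mean((x - x_rec) ** 2) + np.mean((y - y_rec) ** 2)
    corr = np.mean((hx - hy) ** 2)
    return (1 - alpha) * rec + alpha * corr

# Toy usage with random arrays standing in for the two autoencoders' outputs.
rng = np.random.default_rng(0)
x, y = rng.random((8, 100)), rng.random((8, 50))  # e.g., image vs. text inputs
h = rng.random((8, 16))
print(corr_ae_loss(x, x * 0.9, y, y * 0.9, h, h + 0.01))
```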

558 citations


Journal ArticleDOI
TL;DR: A deep learning network (DLN) is proposed to discover unknown feature correlations between input signals that are crucial for the learning task; it provides better performance than SVM and naive Bayes classifiers.
Abstract: Automatic emotion recognition is one of the most challenging tasks. To detect emotion from nonstationary EEG signals, a sophisticated learning algorithm that can represent high-level abstraction is required. This study proposes using a deep learning network (DLN) to discover unknown feature correlations between input signals that are crucial for the learning task. The DLN is implemented with a stacked autoencoder (SAE) using a hierarchical feature learning approach. The input features of the network are the power spectral densities of 32-channel EEG signals from 32 subjects. To alleviate the overfitting problem, principal component analysis (PCA) is applied to extract the most important components of the initial input features. Furthermore, covariate shift adaptation of the principal components is implemented to minimize the nonstationary effect of the EEG signals. Experimental results show that the DLN is capable of classifying three different levels of valence and arousal with accuracies of 49.52% and 46.03%, respectively. Principal-component-based covariate shift adaptation enhances the respective classification accuracies by 5.55% and 6.53%. Moreover, the DLN provides better performance than SVM and naive Bayes classifiers.

432 citations


Journal ArticleDOI
TL;DR: A training method that encodes each word into a distinct vector in semantic space is presented, together with its relation to low-entropy coding, and is applied to stylistic analyses of two Chinese novels.

390 citations


Posted Content
TL;DR: This work explores the use of autoencoder-based methods for cross-language learning of vectorial word representations that are coherent between two languages, while not relying on word-level alignments, and achieves state-of-the-art performance.
Abstract: Cross-language learning allows us to use training data from one language to build models for a different language. Many approaches to bilingual learning require word-level alignment of sentences from parallel corpora. In this work we explore the use of autoencoder-based methods for cross-language learning of vectorial word representations that are aligned between two languages, while not relying on word-level alignments. We show that by simply learning to reconstruct the bag-of-words representations of aligned sentences, within and between languages, we can in fact learn high-quality representations and do without word alignments. Since training autoencoders on word observations presents certain computational issues, we propose and compare different variations adapted to this setting. We also propose an explicit correlation-maximizing regularizer that leads to significant performance improvements. We empirically investigate the success of our approach on the problem of cross-language text classification, where a classifier trained on a given language (e.g., English) must learn to generalize to a different language (e.g., German). These experiments demonstrate that our approaches are competitive with the state of the art, achieving improvements of up to 10-14 percentage points over the best reported results on this task.

330 citations


Proceedings ArticleDOI
23 Jun 2014
TL;DR: A dimensionality reduction method based on manifold learning, which iteratively explores data relations and uses them to pursue the manifold structure, and a multilayer architecture of the generalized autoencoder, called the deep generalized autoencoder, to handle highly complex datasets.
Abstract: The autoencoder algorithm and its deep version, as traditional dimensionality reduction methods, have achieved great success via the powerful representability of neural networks. However, they use each instance only to reconstruct itself, and neglect to explicitly model the data relations that would reveal the underlying effective manifold structure. In this paper, we propose a dimensionality reduction method based on manifold learning, which iteratively explores data relations and uses them to pursue the manifold structure. The method is realized by a so-called "generalized autoencoder" (GAE), which extends the traditional autoencoder in two aspects: (1) each instance $x_i$ is used to reconstruct a set of instances $\{x_j\}$ rather than just itself; (2) the reconstruction error for each instance ($\|x_j - x'_i\|^2$) is weighted by a relational function of $x_i$ and $x_j$ defined on the learned manifold. Hence, the GAE captures the structure of the data space by minimizing the weighted distances between reconstructed instances and the original ones. The generalized autoencoder provides a general neural network framework for dimensionality reduction. In addition, we propose a multilayer architecture of the generalized autoencoder, called the deep generalized autoencoder, to handle highly complex datasets. Finally, to evaluate the proposed methods, we perform extensive experiments on three datasets. The experiments demonstrate that the proposed methods achieve promising performance.
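The weighted objective can be sketched directly from the two extensions above. The Gaussian affinity used for the relation matrix S below is one typical choice, assumed here rather than taken from the paper.

```python
import numpy as np

def gae_loss(X, X_rec, S):
    """Generalized-autoencoder objective: the reconstruction x'_i produced
    from instance x_i is compared against a set of instances x_j, and each
    error ||x_j - x'_i||^2 is weighted by the relation S[i, j]."""
    diff = ((X[None, :, :] - X_rec[:, None, :]) ** 2).sum(-1)  # diff[i, j]
    return (S * diff).sum() / len(X)

rng = np.random.default_rng(0)
X = rng.random((5, 3))
S = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))   # Gaussian affinities
print(gae_loss(X, X + 0.1 * rng.random((5, 3)), S))
```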

320 citations


Journal ArticleDOI
TL;DR: An Adaptive Denoising Autoencoder, an unsupervised domain adaptation method in which prior knowledge learned from a target set is used to regularize training on a source set, achieving a matched feature-space representation for the target and source sets while ensuring target-domain knowledge transfer.
Abstract: With the availability of speech data obtained from different devices and under varied acquisition conditions, we are often faced with scenarios where an intrinsic discrepancy between the training and the test data has an adverse impact on affective speech analysis. To address this issue, this letter introduces an Adaptive Denoising Autoencoder based on an unsupervised domain adaptation method, in which prior knowledge learned from a target set is used to regularize the training on a source set. Our goal is to achieve a matched feature-space representation for the target and source sets while ensuring target-domain knowledge transfer. The method has been successfully evaluated using the 2009 INTERSPEECH Emotion Challenge's FAU Aibo Emotion Corpus as the target corpus and two other publicly available speech emotion corpora as sources. The experimental results show that our method significantly improves over the baseline performance and outperforms related feature-domain adaptation methods.

253 citations


Proceedings Article
Karol Gregor, Ivo Danihelka, Andriy Mnih, Charles Blundell, Daan Wierstra
21 Jun 2014
TL;DR: In this paper, a deep, generative autoencoder capable of learning hierarchies of distributed representations from data is introduced, where successive deep stochastic hidden layers are equipped with autoregressive connections to enable the model to be sampled from quickly and exactly via ancestral sampling.
Abstract: We introduce a deep, generative autoencoder capable of learning hierarchies of distributed representations from data. Successive deep stochastic hidden layers are equipped with autoregressive connections, which enable the model to be sampled from quickly and exactly via ancestral sampling. We derive an efficient approximate parameter estimation method based on the minimum description length (MDL) principle, which can be seen as maximising a variational lower bound on the log-likelihood, with a feedforward neural network implementing approximate inference. We demonstrate state-of-the-art generative performance on a number of classic data sets, including several UCI data sets, MNIST and Atari 2600 games.
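The autoregressive connections are what make exact ancestral sampling possible: each stochastic unit is sampled conditioned only on the units before it. Below is a sketch of that sampling step for a single binary layer; the logistic conditional and the sizes are illustrative assumptions.

```python
import numpy as np

def sample_autoregressive_layer(W, b, rng):
    """Ancestral sampling of one stochastic binary layer with autoregressive
    connections: unit i is sampled conditioned on units 0..i-1, so W must be
    strictly lower triangular. The logistic conditional is an assumption."""
    h = np.zeros(len(b))
    for i in range(len(b)):
        p = 1.0 / (1.0 + np.exp(-(W[i, :i] @ h[:i] + b[i])))
        h[i] = float(rng.random() < p)
    return h

rng = np.random.default_rng(1)
W = np.tril(rng.normal(size=(12, 12)), k=-1)   # strictly lower triangular
b = np.zeros(12)
print(sample_autoregressive_layer(W, b, rng))  # one exact ancestral sample
```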

Book ChapterDOI
TL;DR: A network supporting deep unsupervised learning is presented: an autoencoder with lateral shortcut connections from the encoder to the decoder at each level of the hierarchy, analogous to hierarchical latent variable models.
Abstract: A network supporting deep unsupervised learning is presented. The network is an autoencoder with lateral shortcut connections from the encoder to the decoder at each level of the hierarchy. The lateral shortcut connections allow the higher levels of the hierarchy to focus on abstract invariant features. Whereas standard autoencoders are analogous to latent variable models with a single layer of stochastic variables, the proposed network is analogous to hierarchical latent variable models. Learning combines the denoising autoencoder and denoising source separation frameworks. Each layer of the network contributes to the cost function a term that measures the distance between the representations produced by the encoder and the decoder. Since training signals originate from all levels of the network, all layers can learn efficiently, even in deep networks. The speedup offered by cost terms from higher levels of the hierarchy and the ability to learn invariant features are demonstrated in experiments.
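The per-layer cost terms can be sketched directly from the description above; the squared distance and the per-layer weights are assumptions.

```python
import numpy as np

def layerwise_cost(encoder_reps, decoder_reps, weights):
    """Total training cost as the abstract describes it: every level of the
    hierarchy contributes a term measuring the distance between the
    representation the encoder produced there and the one the decoder
    reconstructed there."""
    return sum(w * np.mean((e - d) ** 2)
               for w, e, d in zip(weights, encoder_reps, decoder_reps))

# Toy usage: three levels of activations from a hypothetical encoder/decoder.
rng = np.random.default_rng(0)
enc = [rng.random((4, n)) for n in (32, 16, 8)]
dec = [e + 0.05 * rng.random(e.shape) for e in enc]
print(layerwise_cost(enc, dec, weights=(1.0, 0.5, 0.25)))
```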

Proceedings Article
08 Dec 2014
TL;DR: This article explore the use of autoencoder-based methods for cross-language learning of vectorial word representations that are coherent between two languages, while not relying on word-level alignments.
Abstract: Cross-language learning allows one to use training data from one language to build models for a different language. Many approaches to bilingual learning require word-level alignment of sentences from parallel corpora. In this work we explore the use of autoencoder-based methods for cross-language learning of vectorial word representations that are coherent between two languages, while not relying on word-level alignments. We show that by simply learning to reconstruct the bag-of-words representations of aligned sentences, within and between languages, we can in fact learn high-quality representations and do without word alignments. We empirically investigate the success of our approach on the problem of cross-language text classification, where a classifier trained on a given language (e.g., English) must learn to generalize to a different language (e.g., German). In experiments on three language pairs, we show that our approach achieves state-of-the-art performance, outperforming a method exploiting word alignments and a strong machine translation baseline.
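The core training signal, reconstructing each language's bag-of-words from the code of either sentence in an aligned pair, can be sketched as follows. The linear encoder/decoder maps, squared error, and vocabulary/code sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vx, vy, k = 1000, 1200, 40                 # vocab sizes and code size (arbitrary)
Ex, Ey = rng.normal(0, 0.01, (vx, k)), rng.normal(0, 0.01, (vy, k))
Dx, Dy = rng.normal(0, 0.01, (k, vx)), rng.normal(0, 0.01, (k, vy))

def bilingual_loss(x_bow, y_bow):
    """Reconstruct each language's bag-of-words from the code of either
    sentence in an aligned pair: within- and between-language terms."""
    loss = 0.0
    for h in (x_bow @ Ex, y_bow @ Ey):     # code from either language
        loss += np.sum((h @ Dx - x_bow) ** 2) + np.sum((h @ Dy - y_bow) ** 2)
    return loss

x = (rng.random(vx) < 0.01).astype(float)  # toy aligned bag-of-words pair
y = (rng.random(vy) < 0.01).astype(float)
print(bilingual_loss(x, y))
```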

Proceedings ArticleDOI
20 Sep 2014
TL;DR: With experiments on gene annotation data from the Gene Ontology project, it is shown that deep autoencoder networks achieve better performance than other standard machine learning methods, including the popular truncated singular value decomposition.
Abstract: The annotation of genomic information is a major challenge in biology and bioinformatics. Existing databases of known gene functions are incomplete and prone to errors, and the biomolecular experiments needed to improve these databases are slow and costly. While computational methods are not a substitute for experimental verification, they can help in two ways: algorithms can aid in the curation of gene annotations by automatically suggesting inaccuracies, and they can predict previously unidentified gene functions, accelerating the rate of gene function discovery. In this work, we develop an algorithm that achieves both goals using deep autoencoder neural networks. With experiments on gene annotation data from the Gene Ontology project, we show that deep autoencoder networks achieve better performance than other standard machine learning methods, including the popular truncated singular value decomposition.

Journal ArticleDOI
TL;DR: A novel computational framework, based on a deep learning approach, is proposed that enables the integration of sensory-motor time-series data and the self-organization of multimodal fused representations.

Proceedings ArticleDOI
01 Nov 2014
TL;DR: Denoising autoencoders (DAs), which employ a data-defined learning objective independent of known biology, are used to identify and extract complex patterns from genomic data, constructing features that represent tumor or normal samples, estrogen receptor (ER) status, and molecular subtypes.
Abstract: Big data bring new opportunities for methods that efficiently summarize and automatically extract knowledge from large data compendia. While both supervised learning algorithms and unsupervised clustering algorithms have been successfully applied to biological data, they are either dependent on known biology or limited to discerning the most significant signals in the data. Here we present denoising autoencoders (DAs), which employ a data-defined learning objective independent of known biology, as a method to identify and extract complex patterns from genomic data. We evaluate the performance of DAs by applying them to a large collection of breast cancer gene expression data. Results show that DAs successfully construct features that contain both clinical and molecular information. There are features that represent tumor or normal samples, estrogen receptor (ER) status, and molecular subtypes. Features constructed by the autoencoder generalize to an independent dataset collected using a distinct experimental platform. By integrating data from ENCODE for feature interpretation, we discover a feature representing ER status through its association with key transcription factors in breast cancer. We also identify a feature highly predictive of patient survival that is enriched for the FOXM1 signaling pathway. The features constructed by DAs are often bimodally distributed, with one peak near zero and another near one, which facilitates discretization. In summary, we demonstrate that DAs effectively extract key biological principles from gene expression data and summarize them into constructed features with convenient properties.
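Two ingredients from the description above are easy to sketch: the denoising corruption applied during training and the discretization enabled by the bimodal features. The masking-noise form, the corruption rate, and the 0.5 threshold are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, noise_level=0.1):
    """Masking noise: randomly zero out a fraction of the input genes; the
    DA is trained to reconstruct the clean expression profile from this
    corrupted copy."""
    return x * (rng.random(x.shape) >= noise_level)

def discretize(activations, threshold=0.5):
    # Learned features are often bimodal near 0 and 1, so thresholding
    # gives a clean binary label per sample; 0.5 is an assumed cut point.
    return (activations > threshold).astype(int)

# Toy usage: a batch of "expression profiles" scaled to [0, 1].
x = rng.random((3, 20))
print(corrupt(x)[0])
print(discretize(rng.random((3, 5))))
```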

Proceedings ArticleDOI
12 Jul 2014
TL;DR: Experimental results indicate that this GA-assisted approach improves the performance of a deep autoencoder, producing a sparser neural network.
Abstract: In recent years, deep learning methods applying unsupervised learning to train deep layers of neural networks have achieved remarkable results in numerous fields. In the past, many genetic algorithm based methods have been successfully applied to training neural networks. In this paper, we extend previous work and propose a GA-assisted method for deep learning. Our experimental results indicate that this GA-assisted approach improves the performance of a deep autoencoder, producing a sparser neural network.

Proceedings Article
08 Dec 2014
TL;DR: This work shows how a bi-linear model of transformations, such as a gated autoencoder, can be turned into a recurrent network, by training it to predict future frames from the current one and the inferred transformation using backprop-through-time.
Abstract: We propose modeling time series by representing the transformations that take a frame at time t to a frame at time t+1. To this end we show how a bi-linear model of transformations, such as a gated autoencoder, can be turned into a recurrent network, by training it to predict future frames from the current one and the inferred transformation using backprop-through-time. We also show how stacking multiple layers of gating units in a recurrent pyramid makes it possible to represent the "syntax" of complicated time series, and that it can outperform standard recurrent neural networks in terms of prediction accuracy on a variety of tasks.
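A schematic of the prediction step with a factored gated autoencoder, assuming the usual three-way factorization: mapping units are inferred from two consecutive frames and then re-applied to the latest frame to predict the next one. The weights below are left untrained and all sizes are arbitrary; the paper trains such a model with backprop-through-time.

```python
import numpy as np

rng = np.random.default_rng(0)
d, f, n_map = 8, 16, 6                    # frame, factor, mapping sizes (arbitrary)
U = rng.normal(0, 0.1, (f, d))            # factor weights on frame t
V = rng.normal(0, 0.1, (f, d))            # factor weights on frame t+1
Wm = rng.normal(0, 0.1, (n_map, f))       # factors -> mapping units

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def infer_transform(x_t, x_t1):
    # Mapping units encode the transformation between consecutive frames.
    return sigmoid(Wm @ ((U @ x_t) * (V @ x_t1)))

def predict_next(x_latest, m_units):
    # Re-apply the inferred transformation to the latest frame.
    return V.T @ ((U @ x_latest) * (Wm.T @ m_units))

x0, x1 = rng.normal(size=d), rng.normal(size=d)
m_units = infer_transform(x0, x1)
x2_hat = predict_next(x1, m_units)        # shapes only: weights are untrained
print(x2_hat.shape)
```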

Posted Content
TL;DR: This work projects 3D shapes into 2D space and uses an autoencoder for feature learning on the 2D images, and shows that the proposed deep learning feature is complementary to conventional local image descriptors, obtaining state-of-the-art performance on 3D shape retrieval benchmarks.
Abstract: We study the problem of how to build a deep learning representation for 3D shape. Deep learning has been shown to be very effective in a variety of visual applications, such as image classification and object detection. However, it has not been successfully applied to 3D shape recognition, because 3D shape has complex structure in 3D space and only a limited number of 3D shapes are available for feature learning. To address these problems, we project 3D shapes into 2D space and use an autoencoder for feature learning on the 2D images. High-accuracy 3D shape retrieval performance is obtained by aggregating the features learned on the 2D images. In addition, we show that the proposed deep learning feature is complementary to conventional local image descriptors. By combining the global deep learning representation and the local descriptor representation, our method obtains state-of-the-art performance on 3D shape retrieval benchmarks.

Proceedings ArticleDOI
01 Nov 2014
TL;DR: Experimental results on the Japanese Female Facial Expression database and the extended Cohn-Kanade dataset show that the method outperforms others and demonstrate the effectiveness and robustness of the face-parsing components.
Abstract: This paper studies facial expression recognition using components obtained by face parsing (FP). Since different parts of the face contain different amounts of information about facial expression, and the weighting of these parts is not the same across faces, we propose recognizing facial expressions using the components that are most active in expression disclosure. The face-parsing detectors are trained via a deep belief network and tuned by logistic regression. The detectors first detect the face, and then detect the nose, eyes, and mouth hierarchically. A deep architecture pretrained with a stacked autoencoder is applied to facial expression recognition using the concentrated features of the detected components. The parsing components remove redundant information in expression recognition, and the images need neither alignment nor any other artificial treatment. Experimental results on the Japanese Female Facial Expression database and the extended Cohn-Kanade dataset show that the method outperforms others and demonstrate the effectiveness and robustness of this algorithm.

Proceedings ArticleDOI
01 Apr 2014
TL;DR: This study trained a deep autoencoder to build compact representations of short-term spectra of multiple speakers, using this compact representation as mapping features, and trained an artificial neural network to predict target voice features from source voice features.
Abstract: In this study, we trained a deep autoencoder to build compact representations of short-term spectra of multiple speakers. Using this compact representation as mapping features, we then trained an artificial neural network to predict target voice features from source voice features. Finally, we constructed a deep neural network from the trained deep autoencoder and artificial neural network weights, which were then fine-tuned using back-propagation. We compared the proposed method to existing methods using Gaussian mixture models and frame-selection. We evaluated the methods objectively, and also conducted perceptual experiments to measure both the conversion accuracy and speech quality of selected systems. The results showed that, for 70 training sentences, frame-selection performed best, regarding both accuracy and quality. When using only two training sentences, the pre-trained deep neural network performed best, regarding both accuracy and quality.

Book ChapterDOI
03 Nov 2014
TL;DR: This research proposes an autoencoder-based collaborative filtering method that provides pretraining and stacking mechanisms; experiments show its potential and effectiveness in achieving higher recall.
Abstract: Collaborative filtering is currently widely used in recommender systems. With the development of deep learning, much research has been conducted to improve collaborative filtering by integrating deep learning techniques. In this research, we propose an autoencoder-based collaborative filtering method that provides pretraining and stacking mechanisms. The experimental study on the commonly used MovieLens datasets shows its potential and effectiveness in achieving higher recall.
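A sketch of the two pieces such a recommender needs around the autoencoder: a loss computed over observed ratings only, and ranking of unseen items by their reconstructed scores. The masked squared error is a common choice, assumed here rather than taken from the paper.

```python
import numpy as np

def masked_mse(R, R_hat, mask):
    """Reconstruction loss over observed ratings only; unobserved entries
    must not contribute to the training signal."""
    return np.sum(mask * (R - R_hat) ** 2) / mask.sum()

def recommend(R_hat, mask, user, k=1):
    # Rank the items the user has not rated by their reconstructed scores.
    scores = np.where(mask[user] == 0, R_hat[user], -np.inf)
    return np.argsort(scores)[::-1][:k]

R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.]])
mask = (R > 0).astype(float)              # 0 marks an unrated item
R_hat = R + 0.1                           # stand-in for the autoencoder output
print(masked_mse(R, R_hat, mask), recommend(R_hat, mask, user=1, k=2))
```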

Proceedings Article
01 Jan 2014
TL;DR: In this paper, the k-sparse autoencoder is proposed: an autoencoder with a linear activation function in which only the k highest activities in the hidden layer are kept. It achieves better classification results than denoising autoencoders, networks trained with dropout, and RBMs.
Abstract: Recently, it has been observed that when representations are learnt in a way that encourages sparsity, improved performance is obtained on classification tasks. These methods involve combinations of activation functions, sampling steps and different kinds of penalties. To investigate the effectiveness of sparsity by itself, we propose the k-sparse autoencoder, which is an autoencoder with a linear activation function, where in the hidden layer only the k highest activities are kept. When applied to the MNIST and NORB datasets, we find that this method achieves better classification results than denoising autoencoders, networks trained with dropout, and RBMs. k-sparse autoencoders are simple to train and the encoding stage is very fast, making them well suited to large problem sizes, where conventional sparse coding algorithms cannot be applied.
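The k-sparse operation itself is a few lines over the hidden activities. Per the abstract, this sketch takes the top k by value (the k highest activities), not by magnitude.

```python
import numpy as np

def k_sparse(h, k):
    """Keep only the k highest activities in each hidden vector and zero the
    rest; with linear units this support selection is the autoencoder's only
    nonlinearity."""
    h = np.asarray(h, dtype=float)
    idx = np.argpartition(h, -k, axis=-1)[..., -k:]
    out = np.zeros_like(h)
    np.put_along_axis(out, idx, np.take_along_axis(h, idx, axis=-1), axis=-1)
    return out

print(k_sparse([[0.3, -1.2, 0.9, 0.1]], k=2))   # keeps 0.9 and 0.3
```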

Proceedings ArticleDOI
03 Nov 2014
TL;DR: A novel attribute discovery approach that can automatically identify, model, and name attributes from an arbitrary set of image and text pairs that can be easily gathered on the Web, and is thus able to build a large visual knowledge base without any human effort.
Abstract: Higher-level semantics such as visual attributes are crucial for fundamental multimedia applications. We present a novel attribute discovery approach that can automatically identify, model and name attributes from an arbitrary set of image and text pairs that can be easily gathered on the Web. Unlike conventional attribute discovery methods, our approach does not rely on any pre-defined vocabularies or human labeling. Therefore, we are able to build a large visual knowledge base without any human effort. The discovery is based on a novel deep architecture, named the Independent Component Multimodal Autoencoder (ICMAE), that can continually learn shared higher-level representations across the visual and textual modalities. With the help of the resulting representations, which encode strong visual and semantic evidence, we propose to (a) identify attributes and their corresponding high-quality training images, (b) iteratively model them with maximum compactness and comprehensiveness, and (c) name the attribute models with human-understandable words. To date, the proposed system has discovered 1,898 attributes over 1.3 million pairs of image and text. Extensive experiments on various real-world multimedia datasets demonstrate the quality and effectiveness of the discovered attributes, facilitating multimedia applications such as image annotation and retrieval in comparison with state-of-the-art approaches.

Proceedings ArticleDOI
04 May 2014
TL;DR: A robust stacked autoencoder based on the maximum correntropy criterion (MCC) is proposed to deal with data containing non-Gaussian noises and outliers; experimental results show that the R-SAE is capable of learning robust features from noisy data.
Abstract: Unsupervised feature learning with deep networks has been widely studied in recent years. Despite the progress, most existing models are fragile to non-Gaussian noises and outliers due to the criterion of mean square error (MSE). In this paper, we propose a robust stacked autoencoder (R-SAE) based on the maximum correntropy criterion (MCC) to deal with data containing non-Gaussian noises and outliers. By replacing MSE with MCC, the anti-noise ability of the stacked autoencoder is improved. The proposed method is evaluated using the MNIST benchmark dataset. Experimental results show that, compared with the ordinary stacked autoencoder, the R-SAE improves classification accuracy by 14% and reduces the reconstruction error by 39%, which demonstrates that the R-SAE is capable of learning robust features from noisy data.
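The maximum correntropy criterion replaces MSE with a Gaussian-kernel similarity, so large errors saturate instead of dominating the objective. A minimal sketch, with the kernel width sigma as a free parameter:

```python
import numpy as np

def correntropy_loss(x, x_rec, sigma=1.0):
    """Negative correntropy between input and reconstruction. The Gaussian
    kernel saturates for large errors, so outliers and non-Gaussian noise
    contribute little, unlike MSE."""
    err2 = (x - x_rec) ** 2
    return -np.mean(np.exp(-err2 / (2.0 * sigma ** 2)))

x = np.zeros(100)
x_rec = x.copy(); x_rec[0] = 100.0            # one gross outlier
print(correntropy_loss(x, x_rec))             # barely moves from -1.0
print(np.mean((x - x_rec) ** 2))              # MSE is dominated: 100.0
```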

Proceedings ArticleDOI
31 Jul 2014
TL;DR: The proposed Stacked Sparse Autoencoder (SSAE) based framework for nuclei classification on breast cancer histopathology yields an accuracy of 83.7%, an F1 score of 82%, and an AUC of 0.8992, outperforming the Softmax classifier, PCA+Softmax, and SAE+Softmax.
Abstract: In this paper, a Stacked Sparse Autoencoder (SSAE) based framework is presented for nuclei classification on breast cancer histopathology. SSAE works very well in learning useful high-level features for better representation of raw input data. To show the effectiveness of the proposed framework, SSAE+Softmax is compared with the conventional Softmax classifier, PCA+Softmax, and a single-layer Sparse Autoencoder (SAE)+Softmax in classifying nuclei and non-nuclei patches extracted from breast cancer histopathology. SSAE+Softmax for nuclei patch classification yields an accuracy of 83.7%, an F1 score of 82%, and an AUC of 0.8992, outperforming the Softmax classifier, PCA+Softmax, and SAE+Softmax.

Proceedings ArticleDOI
04 May 2014
TL;DR: The experimental results show that the SHLA method significantly improves over the baseline performance and outperforms today's state-of-the-art domain adaptation methods.
Abstract: This study addresses a situation common in practice, where training and test samples come from different corpora - here in acoustic emotion recognition. In this situation, a model is trained on one database while tested on another, disjoint one. The typical inherent mismatch between the corpora, and thereby between the test and training sets, usually leads to significant performance degradation. To cope with this problem when no training data from the target domain exists, we propose a `shared-hidden-layer autoencoder' (SHLA) approach for learning common feature representations shared across the training and test sets, in order to reduce the discrepancy between them. To demonstrate the effectiveness of our approach, we select the Interspeech Emotion Challenge's FAU Aibo Emotion Corpus as the test database and two other publicly available databases as the training set for extensive evaluation. The experimental results show that our SHLA method significantly improves over the baseline performance and outperforms today's state-of-the-art domain adaptation methods.

Proceedings ArticleDOI
Kyunghyun Cho, Xi Chen
01 Jan 2014
TL;DR: This paper proposes a novel system to recognize actions from skeleton data with simple but effective features using deep neural networks; it achieves an accuracy above 95%, which is, to the authors' knowledge, the state-of-the-art result for such a large dataset.
Abstract: Gesture recognition using motion capture data and depth sensors has recently drawn more attention in vision research. Currently, most systems classify datasets with only a couple of dozen different actions. Moreover, feature extraction from the data is often computationally complex. In this paper, we propose a novel system to recognize actions from skeleton data with simple but effective features using deep neural networks. Features are extracted for each frame based on the relative positions of joints (PO), temporal differences (TD), and normalized trajectories of motion (NT). Given these features, a hybrid multi-layer perceptron is trained, which simultaneously classifies and reconstructs the input data. We use a deep autoencoder to visualize the learned features. The experiments show that deep neural networks can capture more discriminative information than, for instance, principal component analysis can. We test our system on a public database with 65 classes and more than 2,000 motion sequences. We obtain an accuracy above 95%, which is, to our knowledge, the state-of-the-art result for such a large dataset.

Proceedings Article
21 Jun 2014
TL;DR: Nested dropout, a procedure for stochastically removing coherent nested sets of hidden units in a neural network, is introduced and it is rigorously shown that the application of nested dropout enforces identifiability of the units, which leads to an exact equivalence with PCA.
Abstract: In this paper, we present results on ordered representations of data in which different dimensions have different degrees of importance. To learn these representations we introduce nested dropout, a procedure for stochastically removing coherent nested sets of hidden units in a neural network. We first present a sequence of theoretical results for the special case of a semilinear autoencoder. We rigorously show that the application of nested dropout enforces identifiability of the units, which leads to an exact equivalence with PCA. We then extend the algorithm to deep models and demonstrate the relevance of ordered representations to a number of applications. Specifically, we use the ordered property of the learned codes to construct hash-based data structures that permit very fast retrieval, achieving retrieval in time logarithmic in the database size and independent of the dimensionality of the representation. This allows codes that are hundreds of times longer than currently feasible for retrieval. We therefore avoid the diminished quality associated with short codes, while still performing retrieval that is competitive in speed with existing methods. We also show that ordered representations are a promising way to learn adaptive compression for efficient online data reconstruction.
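The nested dropout step is simple to state: sample a truncation index b and zero every hidden unit after it, so unit i can only ever rely on units before it. The geometric distribution below matches the ordered-importance intent but is an assumed choice in this sketch.

```python
import numpy as np

def nested_dropout(h, rho=0.1, rng=None):
    """Sample a truncation index b and zero all hidden units after it. This
    nested (rather than independent) masking is what imposes the ordering
    on the representation; Geometric(rho) is the assumed index law."""
    rng = rng or np.random.default_rng()
    h = np.asarray(h, dtype=float)
    b = min(int(rng.geometric(rho)), h.shape[-1])
    out = h.copy()
    out[..., b:] = 0.0
    return out

print(nested_dropout(np.arange(1.0, 9.0), rho=0.3,
                     rng=np.random.default_rng(0)))
```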

Posted Content
TL;DR: This work shows that negative biases are a natural result of using a hidden layer whose responsibility is both to represent the input data and to act as a selection mechanism ensuring sparsity of the representation, and it proposes a new activation function that decouples these two roles of the hidden layer.
Abstract: Regularized training of an autoencoder typically results in hidden unit biases that take on large negative values. We show that negative biases are a natural result of using a hidden layer whose responsibility is both to represent the input data and to act as a selection mechanism that ensures sparsity of the representation. We then show that negative biases impede the learning of data distributions whose intrinsic dimensionality is high. We also propose a new activation function that decouples the two roles of the hidden layer and allows us to learn representations on data with very high intrinsic dimensionality, where standard autoencoders typically fail. Since the decoupled activation function acts as an implicit regularizer, the model can be trained by minimizing the reconstruction error of the training data, without requiring any additional regularization.
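One concrete way to decouple "selection" from "representation" is a thresholded-linear unit: an explicit threshold test decides whether the unit is active, and an active unit passes its value through linearly, so no negative bias is needed to enforce sparsity. The exact functional form in the paper may differ; this is an illustrative sketch.

```python
import numpy as np

def thresholded_linear(z, theta=1.0):
    """Selection and representation decoupled: a unit is selected by an
    explicit threshold test on |z|, but when active it passes its linear
    value through unchanged."""
    z = np.asarray(z, dtype=float)
    return z * (np.abs(z) > theta)

print(thresholded_linear([-2.0, -0.5, 0.2, 1.5]))   # -> [-2.  0.  0.  1.5]
```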