Showing papers on "Deep learning" published in 2011

Proceedings Article•

Multimodal Deep Learning

Jiquan Ngiam¹, Aditya Khosla¹, Mingyu Kim¹, Juhan Nam¹, Honglak Lee², Andrew Y. Ng¹•Institutions (2)

Stanford University¹, University of Michigan²

28 Jun 2011

TL;DR: This work presents a series of tasks for multimodal learning and shows how to train deep networks that learn features to address these tasks, and demonstrates cross modality feature learning, where better features for one modality can be learned if multiple modalities are present at feature learning time.

Abstract: Deep networks have been successfully applied to unsupervised feature learning for single modalities (e.g., text, images or audio). In this work, we propose a novel application of deep networks to learn features over multiple modalities. We present a series of tasks for multimodal learning and show how to train deep networks that learn features to address these tasks. In particular, we demonstrate cross modality feature learning, where better features for one modality (e.g., video) can be learned if multiple modalities (e.g., audio and video) are present at feature learning time. Furthermore, we show how to learn a shared representation between modalities and evaluate it on a unique task, where the classifier is trained with audio-only data but tested with video-only data and vice-versa. Our models are validated on the CUAVE and AVLetters datasets on audio-visual speech classification, demonstrating best published visual speech classification on AVLetters and effective shared representation learning.

2,830 citations

Book Chapter•DOI•

Stacked convolutional auto-encoders for hierarchical feature extraction

Jonathan Masci, Ueli Meier, Dan Ciresan, Jürgen Schmidhuber

14 Jun 2011

TL;DR: A novel convolutional auto-encoder (CAE) for unsupervised feature learning is presented, and initializing a CNN with filters of a trained CAE stack is shown to yield superior performance on a digit and an object recognition benchmark.

Abstract: We present a novel convolutional auto-encoder (CAE) for unsupervised feature learning. A stack of CAEs forms a convolutional neural network (CNN). Each CAE is trained using conventional on-line gradient descent without additional regularization terms. A max-pooling layer is essential to learn biologically plausible features consistent with those found by previous approaches. Initializing a CNN with filters of a trained CAE stack yields superior performance on a digit (MNIST) and an object recognition (CIFAR10) benchmark.

1,832 citations

Proceedings Article•

Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach

Xavier Glorot¹, Antoine Bordes¹,², Yoshua Bengio¹•Institutions (2)

Université de Montréal¹, University of Technology of Compiègne²

28 Jun 2011

TL;DR: A deep learning approach is proposed which learns to extract a meaningful representation for each review in an unsupervised fashion and clearly outperforms state-of-the-art methods on a benchmark composed of reviews of 4 types of Amazon products.

Abstract: The exponential increase in the availability of online reviews and recommendations makes sentiment classification an interesting topic in academic and industrial research. Reviews can span so many different domains that it is difficult to gather annotated training data for all of them. Hence, this paper studies the problem of domain adaptation for sentiment classifiers, whereby a system is trained on labeled reviews from one source domain but is meant to be deployed on another. We propose a deep learning approach which learns to extract a meaningful representation for each review in an unsupervised fashion. Sentiment classifiers trained with this high-level feature representation clearly outperform state-of-the-art methods on a benchmark composed of reviews of 4 types of Amazon products. Furthermore, this method scales well and allowed us to successfully perform domain adaptation on a larger industrial-strength dataset of 22 domains.

1,769 citations

Proceedings Article•DOI•

Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis

Quoc V. Le¹, Will Y. Zou¹, Serena Yeung¹, Andrew Y. Ng¹•Institutions (1)

Stanford University¹

20 Jun 2011

TL;DR: This paper presents an extension of the Independent Subspace Analysis algorithm to learn invariant spatio-temporal features from unlabeled video data and finds that this method performs surprisingly well when combined with deep learning techniques such as stacking and convolution to learn hierarchical representations.

Abstract: Previous work on action recognition has focused on adapting hand-designed local features, such as SIFT or HOG, from static images to the video domain. In this paper, we propose using unsupervised feature learning as a way to learn features directly from video data. More specifically, we present an extension of the Independent Subspace Analysis algorithm to learn invariant spatio-temporal features from unlabeled video data. We discovered that, despite its simplicity, this method performs surprisingly well when combined with deep learning techniques such as stacking and convolution to learn hierarchical representations. By replacing hand-designed features with our learned features, we achieve classification results superior to all previous published results on the Hollywood2, UCF, KTH and YouTube action recognition datasets. On the challenging Hollywood2 and YouTube action datasets we obtain 53.3% and 75.8% respectively, which are approximately 5% better than the current best published results. Further benefits of this method, such as the ease of training and the efficiency of training and prediction, will also be discussed. You can download our code and learned spatio-temporal features here: http://ai.stanford.edu/~wzou/

1,116 citations

Proceedings Article•

On optimization methods for deep learning

Jiquan Ngiam¹, Adam Coates¹, Abhik Lahiri¹, Bobby Prochnow¹, Quoc V. Le¹, Andrew Y. Ng¹•Institutions (1)

Stanford University¹

28 Jun 2011

TL;DR: It is shown that more sophisticated off-the-shelf optimization methods such as Limited memory BFGS (L-BFGS) and Conjugate gradient (CG) with line search can significantly simplify and speed up the process of pretraining deep algorithms.

Abstract: The predominant methodology for training deep learning models advocates the use of stochastic gradient descent methods (SGDs). Despite their ease of implementation, SGDs are difficult to tune and parallelize. These problems make it challenging to develop, debug and scale up deep learning algorithms with SGDs. In this paper, we show that more sophisticated off-the-shelf optimization methods such as limited-memory BFGS (L-BFGS) and conjugate gradient (CG) with line search can significantly simplify and speed up the process of pretraining deep algorithms. In our experiments, the difference between L-BFGS/CG and SGDs is more pronounced if we consider algorithmic extensions (e.g., sparsity regularization) and hardware extensions (e.g., GPUs or computer clusters). Our experiments with distributed optimization support the use of L-BFGS with locally connected networks and convolutional neural networks. Using L-BFGS, our convolutional network model achieves a 0.69% test error on the standard MNIST dataset. This is a state-of-the-art result on MNIST among algorithms that do not use distortions or pretraining.
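
To illustrate why a line search removes the learning-rate tuning that the abstract describes, the following minimal sketch compares linear conjugate gradient with an exact line search against fixed-step gradient descent on a toy two-dimensional quadratic. It is not the paper's experimental setup: the objective is deterministic rather than stochastic, and the matrix `A`, vector `b`, step counts, and learning rate `lr` are all illustrative assumptions.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, x, steps):
    """Minimize 0.5 * x^T A x - b^T x; the step size comes from an exact line search."""
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # residual = negative gradient
    d = r[:]
    for _ in range(steps):
        rr = dot(r, r)
        if rr < 1e-20:  # already converged
            break
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)  # exact minimizer along direction d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        beta = dot(r, r) / rr
        d = [ri + beta * di for ri, di in zip(r, d)]
    return x


def gradient_descent(A, b, x, steps, lr):
    """Same objective, but with a hand-picked fixed learning rate."""
    for _ in range(steps):
        grad = [ai - bi for ai, bi in zip(matvec(A, x), b)]
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x


# Hypothetical, poorly scaled problem whose minimizer is [1.0, 1.0].
A = [[10.0, 0.0], [0.0, 1.0]]
b = [10.0, 1.0]
x_cg = conjugate_gradient(A, b, [0.0, 0.0], steps=2)
x_gd = gradient_descent(A, b, [0.0, 0.0], steps=2, lr=0.1)
```

With these values, CG reaches the minimizer after two steps, while gradient descent with `lr=0.1` is still far from it along the poorly scaled second coordinate. A larger learning rate would help that coordinate but make the first one diverge.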

908 citations

Improving the speed of neural networks on CPUs

Vincent Vanhoucke, Andrew W. Senior, Mark Z. Mao

01 Jan 2011

TL;DR: This paper uses speech recognition as an example task, and shows that a real-time hybrid hidden Markov model / neural network (HMM/NN) large vocabulary system can be built with a 10× speedup over an unoptimized baseline and a 4× speedup over an aggressively optimized floating-point baseline at no cost in accuracy.

Abstract: Recent advances in deep learning have made the use of large, deep neural networks with tens of millions of parameters suitable for a number of applications that require real-time processing. The sheer size of these networks can represent a challenging computational burden, even for modern CPUs. For this reason, GPUs are routinely used instead to train and run such networks. This paper is a tutorial for students and researchers on some of the techniques that can be used to reduce this computational cost considerably on modern x86 CPUs. We emphasize data layout, batching of the computation, the use of SSE2 instructions, and particularly leverage SSSE3 and SSE4 fixed-point instructions which provide a 3× improvement over an optimized floating-point baseline. We use speech recognition as an example task, and show that a real-time hybrid hidden Markov model / neural network (HMM/NN) large vocabulary system can be built with a 10× speedup over an unoptimized baseline and a 4× speedup over an aggressively optimized floating-point baseline at no cost in accuracy. The techniques described extend readily to neural network training and provide an effective alternative to the use of specialized hardware.
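
The fixed-point idea mentioned in the abstract can be sketched in a few lines. The example below shows only the arithmetic of 8-bit linear quantization with one shared scale per vector, which is an assumed scheme for illustration. It does not use SSSE3/SSE4 instructions and is not the paper's implementation. The function names and sample values are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers using one shared scale (assumed scheme)."""
    max_int = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values) or 1.0  # avoid dividing by zero
    scale = max_int / max_abs
    return [round(v * scale) for v in values], scale


def quantized_dot(weights, activations):
    """Dot product computed with integer multiply-accumulate, rescaled once at the end."""
    q_w, s_w = quantize(weights)
    q_a, s_a = quantize(activations)
    acc = sum(w * a for w, a in zip(q_w, q_a))  # pure integer arithmetic
    return acc / (s_w * s_a)


# Hypothetical weights and activations for one neuron.
weights = [0.5, -0.25, 0.125, 1.0]
activations = [0.2, 0.4, -0.6, 0.8]
exact = sum(w * a for w, a in zip(weights, activations))
approx = quantized_dot(weights, activations)
```

The inner loop touches only small integers, which is what lets hardware pack many multiply-accumulates into one vector instruction. The price is a small rounding error between `approx` and `exact`.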

883 citations

Book Chapter•DOI•

Sequential deep learning for human action recognition

Moez Baccouche, Franck Mamalet, Christian Wolf¹, Christophe Garcia¹, Atilla Baskurt¹•Institutions (1)

Institut National des Sciences Appliquées de Lyon¹

16 Nov 2011

TL;DR: A fully automated deep model that learns to classify human actions without using any prior knowledge is proposed; it outperforms existing deep models and gives results comparable with the best related works.

Abstract: We propose in this paper a fully automated deep model, which learns to classify human actions without using any prior knowledge. The first step of our scheme, based on the extension of Convolutional Neural Networks to 3D, automatically learns spatio-temporal features. A Recurrent Neural Network is then trained to classify each sequence considering the temporal evolution of the learned features for each timestep. Experimental results on the KTH dataset show that the proposed approach outperforms existing deep models, and gives comparable results with the best related works.

788 citations

Journal Article•DOI•

Unsupervised learning of hierarchical representations with convolutional deep belief networks

Honglak Lee¹, Roger Grosse², Rajesh Ranganath³, Andrew Y. Ng³•Institutions (3)

University of Michigan¹, Massachusetts Institute of Technology², Stanford University³

01 Oct 2011-Communications of The ACM

TL;DR: The convolutional deep belief network is presented, a hierarchical generative model that scales to realistic image sizes and is translation-invariant and supports efficient bottom-up and top-down probabilistic inference.

Abstract: There has been much interest in unsupervised learning of hierarchical generative models such as deep belief networks (DBNs); however, scaling such models to full-sized, high-dimensional images remains a difficult problem. To address this problem, we present the convolutional deep belief network, a hierarchical generative model that scales to realistic image sizes. This model is translation-invariant and supports efficient bottom-up and top-down probabilistic inference. Key to our approach is probabilistic max-pooling, a novel technique that shrinks the representations of higher layers in a probabilistically sound way. Our experiments show that the algorithm learns useful high-level visual features, such as object parts, from unlabeled images of objects and natural scenes. We demonstrate excellent performance on several visual recognition tasks and show that our model can perform hierarchical (bottom-up and top-down) inference over full-sized images.

388 citations

Proceedings Article•DOI•

A committee of neural networks for traffic sign classification

Dan Ciresan¹, Ueli Meier¹, Jonathan Masci¹, Jürgen Schmidhuber¹•Institutions (1)

University of Lugano¹

03 Oct 2011

TL;DR: This work describes the approach that won the preliminary phase of the German traffic sign recognition benchmark with a better-than-human recognition rate, and obtains an even better recognition rate by further training the nets.

Abstract: We describe the approach that won the preliminary phase of the German traffic sign recognition benchmark with a better-than-human recognition rate of 98.98%. We obtain an even better recognition rate of 99.15% by further training the nets. Our fast, fully parameterizable GPU implementation of a Convolutional Neural Network does not require careful design of pre-wired feature extractors, which are rather learned in a supervised way. A CNN/MLP committee further boosts recognition performance.

380 citations

Proceedings Article•

Shallow vs. Deep Sum-Product Networks

Olivier Delalleau¹, Yoshua Bengio¹•Institutions (1)

Université de Montréal¹

12 Dec 2011

TL;DR: It is proved there exist families of functions that can be represented much more efficiently with a deep network than with a shallow one, i.e. with substantially fewer hidden units.

Abstract: We investigate the representational power of sum-product networks (computation networks analogous to neural networks, but whose individual units compute either products or weighted sums), through a theoretical analysis that compares deep (multiple hidden layers) vs. shallow (one hidden layer) architectures. We prove there exist families of functions that can be represented much more efficiently with a deep network than with a shallow one, i.e. with substantially fewer hidden units. Such results were not available until now, and contribute to motivate recent research involving learning of deep sum-product networks, and more generally motivate research in Deep Learning.
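
As a rough intuition for the depth argument, the sketch below evaluates a small three-layer sum-product computation (products, then sums, then one output product) and the same function flattened into a single sum of monomials. It only counts units for this one naive flattening. It is not the paper's function family or proof, and the input values and function names are illustrative assumptions.

```python
from itertools import product


def deep_eval(x):
    """Layer 1: products of input pairs. Layer 2: sums of two products. Output: one product."""
    pair_products = [x[i] * x[i + 1] for i in range(0, len(x), 2)]
    sums = [pair_products[i] + pair_products[i + 1]
            for i in range(0, len(pair_products), 2)]
    output = 1.0
    for s in sums:
        output *= s
    units = len(pair_products) + len(sums) + 1
    return output, units


def flattened_eval(x):
    """Same function written as one sum over monomials, one product unit per monomial."""
    pair_products = [x[i] * x[i + 1] for i in range(0, len(x), 2)]
    groups = [(pair_products[i], pair_products[i + 1])
              for i in range(0, len(pair_products), 2)]
    total, product_units = 0.0, 0
    for choice in product(*groups):  # pick one term from every sum
        term = 1.0
        for value in choice:
            term *= value
        total += term
        product_units += 1
    return total, product_units + 1  # plus the single output sum unit


# Hypothetical input with 24 variables (length must be a multiple of 4).
x = [float(i % 3 + 1) for i in range(24)]
deep_value, deep_units = deep_eval(x)
flat_value, flat_units = flattened_eval(x)
```

Both functions return the same value, but the flattened form needs one product unit per monomial, and the monomial count doubles with every additional group of four inputs. The deep form grows only linearly in the number of inputs.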

312 citations

Journal Article•

Deep Learning and Its Applications to Signal and Information Processing

Dong Yu, Li Deng

01 Jan 2011-IEEE Signal Processing Magazine

TL;DR: Shallow learning models are observed to share a simple architecture that consists of only one layer responsible for transforming the raw input signals or features into a problem-specific feature space, which may be unobservable.

Abstract: Today, signal processing research has significantly widened its scope compared with just a few years ago [4], and machine learning has been an important technical area of the signal processing society. Since 2006, deep learning—a new area of machine learning research—has emerged [7], impacting a wide range of signal and information processing work within the traditional and the new, widened scopes. Various workshops, such as the 2009 ICML Workshop on Learning Feature Hierarchies; the 2008 NIPS Deep Learning Workshop: Foundations and Future Directions; and the 2009 NIPS Workshop on Deep Learning for Speech Recognition and Related Applications as well as an upcoming special issue on deep learning for speech and language processing in IEEE Transactions on Audio, Speech, and Language Processing (2010) have been devoted exclusively to deep learning and its applications to classical signal processing areas. We have also seen the government sponsor research on deep learning.

Book Chapter•DOI•

On the expressive power of deep architectures

Yoshua Bengio¹, Olivier Delalleau¹•Institutions (1)

Université de Montréal¹

05 Oct 2011

TL;DR: Some of the theoretical motivations for deep architectures, as well as some of their practical successes, are reviewed, and directions of investigations to address some of the remaining challenges are proposed.

Abstract: Deep architectures are families of functions corresponding to deep circuits. Deep Learning algorithms are based on parametrizing such circuits and tuning their parameters so as to approximately optimize some training objective. Whereas it was thought too difficult to train deep architectures, several successful algorithms have been proposed in recent years. We review some of the theoretical motivations for deep architectures, as well as some of their practical successes, and propose directions of investigations to address some of the remaining challenges.

Proceedings Article•

Selecting Receptive Fields in Deep Networks

Adam Coates¹, Andrew Y. Ng¹•Institutions (1)

Stanford University¹

12 Dec 2011

TL;DR: This paper proposes a fast method to choose local receptive fields that group together those low-level features that are most similar to each other according to a pairwise similarity metric, and produces results showing how this method allows even simple unsupervised training algorithms to train successful multi-layered networks that achieve state-of-the-art results on CIFAR and STL datasets.

Abstract: Recent deep learning and unsupervised feature learning systems that learn from unlabeled data have achieved high performance in benchmarks by using extremely large architectures with many features (hidden units) at each layer. Unfortunately, for such large architectures the number of parameters can grow quadratically in the width of the network, thus necessitating hand-coded "local receptive fields" that limit the number of connections from lower level features to higher ones (e.g., based on spatial locality). In this paper we propose a fast method to choose these connections that may be incorporated into a wide variety of unsupervised training methods. Specifically, we choose local receptive fields that group together those low-level features that are most similar to each other according to a pairwise similarity metric. This approach allows us to harness the advantages of local receptive fields (such as improved scalability, and reduced data requirements) when we do not know how to specify such receptive fields by hand or where our unsupervised training algorithm has no obvious generalization to a topographic setting. We produce results showing how this method allows us to use even simple unsupervised training algorithms to train successful multi-layered networks that achieve state-of-the-art results on CIFAR and STL datasets: 82.0% and 60.1% accuracy, respectively.
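
The grouping step described above can be sketched as follows: pick a seed feature, score every feature by a pairwise similarity to the seed, and keep the highest-scoring ones as a receptive field. Squared Pearson correlation is used here as an assumed similarity metric, and the function names and toy activations are hypothetical. The abstract itself only specifies "a pairwise similarity metric".

```python
def correlation(u, v):
    """Pearson correlation between two equally long activation lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def select_receptive_field(features, seed, size):
    """Return the indices of the `size` features most similar to the seed feature."""
    scores = [(correlation(features[seed], f) ** 2, j)
              for j, f in enumerate(features)]
    scores.sort(key=lambda s: (-s[0], s[1]))  # most similar first, ties by index
    return sorted(j for _, j in scores[:size])


# Hypothetical activations: one list per low-level feature, one entry per example.
features = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],      # nearly proportional to feature 0
    [4.0, 1.0, 3.0, 2.0],
    [-1.0, -2.0, -3.0, -4.2],  # nearly anti-correlated with feature 0
    [0.0, 1.0, 0.0, 1.0],
]
field = select_receptive_field(features, seed=0, size=3)
```

Squaring the correlation treats strongly anti-correlated features as similar, so feature 3 is grouped with features 0 and 1 here. A higher-layer unit would then connect only to the features in `field` rather than to all of them.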

Proceedings Article•DOI•

Sentiment classification based on supervised latent n-gram analysis

Dmitriy Bespalov¹, Bing Bai², Yanjun Qi², Ali Shokoufandeh¹•Institutions (2)

Drexel University¹, Princeton University²

24 Oct 2011

TL;DR: A deep neural network is utilized to build a unified discriminative framework that allows for estimating the parameters of the latent space as well as the classification function with a bias for the target classification task at hand.

Abstract: In this paper, we propose an efficient embedding for modeling higher-order (n-gram) phrases that projects the n-grams to a low-dimensional latent semantic space, where a classification function can be defined. We utilize a deep neural network to build a unified discriminative framework that allows for estimating the parameters of the latent space as well as the classification function with a bias for the target classification task at hand. We apply the framework to a large-scale sentiment classification task. We present a comparative evaluation of the proposed method on two (large) benchmark data sets for online product reviews. The proposed method achieves superior performance in comparison to the state of the art.

Proceedings Article•

Deep Learning for Efficient Discriminative Parsing

Ronan Collobert¹•Institutions (1)

Princeton University¹

14 Jun 2011

TL;DR: A new fast purely discriminative algorithm for natural language parsing is proposed, based on a “deep” recurrent convolutional graph transformer network (GTN); assuming a decomposition of a parse tree into a stack of “levels”, the network predicts a level of the tree taking into account predictions of previous levels.

Abstract: We propose a new fast purely discriminative algorithm for natural language parsing, based on a “deep” recurrent convolutional graph transformer network (GTN). Assuming a decomposition of a parse tree into a stack of “levels”, the network predicts a level of the tree taking into account predictions of previous levels. Using only a few basic text features which leverage word representations from Collobert and Weston (2008), we show similar performance (in F1 score) to existing pure discriminative parsers and existing “benchmark” parsers (such as the Collins parser, which is based on probabilistic context-free grammars), with a huge speed advantage.

Proceedings Article•DOI•

Deep belief nets for natural language call-routing

Ruhi Sarikaya¹, Geoffrey E. Hinton², Bhuvana Ramabhadran¹•Institutions (2)

IBM¹, University of Toronto²

22 May 2011

TL;DR: A DBN-based model gives a call-routing classification accuracy that is equal to the best of the other models even though it currently uses an impoverished representation of the input.

Abstract: This paper considers the application of Deep Belief Nets (DBNs) to natural language call routing. DBNs have been successfully applied to a number of tasks, including image, audio and speech classification, thanks to the recent discovery of an efficient learning technique. DBNs learn a multi-layer generative model from unlabeled data and the features discovered by this model are then used to initialize a feed-forward neural network which is fine-tuned with backpropagation. We compare a DBN-initialized neural network to three widely used text classification algorithms: Support Vector Machines (SVM), Boosting and Maximum Entropy (MaxEnt). The DBN-based model gives a call-routing classification accuracy that is equal to the best of the other models even though it currently uses an impoverished representation of the input.

Journal Article•DOI•

Convergence analysis of online gradient method for BP neural networks

Wei Wu¹, Jian Wang¹, Mingsong Cheng¹, Zhengxue Li¹•Institutions (1)

Dalian University of Technology¹

01 Jan 2011-Neural Networks

TL;DR: Some weak and strong convergence results for the learning methods are presented, indicating that the gradient of the error function goes to zero and the weight sequence goes to a fixed point, respectively.

Journal Article•DOI•

A parallel neural network approach to prediction of Parkinson's Disease

Freddie Åström¹, Raşit Köker²•Institutions (2)

Linköping University¹, Sakarya University²

01 Sep 2011-Expert Systems With Applications

TL;DR: This paper uses more than one neural network to reduce the possibility of an erroneous decision in the prediction of Parkinson's Disease and demonstrates that the designed system, to some extent, deals with the problems of imbalanced data sets.

Abstract: Recently, neural-network-based diagnosis of medical diseases has attracted a great deal of attention. In this paper a parallel feed-forward neural network structure is used in the prediction of Parkinson's Disease. The main idea of this paper is to use more than one neural network to reduce the possibility of an erroneous decision. The output of each neural network is evaluated by using a rule-based system for the final decision. Another important point in this paper is that during the training process, unlearned data of each neural network is collected and used in the training set of the next neural network. The designed parallel network system significantly increased the robustness of the prediction. A set of nine parallel neural networks yielded an improvement of 8.4% on the prediction of Parkinson's Disease compared to a single network. Furthermore, it is demonstrated that the designed system, to some extent, deals with the problems of imbalanced data sets.

Journal Article•DOI•

Discovering binary codes for documents by learning deep generative models.

Geoffrey E. Hinton¹, Ruslan Salakhutdinov²•Institutions (2)

University of Toronto¹, Massachusetts Institute of Technology²

01 Jan 2011-Topics in Cognitive Science

TL;DR: A deep generative model in which the lowest layer represents the word-count vector of a document and the top layer represents a learned binary code for that document is described, which allows more accurate and much faster retrieval than latent semantic analysis.

Abstract: We describe a deep generative model in which the lowest layer represents the word-count vector of a document and the top layer represents a learned binary code for that document. The top two layers of the generative model form an undirected associative memory and the remaining layers form a belief net with directed, top-down connections. We present efficient learning and inference procedures for this type of generative model and show that it allows more accurate and much faster retrieval than latent semantic analysis. By using our method as a filter for a much slower method called TF-IDF we achieve higher accuracy than TF-IDF alone and save several orders of magnitude in retrieval time. By using short binary codes as addresses, we can perform retrieval on very large document sets in a time that is independent of the size of the document set using only one word of memory to describe each document.
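
The idea of using short binary codes as addresses can be sketched as a lookup table keyed by code, probed at every address within a small Hamming radius of the query. The sketch assumes the codes have already been learned; the deep generative model is not shown, and the 4-bit codes, document names, and function names are made up for illustration.

```python
from itertools import combinations


def hamming_ball(code, num_bits, radius):
    """Yield every integer code within `radius` bit flips of `code`."""
    for r in range(radius + 1):
        for positions in combinations(range(num_bits), r):
            flipped = code
            for p in positions:
                flipped ^= 1 << p  # flip bit p
            yield flipped


def build_index(doc_codes):
    """Group document ids by their binary code (the code acts as a memory address)."""
    index = {}
    for doc_id, code in doc_codes.items():
        index.setdefault(code, []).append(doc_id)
    return index


def retrieve(index, query_code, num_bits, radius=1):
    """Collect documents stored at any address near the query code."""
    hits = []
    for code in hamming_ball(query_code, num_bits, radius):
        hits.extend(index.get(code, []))
    return sorted(hits)


# Hypothetical learned codes for four documents.
doc_codes = {"doc_a": 0b1010, "doc_b": 0b1011, "doc_c": 0b0101, "doc_d": 0b1110}
index = build_index(doc_codes)
neighbors = retrieve(index, 0b1010, num_bits=4, radius=1)
```

The number of addresses probed depends only on `num_bits` and `radius`, not on how many documents are stored, which is the property the abstract refers to when it says retrieval time is independent of the size of the document set.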

Journal Article•DOI•

Comparing performances of backpropagation and genetic algorithms in the data classification

H. Hasan Örkcü¹, Hasan Bal¹•Institutions (1)

Gazi University¹

01 Apr 2011-Expert Systems With Applications

TL;DR: A large-scale comparison of the performances of neural network training methods is carried out on data classification datasets and shows that the real-coded genetic algorithm may offer an efficient alternative to traditional training methods for the classification problem.

Abstract: Artificial neural networks (ANN) are widely used in data classification problems. The backpropagation algorithm is the classical technique used in the training of artificial neural networks. Since this algorithm has many disadvantages, the training of neural networks has also been implemented with binary and real-coded genetic algorithms. These algorithms can be used for the solution of classification problems. The real-coded genetic algorithm has been compared with other training methods in only a few works. It is known that the comparison of approaches is as important as proposing a new classification approach. For this reason, in this study, a large-scale comparison of the performances of neural network training methods is carried out on data classification datasets. The experimental comparison contains different real classification data taken from the literature and a simulation study. A comparative analysis on the real data sets and simulation data shows that the real-coded genetic algorithm may offer an efficient alternative to traditional training methods for the classification problem.

Journal Article•DOI•

A New Formulation for Feedforward Neural Networks

Saman Razavi¹, Bryan A. Tolson¹•Institutions (1)

University of Waterloo¹

01 Oct 2011-IEEE Transactions on Neural Networks

TL;DR: Results show that ReNN can be trained more effectively and efficiently compared to the common neural networks and the proposed regularization measure is an effective indicator of how a network would perform in terms of generalization.

Abstract: The feedforward neural network is one of the most commonly used function approximation techniques and has been applied to a wide variety of problems arising from various disciplines. However, neural networks are black-box models having multiple challenges/difficulties associated with training and generalization. This paper initially looks into the internal behavior of neural networks and develops a detailed interpretation of the neural network functional geometry. Based on this geometrical interpretation, a new set of variables describing neural networks is proposed as a more effective and geometrically interpretable alternative to the traditional set of network weights and biases. Then, this paper develops a new formulation for neural networks with respect to the newly defined variables; this reformulated neural network (ReNN) is equivalent to the common feedforward neural network but has a less complex error response surface. To demonstrate the learning ability of ReNN, two training methods are employed in this paper: a derivative-based algorithm (a variation of backpropagation) and a derivative-free optimization algorithm. Moreover, a new measure of regularization on the basis of the developed geometrical interpretation is proposed to evaluate and improve the generalization ability of neural networks. The value of the proposed geometrical interpretation, the ReNN approach, and the new regularization measure is demonstrated across multiple test problems. Results show that ReNN can be trained more effectively and efficiently compared to the common neural networks and the proposed regularization measure is an effective indicator of how a network would perform in terms of generalization.

Journal Article•DOI•

Discriminative deep belief networks for visual data classification

Yan Liu¹, Shusen Zhou¹, Qingcai Chen²•Institutions (2)

Hong Kong Polytechnic University¹, Harbin Institute of Technology²

01 Oct 2011-Pattern Recognition

TL;DR: This paper proposes a novel semi-supervised classifier called discriminative deep belief networks (DDBN), which utilizes a new deep architecture to integrate the abstraction ability of deep belief nets (DBN) and the discriminative ability of the backpropagation strategy.

Journal Article•DOI•

Learning Speaker-Specific Characteristics With a Deep Neural Architecture

Ke Chen¹, Ahmad Salman¹•Institutions (1)

University of Manchester¹

01 Nov 2011-IEEE Transactions on Neural Networks

TL;DR: In this article, a deep neural architecture (DNA) is proposed for learning speaker-specific characteristics from mel-frequency cepstral coefficients, an acoustic representation commonly used in both speech recognition and speaker recognition, which results in a speaker-specific overcomplete representation.

Abstract: Speech signals convey various yet mixed information ranging from linguistic to speaker-specific information. However, most acoustic representations characterize all different kinds of information as a whole, which could hinder either a speech or a speaker recognition (SR) system from producing a better performance. In this paper, we propose a novel deep neural architecture (DNA) especially for learning speaker-specific characteristics from mel-frequency cepstral coefficients, an acoustic representation commonly used in both speech recognition and SR, which results in a speaker-specific overcomplete representation. In order to learn intrinsic speaker-specific characteristics, we come up with an objective function consisting of contrastive losses in terms of speaker similarity/dissimilarity and data reconstruction losses used as regularization to normalize the interference of non-speaker-related information. Moreover, we employ a hybrid learning strategy for learning parameters of the deep neural networks: i.e., local yet greedy layerwise unsupervised pretraining for initialization and global supervised learning for the ultimate discriminative goal. With four Linguistic Data Consortium (LDC) benchmarks and two non-English corpora, we demonstrate that our overcomplete representation is robust in characterizing various speakers, no matter whether their utterances have been used in training our DNA, and highly insensitive to text and languages spoken. Extensive comparative studies suggest that our approach yields favorable results in speaker verification and segmentation. Finally, we discuss several issues concerning our proposed approach.

Proceedings Article•DOI•

Bilinear deep learning for image classification

Sheng-hua Zhong¹, Yan Liu¹, Yang Liu¹•Institutions (1)

Hong Kong Polytechnic University¹

28 Nov 2011

TL;DR: This paper proposes a novel deep learning model called bilinear deep belief network (BDBN), which aims to provide human-like judgment by referencing the architecture of the human visual system and the procedure of intelligent perception, and develops BDBN under a semi-supervised learning framework.

Abstract: Image classification is a well-known classical problem in multimedia content analysis. This paper proposes a novel deep learning model called bilinear deep belief network (BDBN) for image classification. Unlike previous image classification models, BDBN aims to provide human-like judgment by referencing the architecture of the human visual system and the procedure of intelligent perception. Therefore, the multi-layer structure of the cortex and the propagation of information in the visual areas of the brain are realized faithfully. Unlike most existing deep models, BDBN utilizes a bilinear discriminant strategy to simulate the "initial guess" in human object recognition, and at the same time to avoid falling into a bad local optimum. To preserve the natural tensor structure of the image data, a novel deep architecture with greedy layer-wise reconstruction and global fine-tuning is proposed. To adapt to real-world image classification tasks, we develop BDBN under a semi-supervised learning framework, which makes the deep model work well when labeled images are insufficient. Comparative experiments on three standard datasets show that the proposed algorithm outperforms both representative classification models and existing deep learning techniques. More interestingly, our demonstrations show that the proposed BDBN works consistently with the visual perception of humans.

Book Chapter•DOI•

On the expressive power of deep architectures

Yoshua Bengio¹, Olivier Delalleau¹•Institutions (1)

Université de Montréal¹

05 Oct 2011

Proceedings Article•

An Introduction to Deep Learning

Ludovic Arnold, Sébastien Rebecchi, Sylvain Chevallier, Hélène Paugam-Moisy

27 Apr 2011

TL;DR: The present tutorial introducing the ESANN deep learning special session details the state-of-the-art models and summarizes the current understanding of this learning approach which is a reference for many difficult classification tasks.

Abstract: The deep learning paradigm tackles problems on which shallow architectures (e.g., SVM) are affected by the curse of dimensionality. As part of a two-stage learning scheme involving multiple layers of non-linear processing, a set of statistically robust features is automatically extracted from the data. The present tutorial, introducing the ESANN deep learning special session, details the state-of-the-art models and summarizes the current understanding of this learning approach, which is a reference for many difficult classification tasks.

Proceedings Article•

The Hierarchical Beta Process for Convolutional Factor Analysis and Deep Learning

Bo Chen¹, Gungor Polatkan², Guillermo Sapiro³, Lawrence Carin¹, David B. Dunson¹•Institutions (3)

Duke University¹, Princeton University², University of Minnesota³

28 Jun 2011

TL;DR: A convolutional factor-analysis model is developed, with the number of filters (factors) inferred via the beta process (BP) and hierarchical BP, for single-task and multi-task learning, respectively.

Abstract: A convolutional factor-analysis model is developed, with the number of filters (factors) inferred via the beta process (BP) and hierarchical BP, for single-task and multi-task learning, respectively. The computation of the model parameters is implemented within a Bayesian setting, employing Gibbs sampling; we explicitly exploit the convolutional nature of the expansion to accelerate computations. The model is used in a multi-level ("deep") analysis of general data, with specific results presented for image-processing data sets, e.g., classification.

Journal Article•DOI•

Rules extraction from constructively trained neural networks based on genetic algorithms

Marghny H. Mohamed¹•Institutions (1)

Assiut University¹

01 Oct 2011-Neurocomputing

TL;DR: This paper trains neural networks by constructive learning to obtain a simple network structure, and analyzes the convergence rate of the error in constructively trained networks with and without thresholds.

Book Chapter•DOI•

Deep learning networks for off-line handwritten signature recognition

Bernardete Ribeiro¹, Ivo Gonçalves¹, Sergio Santos¹, Alexander Kovačec¹•Institutions (1)

University of Coimbra¹

15 Nov 2011

TL;DR: A deep learning model for off-line handwritten signature recognition which is able to extract high-level representations is presented and a two-step hybrid model for signature identification and verification is proposed improving the misclassification rate in the well-known GPDS database.

Abstract: Reliable identification and verification of off-line handwritten signatures from images is a difficult problem with many practical applications. This task is a difficult vision problem within the field of biometrics because a signature may change depending on psychological factors of the individual. Motivated by advances in brain science which describe how objects are represented in the visual cortex, advanced research on deep neural networks has been shown to work reliably on large image data sets. In this paper, we present a deep learning model for off-line handwritten signature recognition which is able to extract high-level representations. We also propose a two-step hybrid model for signature identification and verification improving the misclassification rate in the well-known GPDS database.

Journal Article•DOI•

Deep Learning Regularized Fisher Mappings

Wai Keung Wong, Mingming Sun¹•Institutions (1)

Nanjing University of Science and Technology¹

01 Oct 2011-IEEE Transactions on Neural Networks

TL;DR: This brief proposes a new feature extraction method called regularized deep Fisher mapping (RDFM), which learns an explicit mapping from the sample space to the feature space using a deep neural network to enhance the separability of features according to the Fisher criterion.

Abstract: For classification tasks, it is always desirable to extract features that are most effective for preserving class separability. In this brief, we propose a new feature extraction method called regularized deep Fisher mapping (RDFM), which learns an explicit mapping from the sample space to the feature space using a deep neural network to enhance the separability of features according to the Fisher criterion. Compared to kernel methods, the deep neural network is a deep and nonlocal learning architecture, and therefore exhibits more powerful ability to learn the nature of highly variable datasets from fewer samples. To eliminate the side effects of overfitting brought about by the large capacity of powerful learners, regularizers are applied in the learning procedure of RDFM. RDFM is evaluated in various types of datasets, and the results reveal that it is necessary to apply unsupervised regularization in the fine-tuning phase of deep learning. Thus, for very flexible models, the optimal Fisher feature extractor may be a balance between discriminative ability and descriptive ability.
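As a rough illustration of the Fisher criterion mentioned in this abstract, the sketch below scores a set of already-extracted features by the ratio of between-class scatter to within-class scatter. It assumes the common trace-ratio form of the criterion, which may differ from the exact objective optimized by RDFM, and the function name `fisher_ratio` is hypothetical.

```python
from collections import defaultdict


def fisher_ratio(features, labels):
    """Trace-ratio Fisher score: between-class scatter divided by within-class scatter.

    `features` is a list of equal-length numeric vectors and `labels` gives the class
    of each vector. Higher values indicate better class separability.
    """
    dim = len(features[0])
    by_class = defaultdict(list)
    for vector, label in zip(features, labels):
        by_class[label].append(vector)

    def mean(rows):
        return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    global_mean = mean(features)
    between = 0.0
    within = 0.0
    for rows in by_class.values():
        class_mean = mean(rows)
        # Between-class scatter: class size times distance of class mean to global mean.
        between += len(rows) * squared_distance(class_mean, global_mean)
        # Within-class scatter: spread of samples around their own class mean.
        within += sum(squared_distance(row, class_mean) for row in rows)

    if within == 0.0:
        return float("inf")
    return between / within


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
well_separated = fisher_ratio(points, [0, 0, 1, 1])
poorly_separated = fisher_ratio(points, [0, 1, 0, 1])
```

With the labeling that groups points by their first coordinate, the score is much higher than with the labeling that mixes them, which is the behavior a Fisher-style feature extractor tries to encourage in the learned feature space.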
