
Showing papers on "Deep learning" published in 2010


Proceedings Article
31 Mar 2010
TL;DR: The objective is to understand why standard gradient descent from random initialization does so poorly with deep neural networks, in order to explain the relative successes of recent initialization and training schemes and to help design better algorithms in the future.
Abstract: Whereas before 2006 it appears that deep multilayer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper vs less deep architectures. All these experimental results were obtained with new initialization or training mechanisms. Our objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future. We first observe the influence of the non-linear activation functions. We find that the logistic sigmoid activation is unsuited for deep networks with random initialization because of its mean value, which can drive especially the top hidden layer into saturation. Surprisingly, we find that saturated units can move out of saturation by themselves, albeit slowly, which explains the plateaus sometimes seen when training neural networks. We find that a new non-linearity that saturates less can often be beneficial. Finally, we study how activations and gradients vary across layers and during training, with the idea that training may be more difficult when the singular values of the Jacobian associated with each layer are far from 1. Based on these considerations, we propose a new initialization scheme that brings substantially faster convergence. Deep learning methods aim at learning feature hierarchies with features from higher levels of the hierarchy formed by the composition of lower level features. Much attention has recently been devoted to them (see Bengio (2009) for a review), because of their theoretical appeal, inspiration from biology and human cognition, and because of empirical success in vision (Ranzato et al., 2007; Larochelle et al., 2007; Vincent et al., 2008) and natural language processing (NLP) (Collobert & Weston, 2008; Mnih & Hinton, 2009). Theoretical results reviewed and discussed by Bengio (2009) suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one may need deep architectures. Most of the recent experimental results with deep architectures are obtained with models that can be turned into deep supervised neural networks, but with initialization or training schemes different from the classical feedforward neural networks (Rumelhart et al., 1986). Why are these new algorithms working so much better than the standard random initialization and gradient-based optimization of a supervised training criterion? Part of the answer may be found in recent analyses of the effect of unsupervised pre-training (Erhan et al., 2009), showing that it acts as a regularizer that initializes the parameters in a “better” basin of attraction of the optimization procedure, corresponding to an apparent local minimum associated with better generalization. But earlier work (Bengio et al., 2007) had shown that even a purely supervised but greedy layer-wise procedure would give better results.
So here instead of focusing on what unsupervised pre-training or semi-supervised criteria bring to deep architectures, we focus on analyzing what may be going wrong with good old (but deep) multilayer neural networks. Our analysis is driven by investigative experiments to monitor activations (watching for saturation of hidden units) and gradients, across layers and across training iterations. We also evaluate the effects on these of choices of activation function (with the idea that it might affect saturation) and initialization procedure (since unsupervised pretraining is a particular form of initialization and it has a drastic impact).
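
The "new initialization scheme" referred to above is the normalized initialization now commonly called Xavier or Glorot initialization. Below is a minimal NumPy sketch for illustration only; the layer sizes and function name are hypothetical, and the uniform range sqrt(6/(n_in+n_out)) is the commonly cited form of this scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def glorot_uniform(n_in, n_out):
    # Normalized initialization: W ~ U[-limit, +limit] with
    # limit = sqrt(6 / (n_in + n_out)), chosen so that activation and
    # back-propagated gradient variances stay roughly constant across layers.
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

# Hypothetical layer sizes for a deep tanh network.
sizes = [784, 1000, 1000, 1000, 10]
weights = [glorot_uniform(fan_in, fan_out) for fan_in, fan_out in zip(sizes, sizes[1:])]
```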

9,500 citations


Journal ArticleDOI
TL;DR: An overview of the mainstream deep learning approaches and research directions proposed over the past decade is provided and some perspective into how it may evolve is presented.
Abstract: This article provides an overview of the mainstream deep learning approaches and research directions proposed over the past decade. It is important to emphasize that each approach has strengths and weaknesses, depending on the application and context in which it is being used. Thus, this article presents a summary of the current state of the deep machine learning field and some perspective into how it may evolve. Convolutional Neural Networks (CNNs) and Deep Belief Networks (DBNs) (and their respective variations) are focused on primarily because they are well established in the deep learning field and show great promise for future work.

1,103 citations



Book
14 Oct 2010
TL;DR: This book provides an overview of neural network modeling, covering model design methodology, dimension reduction and resampling, identification and control of dynamical systems with recurrent networks, discrimination, self-organizing maps and unsupervised classification, and neural networks without training for optimization.
Abstract: Contents: Neural Networks: An Overview; Modeling with Neural Networks: Principles and Model Design Methodology; Modeling Methodology: Dimension Reduction and Resampling Methods; Neural Identification of Controlled Dynamical Systems and Recurrent Networks; Closed-Loop Control Learning; Discrimination; Self-Organizing Maps and Unsupervised Classification; Neural Networks without Training for Optimization.

519 citations


Proceedings Article
01 Sep 2010
TL;DR: This paper reports the recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms and shows that the binary codes learned produce a log-spectral distortion that is approximately 2 dB lower than a subband vector quantization technique over the entire frequency range of wide-band speech.
Abstract: This paper reports our recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms. The top layer of the generative model learns binary codes that can be used for efficient compression of speech and could also be used for scalable speech recognition or rapid speech content retrieval. Each layer of the generative model is fully connected to the layer below and the weights on these connections are pretrained efficiently by using the contrastive divergence approximation to the log likelihood gradient. After layer-by-layer pre-training we “unroll” the generative model to form a deep auto-encoder, whose parameters are then fine-tuned using back-propagation. To reconstruct the full-length speech spectrogram, individual spectrogram segments predicted by their respective binary codes are combined using an overlap-and-add method. Experimental results on speech spectrogram coding demonstrate that the binary codes produce a log-spectral distortion that is approximately 2 dB lower than a subband vector quantization technique over the entire frequency range of wide-band speech. Index Terms: deep learning, speech feature extraction, neural networks, auto-encoder, binary codes, Boltzmann machine
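
To make the pre-train-then-"unroll" recipe concrete, the following minimal NumPy sketch (an illustration under assumed layer sizes and sigmoid units, not the authors' code) stacks pretrained RBM weight matrices into an encoder and reuses their transposes as the decoder of the deep auto-encoder; fine-tuning by back-propagation would then adjust all of these weights jointly.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unroll_autoencoder(weights, biases_hid, biases_vis, v):
    """Encode with the pretrained stack, then decode with transposed (tied) weights.
    `weights[l]` maps layer l to layer l+1; returns (codes, reconstruction)."""
    # Encoder: bottom-up pass through the pretrained stack.
    h = v
    for W, b in zip(weights, biases_hid):
        h = sigmoid(h @ W + b)
    codes = h
    # Decoder: top-down pass using the transposed weights.
    for W, c in zip(reversed(weights), reversed(biases_vis)):
        h = sigmoid(h @ W.T + c)
    return codes, h

# Hypothetical pretrained stack for spectrogram patches (e.g. 512-256-64 units).
rng = np.random.default_rng(0)
sizes = [512, 256, 64]
weights = [0.01 * rng.standard_normal((a, b)) for a, b in zip(sizes, sizes[1:])]
biases_hid = [np.zeros(b) for b in sizes[1:]]
biases_vis = [np.zeros(a) for a in sizes[:-1]]
codes, recon = unroll_autoencoder(weights, biases_hid, biases_vis, rng.random((10, 512)))
```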

372 citations


Proceedings ArticleDOI
18 Jul 2010
TL;DR: A framework for combining the training of deep auto-encoders (for learning compact feature spaces) with recently-proposed batch-mode RL algorithms (for learning policies) is proposed, and an emphasis is put on the data-efficiency and on studying the properties of the feature spaces automatically constructed by the deep auto-encoder neural networks.
Abstract: This paper discusses the effectiveness of deep auto-encoder neural networks in visual reinforcement learning (RL) tasks. We propose a framework for combining the training of deep auto-encoders (for learning compact feature spaces) with recently-proposed batch-mode RL algorithms (for learning policies). An emphasis is put on the data-efficiency of this combination and on studying the properties of the feature spaces automatically constructed by the deep auto-encoders. These feature spaces are empirically shown to adequately capture the similarities and spatial relations between observations and to allow useful policies to be learned. We propose several methods for improving the topology of the feature spaces, making use of task-dependent information. Finally, we present first results on successfully learning good control policies directly on synthesized and real images.
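
As a rough sketch of the proposed combination (illustrative only; the encoder, the choice of regressor, and the toy data below are assumptions, not the authors' setup), raw observations are first mapped into the compact feature space and batch-mode fitted Q iteration is then run on the encoded transitions:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(encode, transitions, n_actions, gamma=0.95, iters=20):
    """Batch-mode fitted Q iteration on features produced by `encode`.
    `transitions` is a list of (obs, action, reward, next_obs) tuples."""
    obs, acts, rews, next_obs = map(np.array, zip(*transitions))
    s, s_next = encode(obs), encode(next_obs)          # compact feature spaces
    X = np.hstack([s, acts[:, None]])                  # regress on (features, action)
    q = None
    for _ in range(iters):
        if q is None:
            targets = rews                             # first iteration: Q = immediate reward
        else:
            # Bellman backup: r + gamma * max_a' Q(s', a')
            q_next = np.column_stack([
                q.predict(np.hstack([s_next, np.full((len(s_next), 1), a)]))
                for a in range(n_actions)
            ])
            targets = rews + gamma * q_next.max(axis=1)
        q = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, targets)
    return q

# Hypothetical usage with a toy "encoder" standing in for the deep auto-encoder.
rng = np.random.default_rng(0)
encode = lambda imgs: imgs.reshape(len(imgs), -1)[:, :8]   # placeholder for the learned code
data = [(rng.random((16, 16)), rng.integers(2), rng.random(), rng.random((16, 16)))
        for _ in range(200)]
q_function = fitted_q_iteration(encode, data, n_actions=2)
```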

353 citations


Journal ArticleDOI
TL;DR: This paper examines learning algorithms for deep architectures such as Deep Belief Networks and stacks of auto-encoder variants, which have produced impressive results in several areas.
Abstract: Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of auto-encoder variants, with impressive results obtained in several areas...

259 citations


Proceedings Article
01 Dec 2010
TL;DR: It is shown that pre-training initializes the weights at a point in parameter space from which fine-tuning is effective, and is thus crucial both for training deep structured models and for the recognition performance of a CD-DBN-HMM based large-vocabulary speech recognizer.
Abstract: Recently, deep learning techniques have been successfully applied to automatic speech recognition tasks, first to phonetic recognition with context-independent deep belief network (DBN) hidden Markov models (HMMs) and later to large vocabulary continuous speech recognition using context-dependent (CD) DBN-HMMs. In this paper, we report our most recent experiments designed to understand the roles of the two main phases of DBN learning, pre-training and fine-tuning, in the recognition performance of a CD-DBN-HMM based large-vocabulary speech recognizer. As expected, we show that pre-training can initialize weights to a point in the space where fine-tuning can be effective and thus is crucial in training deep structured models. However, a moderate increase of the amount of unlabeled pre-training data has an insignificant effect on the final recognition results as long as the original training size is sufficiently large to initialize the DBN weights. On the other hand, with additional labeled training data, the fine-tuning phase of DBN training can significantly improve the recognition accuracy.

235 citations


Journal ArticleDOI
TL;DR: The method introduced in this paper allows for training arbitrarily connected neural networks; therefore, more powerful neural network architectures with connections across layers can be trained efficiently.
Abstract: The method introduced in this paper allows for training arbitrarily connected neural networks; therefore, more powerful neural network architectures with connections across layers can be trained efficiently. The proposed method also simplifies neural network training by using forward-only computation instead of the traditionally used forward and backward computation.

187 citations


Journal ArticleDOI
TL;DR: It is proved that deep but narrow feedforward neural networks with sigmoidal units can represent any Boolean expression.
Abstract: Deep belief networks (DBN) are generative models with many layers of hidden causal variables, recently introduced by Hinton, Osindero, and Teh (2006), along with a greedy layer-wise unsupervised learning algorithm. Building on Le Roux and Bengio (2008) and Sutskever and Hinton (2008), we show that deep but narrow generative networks do not require more parameters than shallow ones to achieve universal approximation. Exploiting the proof technique, we prove that deep but narrow feedforward neural networks with sigmoidal units can represent any Boolean expression.

170 citations


Journal Article
TL;DR: In this article, the authors show that the logistic sigmoid activation is unsuited for deep networks with random initialization because of its mean value, which can drive especially the top hidden layer into saturation.

Proceedings Article
23 Aug 2010
TL;DR: Experiments show that ADN outperforms previous semi-supervised learning algorithms and deep learning techniques applied to sentiment classification.
Abstract: This paper presents a novel semi-supervised learning algorithm called Active Deep Networks (ADN) to address the semi-supervised sentiment classification problem with active learning. First, we propose the semi-supervised learning method of ADN. ADN is constructed from Restricted Boltzmann Machines (RBMs) with unsupervised learning, using labeled data and abundant unlabeled data. The constructed structure is then fine-tuned by gradient-descent based supervised learning with an exponential loss function. Second, we apply active learning within the semi-supervised learning framework to identify reviews that should be labeled as training data. The ADN architecture is then trained on the selected labeled data and all unlabeled data. Experiments on five sentiment classification datasets show that ADN outperforms previous semi-supervised learning algorithms and deep learning techniques applied to sentiment classification.
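
Two ingredients of ADN lend themselves to a compact illustration: the exponential loss used for supervised fine-tuning and an uncertainty-style active-learning selection of unlabeled reviews near the decision boundary. The sketch below is generic (the selection criterion and the random scores are stand-ins; the paper's exact criterion may differ):

```python
import numpy as np

def exp_loss_grad(scores, labels):
    """Gradient of the exponential loss  L = mean(exp(-y * f(x)))  w.r.t. the scores.
    `labels` are in {-1, +1}; this is the kind of loss used for supervised fine-tuning."""
    return -labels * np.exp(-labels * scores) / len(scores)

def select_queries(scores_unlabeled, k):
    """Uncertainty-style active selection: pick the k unlabeled examples whose
    scores are closest to the decision boundary (score ~ 0)."""
    return np.argsort(np.abs(scores_unlabeled))[:k]

# Hypothetical usage with random scores standing in for the network's outputs.
rng = np.random.default_rng(0)
scores_lab, labels = rng.standard_normal(100), rng.choice([-1, 1], size=100)
grad = exp_loss_grad(scores_lab, labels)                    # back-propagate this through the net
queries = select_queries(rng.standard_normal(500), k=10)    # indices of reviews to label next
```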

Journal ArticleDOI
TL;DR: A review of the theory, extension models, learning algorithms and applications of the RNN, which has been applied in a variety of areas including pattern recognition, classification, image processing, combinatorial optimization and communication systems.
Abstract: The random neural network (RNN) is a recurrent neural network model inspired by the spiking behaviour of biological neuronal networks. Contrary to most artificial neural network models, neurons in the RNN interact by probabilistically exchanging excitatory and inhibitory spiking signals. The model is described by analytical equations, has a low complexity supervised learning algorithm and is a universal approximator for bounded continuous functions. The RNN has been applied in a variety of areas including pattern recognition, classification, image processing, combinatorial optimization and communication systems. It has also inspired research activity in modelling interacting entities in various systems such as queueing and gene regulatory networks. This paper presents a review of the theory, extension models, learning algorithms and applications of the RNN.
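
For orientation, the "analytical equations" mentioned above are, in the standard Gelenbe formulation (stated here from general knowledge of the model, not quoted from this review), the steady-state excitation probabilities obtained from the spike traffic equations:

```latex
% Random neural network steady-state equations (standard Gelenbe form; for orientation only).
% q_i: probability neuron i is excited; r_i: firing rate; Lambda_i, lambda_i: exogenous
% excitatory/inhibitory arrival rates; p_{ji}^{+/-}: excitatory/inhibitory routing probabilities.
\begin{align}
  q_i &= \frac{\lambda^{+}_i}{r_i + \lambda^{-}_i}, \\
  \lambda^{+}_i &= \Lambda_i + \sum_j q_j r_j p^{+}_{ji}, \qquad
  \lambda^{-}_i = \lambda_i + \sum_j q_j r_j p^{-}_{ji}.
\end{align}
```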

Journal ArticleDOI
TL;DR: Speech recognition classification performance is investigated using two standard neural network structures as classifiers: a feed-forward neural network trained with the back-propagation algorithm and a radial basis function neural network.
Abstract: This paper presents an investigation of speech recognition classification performance. The investigation is carried out using two standard neural network structures as the classifier: a Feed-forward Neural Network (NN) with the back-propagation algorithm and a Radial Basis Function Neural Network.

Proceedings ArticleDOI
26 Sep 2010
TL;DR: A novel NNLM adaptation method using a cascaded network is proposed and consistent WER reductions were obtained on a state-of-the-art Arabic LVCSR task over conventional NNLMs.
Abstract: Neural network language models (NNLMs) have become an increasingly popular choice for large vocabulary continuous speech recognition (LVCSR) tasks, due to their inherent generalisation and discriminative power. This paper presents two techniques to improve the performance of standard NNLMs. First, the NNLM is modified by introducing an additional output-layer node to model the probability mass of out-of-shortlist (OOS) words. An associated probability normalisation scheme is explicitly derived. Second, a novel NNLM adaptation method using a cascaded network is proposed. Consistent WER reductions were obtained on a state-of-the-art Arabic LVCSR task over conventional NNLMs. Further performance gains were also observed after NNLM adaptation.
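
One common way to realize an OOS output node and its probability normalisation is sketched below as a generic scheme for orientation; the paper derives its own normalisation, which may differ. The extra node carries the total shortlist-complement mass, which is redistributed with a back-off n-gram model so the distribution sums to one over the full vocabulary:

```latex
% Generic shortlist + OOS normalisation (illustrative; not the paper's exact derivation).
% P_NN: network output, P_NG: back-off n-gram model, SL: shortlist, h: word history.
\begin{equation}
P(w \mid h) =
\begin{cases}
  P_{\mathrm{NN}}(w \mid h), & w \in \mathrm{SL}, \\[4pt]
  P_{\mathrm{NN}}(\mathrm{oos} \mid h)\,
  \dfrac{P_{\mathrm{NG}}(w \mid h)}{\sum_{w' \notin \mathrm{SL}} P_{\mathrm{NG}}(w' \mid h)}, & w \notin \mathrm{SL}.
\end{cases}
\end{equation}
```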

Journal ArticleDOI
TL;DR: Performance comparisons with similar studies found in the related literature indicated that the proposed ANN structures yield satisfactory results.

Journal ArticleDOI
TL;DR: In the proposed wavelet neural networks, composite functions are applied at the hidden nodes and learning is done using ELM; in most cases the networks achieve better performance than several related neural networks and learn much faster than networks trained with the traditional back-propagation (BP) algorithm.

Journal ArticleDOI
TL;DR: The experiments show that SVM outperforms the BP neural network based on the criterion of Mean Absolute Error (MAE), indicating that SVM is a promising technique for time series forecasting.
Abstract: Time series prediction is an important problem in many applications in natural science, engineering and economics. The objective of this study is to examine the flexibility of the Support Vector Machine (SVM) in time series forecasting by comparing it with a multi-layer back-propagation (BP) neural network. Five well-known time series data sets are used in this study to demonstrate the effectiveness of the forecasting model. These data are used for forecasting in an application aimed at handling real-life time series. The grid search technique with 10-fold cross validation is used to determine the best values of the SVM parameters in the forecasting process. The experiments show that SVM outperforms the BP neural network based on the criterion of Mean Absolute Error (MAE). This indicates that SVM is a promising technique for time series forecasting.
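
The grid search with 10-fold cross-validation described above can be reproduced in spirit with scikit-learn; the sketch below is illustrative only (the kernel, parameter grid, lag construction, and synthetic series are assumptions, not the study's settings).

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

def make_lagged(series, lags=4):
    # Build a supervised dataset from a univariate series: predict x[t] from the previous `lags` values.
    X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
    y = series[lags:]
    return X, y

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 400)) + 0.1 * rng.standard_normal(400)  # toy series
X, y = make_lagged(series)

# Grid search over C, epsilon and the RBF width, scored by (negative) mean absolute error.
grid = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [1, 10, 100], "epsilon": [0.01, 0.1], "gamma": [0.1, 1.0]},
    scoring="neg_mean_absolute_error",
    cv=10,   # 10-fold CV as in the study; a time-ordered split would be a stricter alternative
)
grid.fit(X, y)
print(grid.best_params_, -grid.best_score_)
```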

DOI
01 Jan 2010
TL;DR: The ST-DBN has superior performance on discriminative and generative tasks, including action recognition and video denoising, when compared to convolutional deep belief networks (CDBNs) applied on a per-frame basis, and has superior feature-invariance properties.
Abstract: We present a novel hierarchical, distributed model for unsupervised learning of invariant spatio-temporal features from video. Our approach builds on previous deep learning methods and uses the convolutional Restricted Boltzmann machine (CRBM) as a basic processing unit. Our model, called the Space-Time Deep Belief Network (ST-DBN), alternates the aggregation of spatial and temporal information so that higher layers capture longer range statistical dependencies in both space and time. Our experiments show that the ST-DBN has superior performance on discriminative and generative tasks including action recognition and video denoising when compared to convolutional deep belief networks (CDBNs) applied on a per-frame basis. Simultaneously, the ST-DBN has superior feature invariance properties compared to CDBNs and can integrate information from both space and time to fill in missing data in video.

Proceedings Article
21 Jun 2010
TL;DR: This paper presents supervised embedding techniques that use a deep network to collapse classes; the network is pre-trained using a stack of RBMs and fine-tuned using objectives that aim to collapse classes in the embedding.
Abstract: Deep learning has been successfully applied to perform non-linear embedding. In this paper, we present supervised embedding techniques that use a deep network to collapse classes. The network is pre-trained using a stack of RBMs, and fine-tuned using approaches that try to collapse classes. The fine-tuning is inspired by ideas from NCA, but it uses a Student t-distribution to model the similarities of data points belonging to the same class in the embedding. We investigate two types of objective functions: deep t-distributed MCML (dt-MCML) and deep t-distributed NCA (dt-NCA). Our experiments on two handwritten digit data sets reveal the strong performance of dt-MCML in supervised parametric data visualization, whereas dt-NCA outperforms alternative techniques when embeddings with more than two or three dimensions are constructed, e.g., to obtain good classification performances. Overall, our results demonstrate the advantage of using a deep architecture and a heavy-tailed t-distribution for measuring pairwise similarities in supervised embedding.
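
The heavy-tailed similarity that dt-MCML and dt-NCA place on pairs of embedded points can be illustrated with the t-SNE-style form below (a generic sketch; the exact normalization and class conditioning used in the paper may differ):

```python
import numpy as np

def t_similarities(Z, alpha=1.0):
    """Heavy-tailed pairwise similarities in an embedding Z (n x d):
    q_ij ∝ (1 + ||z_i - z_j||^2 / alpha)^(-(alpha + 1) / 2), normalized over all i != j.
    This is the t-SNE-style form; dt-MCML/dt-NCA may normalize differently."""
    d2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    num = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    np.fill_diagonal(num, 0.0)
    return num / num.sum()

# Toy usage: similarities in a random 2-D embedding of 5 points.
Z = np.random.default_rng(0).standard_normal((5, 2))
Q = t_similarities(Z)
```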


Proceedings Article
01 Jan 2010
TL;DR: A new parallel-training tool TNet was designed and optimized for multiprocessor computers and the training acceleration rates are reported on a phoneme-state classification task.
Abstract: The feed-forward multi-layer neural networks have significant importance in speech recognition. A new parallel-training tool TNet was designed and optimized for multiprocessor computers. The training acceleration rates are reported on a phoneme-state classification task.

Journal ArticleDOI
TL;DR: A sequential Bayesian learning (SBL) is proposed for modular neural networks aiming at efficiently aggregating the outputs of members of the ensemble to demonstrate that the proposed method can perform information aggregation efficiently in data modeling.
Abstract: The modular neural network is a popular neural network model with many successful applications. In this paper, a sequential Bayesian learning (SBL) approach is proposed for modular neural networks, aiming at efficiently aggregating the outputs of members of the ensemble. The experimental results on eight benchmark problems demonstrate that the proposed method can perform information aggregation efficiently in data modeling.

Journal ArticleDOI
TL;DR: This work describes the backpropagation procedure, the leading case of gradient descent learning algorithms for the class of networks considered here, as well as an efficient heuristic modification, and examines the applicability of these learning methods to the problem of predicting interregional telecommunication flows.
Abstract: Learning in neural networks has attracted considerable interest in recent years. Our focus is on learning in single hidden-layer feedforward networks which is posed as a search in the network parameter space for a network that minimizes an additive error function of statistically independent examples. We review first the class of single hidden-layer feedforward networks and characterize the learning process in such networks from a statistical point of view. Then we describe the backpropagation procedure, the leading case of gradient descent learning algorithms for the class of networks considered here, as well as an efficient heuristic modification. Finally, we analyze the applicability of these learning methods to the problem of predicting interregional telecommunication flows. Particular emphasis is laid on the engineering judgment, first, in choosing appropriate values for the tunable parameters, second, on the decision whether to train the network by epoch or by pattern (random approximation), and, third, on the overfitting problem. In addition, the analysis shows that the neural network model whether using either epoch-based or pattern-based stochastic approximation outperforms the classical regression approach to modeling telecommunication flows.
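
A minimal single-hidden-layer network trained by gradient descent on squared error, with the epoch-based versus pattern-based (stochastic approximation) choice exposed as a flag, is sketched below; it is a generic illustration on synthetic data, not the telecommunication-flow model from the study.

```python
import numpy as np

def train_mlp(X, y, hidden=10, lr=0.01, epochs=100, pattern_mode=True, seed=0):
    """Single-hidden-layer tanh network, linear output, squared-error gradient descent.
    pattern_mode=True updates after each example (stochastic approximation);
    False accumulates the gradient over the whole epoch (epoch-based learning)."""
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((X.shape[1], hidden)) * 0.1
    W2 = rng.standard_normal((hidden, 1)) * 0.1
    for _ in range(epochs):
        order = rng.permutation(len(X)) if pattern_mode else [slice(None)]
        for idx in order:
            x = X[idx].reshape(-1, X.shape[1])
            t = np.atleast_1d(y[idx]).reshape(-1, 1)
            h = np.tanh(x @ W1)                 # hidden activations
            out = h @ W2                        # linear output unit
            err = out - t                       # dE/dout for squared error
            dW2 = h.T @ err / len(x)
            dW1 = x.T @ ((err @ W2.T) * (1 - h ** 2)) / len(x)
            W1 -= lr * dW1
            W2 -= lr * dW2
    return W1, W2

# Toy usage on synthetic data (stand-in for telecommunication-flow predictors).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
W1, W2 = train_mlp(X, y)
```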

Journal ArticleDOI
TL;DR: A parallel neural network approach is proposed to aid the diagnosis of breast cancer, using a feed-forward neural network model and the backpropagation learning algorithm with momentum and a variable learning rate.
Abstract: Classification is perhaps the most familiar and popular data mining technique. Inspired by biological neural networks, Artificial Neural Networks are developed to mimic characteristics such as robustness and fault tolerance. To perform the classification task on medical data, the neural network is trained. To speed up the training process, a parallel approach is adopted. In this paper, a parallel approach using the neural network technique is proposed to help in the diagnosis of breast cancer. The neural network is trained on a breast cancer database using a feed-forward neural network model and the backpropagation learning algorithm with momentum and a variable learning rate. The performance of the network is evaluated. The experimental results show that applying the parallel approach to the neural network model yields efficient results.
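
The "momentum and variable learning rate" training mentioned above typically refers to the standard gradient-descent update below (a generic textbook form, not necessarily the exact rule used in the paper), where the learning rate eta(t) is increased while the error keeps decreasing and reduced when it grows:

```latex
% Back-propagation with momentum and an adaptive (variable) learning rate; generic form.
\begin{align}
  \Delta w(t) &= -\,\eta(t)\,\frac{\partial E}{\partial w} + \mu\,\Delta w(t-1), \\
  w(t+1) &= w(t) + \Delta w(t),
\end{align}
% with, e.g., eta(t+1) = a * eta(t) (a > 1) if the error decreased,
% and eta(t+1) = b * eta(t) (0 < b < 1) otherwise.
```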

Book ChapterDOI
15 Sep 2010
TL;DR: A novel unsupervised neural network combining elements from Adaptive Resonance Theory and topology learning neural networks, in particular the Self-Organising Incremental Neural Network, is introduced, which enables stable on-line clustering of stationary and non-stationary input data.
Abstract: In this paper, a novel unsupervised neural network combining elements from Adaptive Resonance Theory and topology learning neural networks, in particular the Self-Organising Incremental Neural Network, is introduced. It enables stable on-line clustering of stationary and non-stationary input data. In addition, two representations reflecting different levels of detail are learnt simultaneously. Furthermore, the network is designed in such a way that its sensitivity to noise is diminished, which renders it suitable for the application to real-world problems.

Journal ArticleDOI
TL;DR: Finite-time convergence of the proposed neural network is proved using the Lyapunov method; such convergence to exact optimal solutions in finite time is remarkable and rare in the literature of neural networks for optimization.
Abstract: In this letter, a novel recurrent neural network based on the gradient method is proposed for solving linear programming problems. Finite-time convergence of the proposed neural network is proved by using the Lyapunov method. Compared with the existing neural networks for linear programming, the proposed neural network is globally convergent to exact optimal solutions in finite time, which is remarkable and rare in the literature of neural networks for optimization. Some numerical examples are given to show the effectiveness and excellent performance of the new recurrent neural network.

Proceedings Article
01 Jan 2010
TL;DR: This work investigates the learning behavior of training algorithms by varying a minimal set of parameters and shows that with relatively simple variants of CD, it is possible to obtain good results even without further regularization.
Abstract: Restricted Boltzmann Machines are increasingly popular tools for unsupervised learning. They are very general, can cope with missing data and are used to pretrain deep learning machines. RBMs learn a generative model of the data distribution. As exact gradient ascent on the data likelihood is infeasible, Markov Chain Monte Carlo approximations to the gradient, such as Contrastive Divergence (CD), are typically used. Even though there are some theoretical insights into this algorithm, it is not guaranteed to converge. Recently it has been observed that, after an initial increase in likelihood, the training degrades if no additional regularization is used. The parameters for regularization, however, cannot be determined even for medium-sized RBMs. In this work, we investigate the learning behavior of training algorithms by varying a minimal set of parameters and show that with relatively simple variants of CD it is possible to obtain good results even without further regularization. Furthermore, we show that it is not necessary to tune many hyperparameters to obtain a good model.
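
For reference, one CD-k parameter update for a binary RBM looks as follows (a generic sketch with illustrative sizes; the paper studies variants of this basic recipe):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

def cd_k_update(v0, W, b, c, k=1, lr=0.05):
    """One CD-k update for a binary RBM with weights W, visible bias b, hidden bias c.
    `v0` is a mini-batch of visible vectors (n x n_vis); parameters are updated in place."""
    ph0 = sigmoid(v0 @ W + c)                 # positive-phase hidden probabilities
    h = sample(ph0)
    for _ in range(k):                        # k steps of block Gibbs sampling
        v = sample(sigmoid(h @ W.T + b))
        ph = sigmoid(v @ W + c)
        h = sample(ph)
    n = len(v0)
    W += lr * (v0.T @ ph0 - v.T @ ph) / n     # approximate log-likelihood gradient
    b += lr * (v0 - v).mean(axis=0)
    c += lr * (ph0 - ph).mean(axis=0)
    return W, b, c

# Toy usage: a tiny RBM on random binary data (sizes are illustrative).
n_vis, n_hid = 20, 8
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)
data = (rng.random((64, n_vis)) < 0.3).astype(float)
for epoch in range(10):
    W, b, c = cd_k_update(data, W, b, c, k=1)
```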

Journal ArticleDOI
TL;DR: The experimental results reveal that the proposed approach comes with a simpler structure of the classifier and better prediction capabilities.
Abstract: Polynomial neural networks have been known to exhibit useful properties as classifiers and universal approximators. In this study, we introduce a concept of polynomial-based radial basis function neural networks (P-RBF NNs), present a design methodology and show the use of the networks in classification problems. From the conceptual standpoint, the classifiers of this form can be expressed as a collection of "if-then" rules. The proposed architecture uses two essential development mechanisms. Fuzzy clustering (Fuzzy C-Means, FCM) is aimed at the development of condition parts of the rules while the corresponding conclusions of the rules are formed by some polynomials. A detailed learning algorithm for the P-RBF NNs is developed. The proposed classifier is applied to two-class pattern classification problems. The performance of this classifier is contrasted with the results produced by the "standard" RBF neural networks. In addition, the experimental application covers a comparative analysis including several previous commonly encountered methods such as standard neural networks, SVM, SOM, PCA, LDA, C4.5, and decision trees. The experimental results reveal that the proposed approach comes with a simpler structure of the classifier and better prediction capabilities.
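
For readers unfamiliar with FCM, the standard Fuzzy C-Means updates that produce the rule condition parts alternate between memberships and prototypes as below (generic textbook form with fuzzifier m > 1; the polynomial conclusions of the rules are fitted separately):

```latex
% Standard Fuzzy C-Means updates (generic form): u_{ik} is the membership of pattern x_k
% in cluster i, v_i the cluster prototype, c the number of clusters, m > 1 the fuzzifier.
\begin{align}
  u_{ik} &= \left[ \sum_{j=1}^{c}
      \left( \frac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{2/(m-1)}
      \right]^{-1}, \\
  v_i &= \frac{\sum_{k} u_{ik}^{\,m}\, x_k}{\sum_{k} u_{ik}^{\,m}}.
\end{align}
```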

Proceedings ArticleDOI
03 Dec 2010
TL;DR: Experiments show that DDBN outperforms most semi-supervised algorithms and deep learning techniques, especially on hard classification tasks.
Abstract: This paper presents a novel semi-supervised learning algorithm called Discriminative Deep Belief Networks (DDBN) to address the image classification problem with limited labeled data. We first construct a new deep architecture for classification using a set of Restricted Boltzmann Machines (RBMs). The parameter space of the deep architecture is initially determined using labeled data together with abundant unlabeled data, by greedy layer-wise unsupervised learning. Then, we fine-tune the whole deep network using an exponential loss function to maximize the separability of the labeled data, by gradient-descent based supervised learning. Experiments on an artificial dataset and real image datasets show that DDBN outperforms most semi-supervised algorithms and deep learning techniques, especially on hard classification tasks.