
Showing papers on "Deep learning" published in 2010


Proceedings Article
31 Mar 2010
TL;DR: The objective is to understand why standard gradient descent from random initialization does so poorly with deep neural networks, in order to explain the relative successes of recent initialization and training schemes and to help design better algorithms in the future.
Abstract: Whereas before 2006 it appears that deep multilayer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper vs less deep architectures. All these experimental results were obtained with new initialization or training mechanisms. Our objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future. We first observe the influence of the non-linear activation functions. We find that the logistic sigmoid activation is unsuited for deep networks with random initialization because of its mean value, which can drive especially the top hidden layer into saturation. Surprisingly, we find that saturated units can move out of saturation by themselves, albeit slowly, which explains the plateaus sometimes seen when training neural networks. We find that a new non-linearity that saturates less can often be beneficial. Finally, we study how activations and gradients vary across layers and during training, with the idea that training may be more difficult when the singular values of the Jacobian associated with each layer are far from 1. Based on these considerations, we propose a new initialization scheme that brings substantially faster convergence. Deep learning methods aim at learning feature hierarchies with features from higher levels of the hierarchy formed by the composition of lower level features. Much attention has recently been devoted to them (see Bengio (2009) for a review), because of their theoretical appeal, inspiration from biology and human cognition, and because of empirical success in vision (Ranzato et al., 2007; Larochelle et al., 2007; Vincent et al., 2008) and natural language processing (NLP) (Collobert & Weston, 2008; Mnih & Hinton, 2009). Theoretical results reviewed and discussed by Bengio (2009) suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one may need deep architectures. Most of the recent experimental results with deep architectures are obtained with models that can be turned into deep supervised neural networks, but with initialization or training schemes different from the classical feedforward neural networks (Rumelhart et al., 1986). Why are these new algorithms working so much better than the standard random initialization and gradient-based optimization of a supervised training criterion? Part of the answer may be found in recent analyses of the effect of unsupervised pre-training (Erhan et al., 2009), showing that it acts as a regularizer that initializes the parameters in a “better” basin of attraction of the optimization procedure, corresponding to an apparent local minimum associated with better generalization. But earlier work (Bengio et al., 2007) had shown that even a purely supervised but greedy layer-wise procedure would give better results.
So here instead of focusing on what unsupervised pre-training or semi-supervised criteria bring to deep architectures, we focus on analyzing what may be going wrong with good old (but deep) multilayer neural networks. Our analysis is driven by investigative experiments to monitor activations (watching for saturation of hidden units) and gradients, across layers and across training iterations. We also evaluate the effects on these of choices of activation function (with the idea that it might affect saturation) and initialization procedure (since unsupervised pretraining is a particular form of initialization and it has a drastic impact).
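
The "new initialization scheme" referred to above is the normalized initialization now commonly called Xavier or Glorot initialization. Below is a minimal NumPy sketch for illustration only; the layer sizes and function name are hypothetical, and the uniform range sqrt(6/(n_in+n_out)) is the commonly cited form of this scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def glorot_uniform(n_in, n_out):
    # Normalized initialization: W ~ U[-limit, +limit] with
    # limit = sqrt(6 / (n_in + n_out)), chosen so that activation and
    # back-propagated gradient variances stay roughly constant across layers.
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

# Hypothetical layer sizes for a deep tanh network.
sizes = [784, 1000, 1000, 1000, 10]
weights = [glorot_uniform(fan_in, fan_out) for fan_in, fan_out in zip(sizes, sizes[1:])]
```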

9,500 citations


Journal ArticleDOI
TL;DR: An overview of the mainstream deep learning approaches and research directions proposed over the past decade is provided and some perspective into how it may evolve is presented.
Abstract: This article provides an overview of the mainstream deep learning approaches and research directions proposed over the past decade. It is important to emphasize that each approach has strengths and weaknesses, depending on the application and context in which it is being used. Thus, this article presents a summary of the current state of the deep machine learning field and some perspective into how it may evolve. Convolutional Neural Networks (CNNs) and Deep Belief Networks (DBNs) (and their respective variations) are focused on primarily because they are well established in the deep learning field and show great promise for future work.

1,103 citations



Book
14 Oct 2010
TL;DR: This book provides an overview of neural network modeling, covering model design methodology, dimension reduction and resampling, identification and control of dynamical systems with recurrent networks, discrimination, self-organizing maps and unsupervised classification, and neural networks without training for optimization.
Abstract: Contents: Neural Networks: An Overview; Modeling with Neural Networks: Principles and Model Design Methodology; Modeling Methodology: Dimension Reduction and Resampling Methods; Neural Identification of Controlled Dynamical Systems and Recurrent Networks; Closed-Loop Control Learning; Discrimination; Self-Organizing Maps and Unsupervised Classification; Neural Networks without Training for Optimization.

519 citations


Proceedings Article
01 Sep 2010
TL;DR: This paper reports the recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms and shows that the binary codes learned produce a log-spectral distortion that is approximately 2 dB lower than a subband vector quantization technique over the entire frequency range of wide-band speech.
Abstract: This paper reports our recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms. The top layer of the generative model learns binary codes that can be used for efficient compression of speech and could also be used for scalable speech recognition or rapid speech content retrieval. Each layer of the generative model is fully connected to the layer below and the weights on these connections are pretrained efficiently by using the contrastive divergence approximation to the log likelihood gradient. After layer-by-layer pre-training we “unroll” the generative model to form a deep auto-encoder, whose parameters are then fine-tuned using back-propagation. To reconstruct the full-length speech spectrogram, individual spectrogram segments predicted by their respective binary codes are combined using an overlap-and-add method. Experimental results on speech spectrogram coding demonstrate that the binary codes produce a log-spectral distortion that is approximately 2 dB lower than a subband vector quantization technique over the entire frequency range of wide-band speech. Index Terms: deep learning, speech feature extraction, neural networks, auto-encoder, binary codes, Boltzmann machine
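
To make the pre-train-then-"unroll" recipe concrete, the following minimal NumPy sketch (an illustration under assumed layer sizes and sigmoid units, not the authors' code) stacks pretrained RBM weight matrices into an encoder and reuses their transposes as the decoder of the deep auto-encoder; fine-tuning by back-propagation would then adjust all of these weights jointly.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unroll_autoencoder(weights, biases_hid, biases_vis, v):
    """Encode with the pretrained stack, then decode with transposed (tied) weights.
    `weights[l]` maps layer l to layer l+1; returns (codes, reconstruction)."""
    # Encoder: bottom-up pass through the pretrained stack.
    h = v
    for W, b in zip(weights, biases_hid):
        h = sigmoid(h @ W + b)
    codes = h
    # Decoder: top-down pass using the transposed weights.
    for W, c in zip(reversed(weights), reversed(biases_vis)):
        h = sigmoid(h @ W.T + c)
    return codes, h

# Hypothetical pretrained stack for spectrogram patches (e.g. 512-256-64 units).
rng = np.random.default_rng(0)
sizes = [512, 256, 64]
weights = [0.01 * rng.standard_normal((a, b)) for a, b in zip(sizes, sizes[1:])]
biases_hid = [np.zeros(b) for b in sizes[1:]]
biases_vis = [np.zeros(a) for a in sizes[:-1]]
codes, recon = unroll_autoencoder(weights, biases_hid, biases_vis, rng.random((10, 512)))
```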

372 citations


Proceedings ArticleDOI
18 Jul 2010
TL;DR: A framework for combining the training of deep auto-encoders (for learning compact feature spaces) with recently-proposed batch-mode RL algorithms (for learning policies) is proposed, and an emphasis is put on the data-efficiency and on studying the properties of the feature spaces automatically constructed by the deep auto-encoder neural networks.
Abstract: This paper discusses the effectiveness of deep auto-encoder neural networks in visual reinforcement learning (RL) tasks. We propose a framework for combining the training of deep auto-encoders (for learning compact feature spaces) with recently-proposed batch-mode RL algorithms (for learning policies). An emphasis is put on the data-efficiency of this combination and on studying the properties of the feature spaces automatically constructed by the deep auto-encoders. These feature spaces are empirically shown to adequately capture the similarities and spatial relations between observations and to allow useful policies to be learned. We propose several methods for improving the topology of the feature spaces, making use of task-dependent information. Finally, we present first results on successfully learning good control policies directly on synthesized and real images.
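
As a rough sketch of the proposed combination (illustrative only; the encoder, the choice of regressor, and the toy data below are assumptions, not the authors' setup), raw observations are first mapped into the compact feature space and batch-mode fitted Q iteration is then run on the encoded transitions:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(encode, transitions, n_actions, gamma=0.95, iters=20):
    """Batch-mode fitted Q iteration on features produced by `encode`.
    `transitions` is a list of (obs, action, reward, next_obs) tuples."""
    obs, acts, rews, next_obs = map(np.array, zip(*transitions))
    s, s_next = encode(obs), encode(next_obs)          # compact feature spaces
    X = np.hstack([s, acts[:, None]])                  # regress on (features, action)
    q = None
    for _ in range(iters):
        if q is None:
            targets = rews                             # first iteration: Q = immediate reward
        else:
            # Bellman backup: r + gamma * max_a' Q(s', a')
            q_next = np.column_stack([
                q.predict(np.hstack([s_next, np.full((len(s_next), 1), a)]))
                for a in range(n_actions)
            ])
            targets = rews + gamma * q_next.max(axis=1)
        q = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, targets)
    return q

# Hypothetical usage with a toy "encoder" standing in for the deep auto-encoder.
rng = np.random.default_rng(0)
encode = lambda imgs: imgs.reshape(len(imgs), -1)[:, :8]   # placeholder for the learned code
data = [(rng.random((16, 16)), rng.integers(2), rng.random(), rng.random((16, 16)))
        for _ in range(200)]
q_function = fitted_q_iteration(encode, data, n_actions=2)
```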

353 citations


Journal ArticleDOI
TL;DR: This paper examines learning algorithms for deep architectures such as Deep Belief Networks and stacks of auto-encoder variants, which have produced impressive results in several areas.
Abstract: Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of auto-encoder variants, with impressive results obtained in several areas...

259 citations


Proceedings Article
01 Dec 2010
TL;DR: It is shown that pre-training initializes the weights at a point in parameter space from which fine-tuning is effective, and is thus crucial both for training deep structured models and for the recognition performance of a CD-DBN-HMM based large-vocabulary speech recognizer.
Abstract: Recently, deep learning techniques have been successfully applied to automatic speech recognition tasks, first to phonetic recognition with context-independent deep belief network (DBN) hidden Markov models (HMMs) and later to large vocabulary continuous speech recognition using context-dependent (CD) DBN-HMMs. In this paper, we report our most recent experiments designed to understand the roles of the two main phases of DBN learning, pre-training and fine-tuning, in the recognition performance of a CD-DBN-HMM based large-vocabulary speech recognizer. As expected, we show that pre-training can initialize weights to a point in the space where fine-tuning can be effective and thus is crucial in training deep structured models. However, a moderate increase of the amount of unlabeled pre-training data has an insignificant effect on the final recognition results as long as the original training size is sufficiently large to initialize the DBN weights. On the other hand, with additional labeled training data, the fine-tuning phase of DBN training can significantly improve the recognition accuracy.

235 citations


Journal ArticleDOI
TL;DR: The method introduced in this paper allows for training arbitrarily connected neural networks; therefore, more powerful neural network architectures with connections across layers can be trained efficiently.
Abstract: The method introduced in this paper allows for training arbitrarily connected neural networks; therefore, more powerful neural network architectures with connections across layers can be trained efficiently. The proposed method also simplifies neural network training by using forward-only computation instead of the traditionally used forward and backward computation.

187 citations


Journal ArticleDOI
TL;DR: It is proved that deep but narrow feedforward neural networks with sigmoidal units can represent any Boolean expression.
Abstract: Deep belief networks (DBN) are generative models with many layers of hidden causal variables, recently introduced by Hinton, Osindero, and Teh (2006), along with a greedy layer-wise unsupervised learning algorithm. Building on Le Roux and Bengio (2008) and Sutskever and Hinton (2008), we show that deep but narrow generative networks do not require more parameters than shallow ones to achieve universal approximation. Exploiting the proof technique, we prove that deep but narrow feedforward neural networks with sigmoidal units can represent any Boolean expression.

170 citations


Journal Article
TL;DR: In this article, the authors show that the logistic sigmoid activation is unsuited for deep networks with random initialization because of its mean value, which can drive especially the top hidden layer into saturation.

Proceedings Article
23 Aug 2010
TL;DR: Experiments show that ADN outperforms previous semi-supervised learning algorithms and deep learning techniques applied to sentiment classification.
Abstract: This paper presents a novel semi-supervised learning algorithm called Active Deep Networks (ADN) to address the semi-supervised sentiment classification problem with active learning. First, we propose the semi-supervised learning method of ADN. ADN is constructed from Restricted Boltzmann Machines (RBMs) with unsupervised learning, using labeled data and abundant unlabeled data. The constructed structure is then fine-tuned by gradient-descent based supervised learning with an exponential loss function. Second, we apply active learning within the semi-supervised learning framework to identify reviews that should be labeled as training data. The ADN architecture is then trained on the selected labeled data and all unlabeled data. Experiments on five sentiment classification datasets show that ADN outperforms previous semi-supervised learning algorithms and deep learning techniques applied to sentiment classification.
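
Two ingredients of ADN lend themselves to a compact illustration: the exponential loss used for supervised fine-tuning and an uncertainty-style active-learning selection of unlabeled reviews near the decision boundary. The sketch below is generic (the selection criterion and the random scores are stand-ins; the paper's exact criterion may differ):

```python
import numpy as np

def exp_loss_grad(scores, labels):
    """Gradient of the exponential loss  L = mean(exp(-y * f(x)))  w.r.t. the scores.
    `labels` are in {-1, +1}; this is the kind of loss used for supervised fine-tuning."""
    return -labels * np.exp(-labels * scores) / len(scores)

def select_queries(scores_unlabeled, k):
    """Uncertainty-style active selection: pick the k unlabeled examples whose
    scores are closest to the decision boundary (score ~ 0)."""
    return np.argsort(np.abs(scores_unlabeled))[:k]

# Hypothetical usage with random scores standing in for the network's outputs.
rng = np.random.default_rng(0)
scores_lab, labels = rng.standard_normal(100), rng.choice([-1, 1], size=100)
grad = exp_loss_grad(scores_lab, labels)                    # back-propagate this through the net
queries = select_queries(rng.standard_normal(500), k=10)    # indices of reviews to label next
```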

Journal ArticleDOI
TL;DR: A review of the theory, extension models, learning algorithms and applications of the RNN, which has been applied in a variety of areas including pattern recognition, classification, image processing, combinatorial optimization and communication systems.
Abstract: The random neural network (RNN) is a recurrent neural network model inspired by the spiking behaviour of biological neuronal networks. Contrary to most artificial neural network models, neurons in the RNN interact by probabilistically exchanging excitatory and inhibitory spiking signals. The model is described by analytical equations, has a low complexity supervised learning algorithm and is a universal approximator for bounded continuous functions. The RNN has been applied in a variety of areas including pattern recognition, classification, image processing, combinatorial optimization and communication systems. It has also inspired research activity in modelling interacting entities in various systems such as queueing and gene regulatory networks. This paper presents a review of the theory, extension models, learning algorithms and applications of the RNN.
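
For orientation, the "analytical equations" mentioned above are, in the standard Gelenbe formulation (stated here from general knowledge of the model, not quoted from this review), the steady-state excitation probabilities obtained from the spike traffic equations:

```latex
% Random neural network steady-state equations (standard Gelenbe form; for orientation only).
% q_i: probability neuron i is excited; r_i: firing rate; Lambda_i, lambda_i: exogenous
% excitatory/inhibitory arrival rates; p_{ji}^{+/-}: excitatory/inhibitory routing probabilities.
\begin{align}
  q_i &= \frac{\lambda^{+}_i}{r_i + \lambda^{-}_i}, \\
  \lambda^{+}_i &= \Lambda_i + \sum_j q_j r_j p^{+}_{ji}, \qquad
  \lambda^{-}_i = \lambda_i + \sum_j q_j r_j p^{-}_{ji}.
\end{align}
```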

Journal ArticleDOI
TL;DR: Speech recognition classification performance is investigated using two standard neural network structures as classifiers: a feed-forward neural network trained with the back-propagation algorithm and a radial basis function neural network.
Abstract: This paper presents an investigation of speech recognition classification performance. The investigation is carried out using two standard neural network structures as the classifier: a Feed-forward Neural Network (NN) with the back-propagation algorithm and a Radial Basis Function Neural Network.

Proceedings ArticleDOI
26 Sep 2010
TL;DR: A novel NNLM adaptation method using a cascaded network is proposed and consistent WER reductions were obtained on a state-of-the-art Arabic LVCSR task over conventional NNLMs.
Abstract: Neural network language models (NNLMs) have become an increasingly popular choice for large vocabulary continuous speech recognition (LVCSR) tasks, due to their inherent generalisation and discriminative power. This paper presents two techniques to improve the performance of standard NNLMs. First, the NNLM is modified by introducing an additional output-layer node to model the probability mass of out-of-shortlist (OOS) words. An associated probability normalisation scheme is explicitly derived. Second, a novel NNLM adaptation method using a cascaded network is proposed. Consistent WER reductions were obtained on a state-of-the-art Arabic LVCSR task over conventional NNLMs. Further performance gains were also observed after NNLM adaptation.
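
One common way to realize an OOS output node and its probability normalisation is sketched below as a generic scheme for orientation; the paper derives its own normalisation, which may differ. The extra node carries the total shortlist-complement mass, which is redistributed with a back-off n-gram model so the distribution sums to one over the full vocabulary:

```latex
% Generic shortlist + OOS normalisation (illustrative; not the paper's exact derivation).
% P_NN: network output, P_NG: back-off n-gram model, SL: shortlist, h: word history.
\begin{equation}
P(w \mid h) =
\begin{cases}
  P_{\mathrm{NN}}(w \mid h), & w \in \mathrm{SL}, \\[4pt]
  P_{\mathrm{NN}}(\mathrm{oos} \mid h)\,
  \dfrac{P_{\mathrm{NG}}(w \mid h)}{\sum_{w' \notin \mathrm{SL}} P_{\mathrm{NG}}(w' \mid h)}, & w \notin \mathrm{SL}.
\end{cases}
\end{equation}
```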

Journal ArticleDOI
TL;DR: Performance comparisons with similar studies found in the related literature indicated that the proposed ANN structures yield satisfactory results.

Journal ArticleDOI
TL;DR: In the proposed wavelet neural networks, composite functions are applied at the hidden nodes and learning is done using ELM; in most cases the networks achieve better performance than several related neural networks and learn much faster than networks trained with the traditional back-propagation (BP) algorithm.

Journal ArticleDOI
TL;DR: The experiments show that SVM outperforms the BP neural network based on the criterion of Mean Absolute Error (MAE), indicating that SVM is a promising technique for time series forecasting.
Abstract: Time series prediction is an important problem in many applications in natural science, engineering and economics. The objective of this study is to examine the flexibility of the Support Vector Machine (SVM) in time series forecasting by comparing it with a multi-layer back-propagation (BP) neural network. Five well-known time series data sets are used in this study to demonstrate the effectiveness of the forecasting model. These data are used for forecasting in an application aimed at handling real-life time series. The grid search technique with 10-fold cross validation is used to determine the best values of the SVM parameters in the forecasting process. The experiments show that SVM outperforms the BP neural network based on the criterion of Mean Absolute Error (MAE). This indicates that SVM is a promising technique for time series forecasting.
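
The grid search with 10-fold cross-validation described above can be reproduced in spirit with scikit-learn; the sketch below is illustrative only (the kernel, parameter grid, lag construction, and synthetic series are assumptions, not the study's settings).

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

def make_lagged(series, lags=4):
    # Build a supervised dataset from a univariate series: predict x[t] from the previous `lags` values.
    X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
    y = series[lags:]
    return X, y

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 400)) + 0.1 * rng.standard_normal(400)  # toy series
X, y = make_lagged(series)

# Grid search over C, epsilon and the RBF width, scored by (negative) mean absolute error.
grid = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [1, 10, 100], "epsilon": [0.01, 0.1], "gamma": [0.1, 1.0]},
    scoring="neg_mean_absolute_error",
    cv=10,   # 10-fold CV as in the study; a time-ordered split would be a stricter alternative
)
grid.fit(X, y)
print(grid.best_params_, -grid.best_score_)
```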

DOI
01 Jan 2010
TL;DR: The ST-DBN has superior performance on discriminative and generative tasks, including action recognition and video denoising, when compared to convolutional deep belief networks (CDBNs) applied on a per-frame basis, and has superior feature-invariance properties.
Abstract: We present a novel hierarchical, distributed model for unsupervised learning of invariant spatio-temporal features from video. Our approach builds on previous deep learning methods and uses the convolutional Restricted Boltzmann machine (CRBM) as a basic processing unit. Our model, called the Space-Time Deep Belief Network (ST-DBN), alternates the aggregation of spatial and temporal information so that higher layers capture longer range statistical dependencies in both space and time. Our experiments show that the ST-DBN has superior performance on discriminative and generative tasks including action recognition and video denoising when compared to convolutional deep belief networks (CDBNs) applied on a per-frame basis. Simultaneously, the ST-DBN has superior feature invariance properties compared to CDBNs and can integrate information from both space and time to fill in missing data in video.

Proceedings Article
21 Jun 2010
TL;DR: This paper presents supervised embedding techniques that use a deep network to collapse classes; the network is pre-trained using a stack of RBMs and fine-tuned using objectives that aim to collapse classes in the embedding.
Abstract: Deep learning has been successfully applied to perform non-linear embedding. In this paper, we present supervised embedding techniques that use a deep network to collapse classes. The network is pre-trained using a stack of RBMs, and fine-tuned using approaches that try to collapse classes. The fine-tuning is inspired by ideas from NCA, but it uses a Student t-distribution to model the similarities of data points belonging to the same class in the embedding. We investigate two types of objective functions: deep t-distributed MCML (dt-MCML) and deep t-distributed NCA (dt-NCA). Our experiments on two handwritten digit data sets reveal the strong performance of dt-MCML in supervised parametric data visualization, whereas dt-NCA outperforms alternative techniques when embeddings with more than two or three dimensions are constructed, e.g., to obtain good classification performances. Overall, our results demonstrate the advantage of using a deep architecture and a heavy-tailed t-distribution for measuring pairwise similarities in supervised embedding.
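
The heavy-tailed similarity that dt-MCML and dt-NCA place on pairs of embedded points can be illustrated with the t-SNE-style form below (a generic sketch; the exact normalization and class conditioning used in the paper may differ):

```python
import numpy as np

def t_similarities(Z, alpha=1.0):
    """Heavy-tailed pairwise similarities in an embedding Z (n x d):
    q_ij ∝ (1 + ||z_i - z_j||^2 / alpha)^(-(alpha + 1) / 2), normalized over all i != j.
    This is the t-SNE-style form; dt-MCML/dt-NCA may normalize differently."""
    d2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    num = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    np.fill_diagonal(num, 0.0)
    return num / num.sum()

# Toy usage: similarities in a random 2-D embedding of 5 points.
Z = np.random.default_rng(0).standard_normal((5, 2))
Q = t_similarities(Z)
```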


Proceedings Article
01 Jan 2010
TL;DR: A new parallel-training tool TNet was designed and optimized for multiprocessor computers and the training acceleration rates are reported on a phoneme-state classification task.
Abstract: The feed-forward multi-layer neural networks have significant importance in speech recognition. A new parallel-training tool TNet was designed and optimized for multiprocessor computers. The training acceleration rates are reported on a phoneme-state classification task.

Journal ArticleDOI
TL;DR: A sequential Bayesian learning (SBL) is proposed for modular neural networks aiming at efficiently aggregating the outputs of members of the ensemble to demonstrate that the proposed method can perform information aggregation efficiently in data modeling.
Abstract: The modular neural network is a popular neural network model with many successful applications. In this paper, a sequential Bayesian learning (SBL) approach is proposed for modular neural networks, aiming at efficiently aggregating the outputs of members of the ensemble. The experimental results on eight benchmark problems demonstrate that the proposed method can perform information aggregation efficiently in data modeling.

Journal ArticleDOI
TL;DR: This work describes the backpropagation procedure, the leading case of gradient descent learning algorithms for the class of networks considered here, as well as an efficient heuristic modification, and examines the applicability of these learning methods to the problem of predicting interregional telecommunication flows.
Abstract: Learning in neural networks has attracted considerable interest in recent years. Our focus is on learning in single hidden-layer feedforward networks which is posed as a search in the network parameter space for a network that minimizes an additive error function of statistically independent examples. We review first the class of single hidden-layer feedforward networks and characterize the learning process in such networks from a statistical point of view. Then we describe the backpropagation procedure, the leading case of gradient descent learning algorithms for the class of networks considered here, as well as an efficient heuristic modification. Finally, we analyze the applicability of these learning methods to the problem of predicting interregional telecommunication flows. Particular emphasis is laid on the engineering judgment, first, in choosing appropriate values for the tunable parameters, second, on the decision whether to train the network by epoch or by pattern (random approximation), and, third, on the overfitting problem. In addition, the analysis shows that the neural network model whether using either epoch-based or pattern-based stochastic approximation outperforms the classical regression approach to modeling telecommunication flows.
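
A minimal single-hidden-layer network trained by gradient descent on squared error, with the epoch-based versus pattern-based (stochastic approximation) choice exposed as a flag, is sketched below; it is a generic illustration on synthetic data, not the telecommunication-flow model from the study.

```python
import numpy as np

def train_mlp(X, y, hidden=10, lr=0.01, epochs=100, pattern_mode=True, seed=0):
    """Single-hidden-layer tanh network, linear output, squared-error gradient descent.
    pattern_mode=True updates after each example (stochastic approximation);
    False accumulates the gradient over the whole epoch (epoch-based learning)."""
    rng = np.random.default_rng(seed)
    W1 = rng.standard_normal((X.shape[1], hidden)) * 0.1
    W2 = rng.standard_normal((hidden, 1)) * 0.1
    for _ in range(epochs):
        order = rng.permutation(len(X)) if pattern_mode else [slice(None)]
        for idx in order:
            x = X[idx].reshape(-1, X.shape[1])
            t = np.atleast_1d(y[idx]).reshape(-1, 1)
            h = np.tanh(x @ W1)                 # hidden activations
            out = h @ W2                        # linear output unit
            err = out - t                       # dE/dout for squared error
            dW2 = h.T @ err / len(x)
            dW1 = x.T @ ((err @ W2.T) * (1 - h ** 2)) / len(x)
            W1 -= lr * dW1
            W2 -= lr * dW2
    return W1, W2

# Toy usage on synthetic data (stand-in for telecommunication-flow predictors).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
W1, W2 = train_mlp(X, y)
```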

Journal ArticleDOI
TL;DR: A parallel neural network approach is proposed to aid the diagnosis of breast cancer, using a feed-forward neural network model and the backpropagation learning algorithm with momentum and a variable learning rate.
Abstract: Classification is perhaps the most familiar and popular data mining technique. Inspired by biological neural networks, Artificial Neural Networks are developed to mimic characteristics such as robustness and fault tolerance. To perform the classification task on medical data, the neural network is trained. To speed up the training process, a parallel approach is adopted. In this paper, a parallel approach using the neural network technique is proposed to help in the diagnosis of breast cancer. The neural network is trained on a breast cancer database using a feed-forward neural network model and the backpropagation learning algorithm with momentum and a variable learning rate. The performance of the network is evaluated. The experimental results show that applying the parallel approach to the neural network model yields efficient results.
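
The "momentum and variable learning rate" training mentioned above typically refers to the standard gradient-descent update below (a generic textbook form, not necessarily the exact rule used in the paper), where the learning rate eta(t) is increased while the error keeps decreasing and reduced when it grows:

```latex
% Back-propagation with momentum and an adaptive (variable) learning rate; generic form.
\begin{align}
  \Delta w(t) &= -\,\eta(t)\,\frac{\partial E}{\partial w} + \mu\,\Delta w(t-1), \\
  w(t+1) &= w(t) + \Delta w(t),
\end{align}
% with, e.g., eta(t+1) = a * eta(t) (a > 1) if the error decreased,
% and eta(t+1) = b * eta(t) (0 < b < 1) otherwise.
```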

Book ChapterDOI
15 Sep 2010
TL;DR: A novel unsupervised neural network combining elements from Adaptive Resonance Theory and topology learning neural networks, in particular the Self-Organising Incremental Neural Network, is introduced, which enables stable on-line clustering of stationary and non-stationary input data.
Abstract: In this paper, a novel unsupervised neural network combining elements from Adaptive Resonance Theory and topology learning neural networks, in particular the Self-Organising Incremental Neural Network, is introduced. It enables stable on-line clustering of stationary and non-stationary input data. In addition, two representations reflecting different levels of detail are learnt simultaneously. Furthermore, the network is designed in such a way that its sensitivity to noise is diminished, which renders it suitable for the application to real-world problems.

Journal ArticleDOI
TL;DR: Finite-time convergence of the proposed neural network is proved using the Lyapunov method; such convergence to exact optimal solutions in finite time is remarkable and rare in the literature of neural networks for optimization.
Abstract: In this letter, a novel recurrent neural network based on the gradient method is proposed for solving linear programming problems. Finite-time convergence of the proposed neural network is proved by using the Lyapunov method. Compared with the existing neural networks for linear programming, the proposed neural network is globally convergent to exact optimal solutions in finite time, which is remarkable and rare in the literature of neural networks for optimization. Some numerical examples are given to show the effectiveness and excellent performance of the new recurrent neural network.

Proceedings Article
01 Jan 2010
TL;DR: This work investigates the learning behavior of training algorithms by varying a minimal set of parameters and shows that with relatively simple variants of CD, it is possible to obtain good results even without further regularization.
Abstract: Restricted Boltzmann Machines are increasingly popular tools for unsupervised learning. They are very general, can cope with missing data and are used to pretrain deep learning machines. RBMs learn a generative model of the data distribution. As exact gradient ascent on the data likelihood is infeasible, Markov Chain Monte Carlo approximations to the gradient, such as Contrastive Divergence (CD), are typically used. Even though there are some theoretical insights into this algorithm, it is not guaranteed to converge. Recently it has been observed that, after an initial increase in likelihood, the training degrades if no additional regularization is used. The parameters for regularization, however, cannot be determined even for medium-sized RBMs. In this work, we investigate the learning behavior of training algorithms by varying a minimal set of parameters and show that with relatively simple variants of CD it is possible to obtain good results even without further regularization. Furthermore, we show that it is not necessary to tune many hyperparameters to obtain a good model.
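
For reference, one CD-k parameter update for a binary RBM looks as follows (a generic sketch with illustrative sizes; the paper studies variants of this basic recipe):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

def cd_k_update(v0, W, b, c, k=1, lr=0.05):
    """One CD-k update for a binary RBM with weights W, visible bias b, hidden bias c.
    `v0` is a mini-batch of visible vectors (n x n_vis); parameters are updated in place."""
    ph0 = sigmoid(v0 @ W + c)                 # positive-phase hidden probabilities
    h = sample(ph0)
    for _ in range(k):                        # k steps of block Gibbs sampling
        v = sample(sigmoid(h @ W.T + b))
        ph = sigmoid(v @ W + c)
        h = sample(ph)
    n = len(v0)
    W += lr * (v0.T @ ph0 - v.T @ ph) / n     # approximate log-likelihood gradient
    b += lr * (v0 - v).mean(axis=0)
    c += lr * (ph0 - ph).mean(axis=0)
    return W, b, c

# Toy usage: a tiny RBM on random binary data (sizes are illustrative).
n_vis, n_hid = 20, 8
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)
data = (rng.random((64, n_vis)) < 0.3).astype(float)
for epoch in range(10):
    W, b, c = cd_k_update(data, W, b, c, k=1)
```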

Journal ArticleDOI
TL;DR: The experimental results reveal that the proposed approach comes with a simpler structure of the classifier and better prediction capabilities.
Abstract: Polynomial neural networks have been known to exhibit useful properties as classifiers and universal approximators. In this study, we introduce a concept of polynomial-based radial basis function neural networks (P-RBF NNs), present a design methodology and show the use of the networks in classification problems. From the conceptual standpoint, the classifiers of this form can be expressed as a collection of "if-then" rules. The proposed architecture uses two essential development mechanisms. Fuzzy clustering (Fuzzy C-Means, FCM) is aimed at the development of condition parts of the rules while the corresponding conclusions of the rules are formed by some polynomials. A detailed learning algorithm for the P-RBF NNs is developed. The proposed classifier is applied to two-class pattern classification problems. The performance of this classifier is contrasted with the results produced by the "standard" RBF neural networks. In addition, the experimental application covers a comparative analysis including several previous commonly encountered methods such as standard neural networks, SVM, SOM, PCA, LDA, C4.5, and decision trees. The experimental results reveal that the proposed approach comes with a simpler structure of the classifier and better prediction capabilities.
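
For readers unfamiliar with FCM, the standard Fuzzy C-Means updates that produce the rule condition parts alternate between memberships and prototypes as below (generic textbook form with fuzzifier m > 1; the polynomial conclusions of the rules are fitted separately):

```latex
% Standard Fuzzy C-Means updates (generic form): u_{ik} is the membership of pattern x_k
% in cluster i, v_i the cluster prototype, c the number of clusters, m > 1 the fuzzifier.
\begin{align}
  u_{ik} &= \left[ \sum_{j=1}^{c}
      \left( \frac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{2/(m-1)}
      \right]^{-1}, \\
  v_i &= \frac{\sum_{k} u_{ik}^{\,m}\, x_k}{\sum_{k} u_{ik}^{\,m}}.
\end{align}
```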

Proceedings ArticleDOI
03 Dec 2010
TL;DR: Experiments show that DDBN outperforms most semi-supervised algorithms and deep learning techniques, especially on hard classification tasks.
Abstract: This paper presents a novel semi-supervised learning algorithm called Discriminative Deep Belief Networks (DDBN) to address the image classification problem with limited labeled data. We first construct a new deep architecture for classification using a set of Restricted Boltzmann Machines (RBMs). The parameter space of the deep architecture is initially determined using labeled data together with abundant unlabeled data, by greedy layer-wise unsupervised learning. Then, we fine-tune the whole deep network using an exponential loss function to maximize the separability of the labeled data, by gradient-descent based supervised learning. Experiments on an artificial dataset and real image datasets show that DDBN outperforms most semi-supervised algorithms and deep learning techniques, especially on hard classification tasks.