
Showing papers on "Artificial neural network published in 2010"


Proceedings Article
31 Mar 2010
TL;DR: The objective is to understand why standard gradient descent from random initialization does so poorly with deep neural networks, in order to explain recent relative successes and help design better algorithms in the future.
Abstract: Whereas before 2006 it appears that deep multilayer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper vs less deep architectures. All these experimental results were obtained with new initialization or training mechanisms. Our objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future. We first observe the influence of the non-linear activation functions. We find that the logistic sigmoid activation is unsuited for deep networks with random initialization because of its mean value, which can drive especially the top hidden layer into saturation. Surprisingly, we find that saturated units can move out of saturation by themselves, albeit slowly, which explains the plateaus sometimes seen when training neural networks. We find that a new non-linearity that saturates less can often be beneficial. Finally, we study how activations and gradients vary across layers and during training, with the idea that training may be more difficult when the singular values of the Jacobian associated with each layer are far from 1. Based on these considerations, we propose a new initialization scheme that brings substantially faster convergence.

1 Deep Neural Networks

Deep learning methods aim at learning feature hierarchies with features from higher levels of the hierarchy formed by the composition of lower level features. They include learning methods for a wide array of deep architectures, including neural networks with many hidden layers (Vincent et al., 2008) and graphical models with many levels of hidden variables (Hinton et al., 2006), among others (Zhu et al., 2009; Weston et al., 2008). Much attention has recently been devoted to them (see (Bengio, 2009) for a review), because of their theoretical appeal, inspiration from biology and human cognition, and because of empirical success in vision (Ranzato et al., 2007; Larochelle et al., 2007; Vincent et al., 2008) and natural language processing (NLP) (Collobert & Weston, 2008; Mnih & Hinton, 2009). Theoretical results reviewed and discussed by Bengio (2009) suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one may need deep architectures. Most of the recent experimental results with deep architectures are obtained with models that can be turned into deep supervised neural networks, but with initialization or training schemes different from the classical feedforward neural networks (Rumelhart et al., 1986). Why are these new algorithms working so much better than the standard random initialization and gradient-based optimization of a supervised training criterion? Part of the answer may be found in recent analyses of the effect of unsupervised pre-training (Erhan et al., 2009), showing that it acts as a regularizer that initializes the parameters in a "better" basin of attraction of the optimization procedure, corresponding to an apparent local minimum associated with better generalization. But earlier work (Bengio et al., 2007) had shown that even a purely supervised but greedy layer-wise procedure would give better results.
So here, instead of focusing on what unsupervised pre-training or semi-supervised criteria bring to deep architectures, we focus on analyzing what may be going wrong with good old (but deep) multilayer neural networks. Our analysis is driven by investigative experiments to monitor activations (watching for saturation of hidden units) and gradients, across layers and across training iterations. We also evaluate how these are affected by the choice of activation function (with the idea that it might affect saturation) and by the initialization procedure (since unsupervised pre-training is a particular form of initialization, and it has a drastic impact).

9,500 citations
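The initialization scheme this paper proposes is now widely known as Glorot (or "Xavier") initialization. A minimal NumPy sketch of the paper's normalized initialization for a fully connected layer (the helper name and the example layer sizes are mine):

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=None):
    """Normalized ("Xavier") initialization from Glorot & Bengio (2010):
    W ~ U[-sqrt(6/(fan_in+fan_out)), +sqrt(6/(fan_in+fan_out))],
    chosen to keep activation and back-propagated gradient variances
    roughly constant across layers (layer Jacobian singular values near 1)."""
    rng = rng or np.random.default_rng()
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Example: weight matrix for a 784 -> 500 layer of an MNIST network.
W1 = glorot_uniform(784, 500)
```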


Book
01 Jan 2010
TL;DR: Refocused, revised and renamed to reflect the duality of neural networks and learning machines, this edition recognizes that the subject matter is richer when these topics are studied together.
Abstract: For graduate-level neural network courses offered in the departments of Computer Engineering, Electrical Engineering, and Computer Science. Neural Networks and Learning Machines, Third Edition is renowned for its thoroughness and readability. This well-organized and completely up-to-date text remains the most comprehensive treatment of neural networks from an engineering perspective. This is ideal for professional engineers and research scientists. Matlab codes used for the computer experiments in the text are available for download at: http://www.pearsonhighered.com/haykin/ Refocused, revised and renamed to reflect the duality of neural networks and learning machines, this edition recognizes that the subject matter is richer when these topics are studied together. Ideas drawn from neural networks and machine learning are hybridized to perform improved learning tasks beyond the capability of either independently.

4,943 citations


Journal ArticleDOI
TL;DR: An overview of the mainstream deep learning approaches and research directions proposed over the past decade is provided and some perspective into how it may evolve is presented.
Abstract: This article provides an overview of the mainstream deep learning approaches and research directions proposed over the past decade. It is important to emphasize that each approach has strengths and weaknesses, depending on the application and context in which it is being used. Thus, this article presents a summary of the current state of the deep machine learning field and some perspective into how it may evolve. Convolutional Neural Networks (CNNs) and Deep Belief Networks (DBNs) (and their respective variations) are focused on primarily because they are well established in the deep learning field and show great promise for future work.

1,103 citations


Journal ArticleDOI
TL;DR: Good old online backpropagation for plain multilayer perceptrons yields a very low 0.35% error rate on the MNIST handwritten digits benchmark.
Abstract: Good old online backpropagation for plain multilayer perceptrons yields a very low 0.35% error rate on the MNIST handwritten digits benchmark. All we need to achieve this best result so far are many hidden layers, many neurons per layer, numerous deformed training images to avoid overfitting, and graphics cards to greatly speed up learning.

1,016 citations
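The "deformed training images" are the key ingredient here. A rough sketch of elastic distortions in the style of Simard et al. (2003), the kind of deformation used in this line of work, assuming SciPy is available; the alpha and sigma values are illustrative, not the paper's settings, and the paper additionally uses affine deformations:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_distort(image, alpha=36.0, sigma=6.0, rng=None):
    """Displace each pixel by a random field, smoothed with a Gaussian
    of width sigma and scaled by alpha (Simard et al., 2003)."""
    rng = rng or np.random.default_rng()
    dx = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    rows, cols = np.meshgrid(np.arange(image.shape[0]),
                             np.arange(image.shape[1]), indexing="ij")
    # Bilinear interpolation at the displaced coordinates.
    return map_coordinates(image, [rows + dy, cols + dx], order=1)
```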


Proceedings ArticleDOI
18 Jul 2010
TL;DR: This paper proposes the use of neural networks with ensembles for pattern recognition problems that demand many thousands of classes, and gives a short description of this type of neural network and its storage capacity.
Abstract: Pattern recognition systems usually have a relatively small number of patterns to be recognized. As a rule, the number of handwritten symbols, phonemes, or human faces is of the order of a few dozen. But sometimes the pattern recognition task demands many more classes. For example, a continuous speech recognition system can be created on the basis of syllables, and a handwriting recognition system will be more efficient if the recognized units are not individual letters but triplets of letters. In such cases it is necessary to have several thousands of classes. In this paper we consider recognition problems that demand many thousands of classes. For such problems we propose the use of neural networks with ensembles. We give a short description of this type of neural network and calculate its storage capacity.

877 citations


Journal ArticleDOI
TL;DR: This work experimentally demonstrates the formation of associative memory in a simple neural network consisting of three electronic neurons connected by two memristor-emulator synapses, opening up new possibilities for understanding neural processes using memory devices.

840 citations


Journal ArticleDOI
TL;DR: A barrier Lyapunov function (BLF) is introduced to address two open and challenging problems in the neuro-control area: for any initial compact set, how to determine a priori the compact superset on which NN approximation is valid; and how to ensure that the arguments of the unknown functions remain within the specified compact supersets.
Abstract: In this brief, adaptive neural control is presented for a class of output feedback nonlinear systems in the presence of unknown functions. The unknown functions are handled via on-line neural network (NN) control using only output measurements. A barrier Lyapunov function (BLF) is introduced to address two open and challenging problems in the neuro-control area: 1) for any initial compact set, how to determine a priori the compact superset, on which NN approximation is valid; and 2) how to ensure that the arguments of the unknown functions remain within the specified compact superset. By ensuring boundedness of the BLF, we actively constrain the argument of the unknown functions to remain within a compact superset such that the NN approximation conditions hold. The semiglobal boundedness of all closed-loop signals is ensured, and the tracking error converges to a neighborhood of zero. Simulation results demonstrate the effectiveness of the proposed approach.

818 citations
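For orientation, a commonly used log-type barrier Lyapunov function for an error signal z constrained by |z| < k_b (the form used in this line of work by Tee and Ge; the notation here is mine) is

$$ V(z) = \frac{1}{2}\,\ln\frac{k_b^{2}}{k_b^{2}-z^{2}}, \qquad |z| < k_b, $$

which is approximately quadratic near z = 0 but grows without bound as |z| approaches k_b. Keeping V bounded along closed-loop trajectories therefore keeps z strictly inside the constraint set, which is how a compact superset on which the NN approximation remains valid can be enforced.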


Journal ArticleDOI
TL;DR: The proposed OP-ELM methodology performs several orders of magnitude faster than the other algorithms used in this brief, except the original ELM, and is still able to maintain an accuracy that is comparable to the performance of the SVM.
Abstract: In this brief, the optimally pruned extreme learning machine (OP-ELM) methodology is presented. It is based on the original extreme learning machine (ELM) algorithm with additional steps to make it more robust and generic. The whole methodology is presented in detail and then applied to several regression and classification problems. Results for both computational time and accuracy (mean square error) are compared to the original ELM and to three other widely used methodologies: multilayer perceptron (MLP), support vector machine (SVM), and Gaussian process (GP). As the experiments for both regression and classification illustrate, the proposed OP-ELM methodology performs several orders of magnitude faster than the other algorithms used in this brief, except the original ELM. Despite the simplicity and fast performance, the OP-ELM is still able to maintain an accuracy that is comparable to the performance of the SVM. A toolbox for the OP-ELM is publicly available online.

745 citations
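As background, the original ELM on which OP-ELM builds trains only the output layer: input weights are random and output weights come from a single least-squares solve. A minimal NumPy sketch of that base algorithm (names are mine; OP-ELM then adds neuron ranking and pruning plus leave-one-out model selection on top):

```python
import numpy as np

def elm_train(X, y, n_hidden=100, rng=None):
    """Basic extreme learning machine: random hidden layer, then a
    linear least-squares solve for the output weights."""
    rng = rng or np.random.default_rng()
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                 # random biases
    H = np.tanh(X @ W + b)                        # hidden activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```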


Proceedings ArticleDOI
03 Aug 2010
TL;DR: An integrated software/hardware framework has been developed which is centered around a unified neural system description language, called PyNN, that allows the scientist to describe a model and execute it in a transparent fashion on either a neuromorphic hardware system or a numerical simulator.
Abstract: Modeling neural tissue is an important tool to investigate biological neural networks. Until recently, most of this modeling has been done using numerical methods. In the European research project "FACETS" this computational approach is complemented by different kinds of neuromorphic systems. A special emphasis lies in the usability of these systems for neuroscience. To accomplish this goal, an integrated software/hardware framework has been developed which is centered around a unified neural system description language, called PyNN, that allows the scientist to describe a model and execute it in a transparent fashion on either a neuromorphic hardware system or a numerical simulator. A very large analog neuromorphic hardware system developed within FACETS is able to use complex neural models as well as realistic network topologies, i.e. it can realize more than 10,000 synapses per neuron, allowing the direct execution of models which previously could only be simulated numerically.

708 citations
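PyNN (http://neuralensemble.org/PyNN/) is the simulator-independent description language the abstract refers to. A minimal sketch of its style, assuming the NEST backend is installed; cell and parameter names follow the modern PyNN API, which may differ in detail from the 2010 version, and swapping the import line for a hardware backend is exactly the transparency the paper describes:

```python
import pyNN.nest as sim  # or a neuromorphic hardware backend

sim.setup(timestep=0.1)

# Two populations of conductance-based integrate-and-fire neurons.
pre = sim.Population(100, sim.IF_cond_exp())
post = sim.Population(100, sim.IF_cond_exp())

# Excitatory all-to-all connections with fixed synaptic weights.
proj = sim.Projection(pre, post, sim.AllToAllConnector(),
                      synapse_type=sim.StaticSynapse(weight=0.01),
                      receptor_type="excitatory")

post.record("spikes")
sim.run(1000.0)          # simulate one second
data = post.get_data()   # retrieve recorded spikes
sim.end()
```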


Journal ArticleDOI
TL;DR: The empirical results with three well-known real data sets indicate that the proposed model can be an effective way to improve forecasting accuracy achieved by artificial neural networks, and can be used as an appropriate alternative for forecasting tasks, especially when higher forecasting accuracy is needed.
Abstract: Artificial neural networks (ANNs) are flexible computing frameworks and universal approximators that can be applied to a wide range of time series forecasting problems with a high degree of accuracy. However, despite all the advantages cited for artificial neural networks, their performance on some real time series is not satisfactory. Improving forecasting accuracy, especially for time series, is an important yet often difficult task facing forecasters. Both theoretical and empirical findings have indicated that integration of different models can be an effective way of improving their predictive performance, especially when the models in the ensemble are quite different. In this paper, a novel hybrid model of artificial neural networks is proposed using auto-regressive integrated moving average (ARIMA) models in order to yield a more accurate forecasting model than artificial neural networks alone. The empirical results with three well-known real data sets indicate that the proposed model can be an effective way to improve the forecasting accuracy achieved by artificial neural networks. Therefore, it can be used as an appropriate alternative for forecasting tasks, especially when higher forecasting accuracy is needed.

663 citations
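The classical recipe behind such hybrids (Zhang, 2003), which this paper refines, is to let ARIMA capture the linear structure and an ANN model the residuals, then add the two forecasts. A rough sketch of that basic recipe using statsmodels and scikit-learn (lag count and network size are arbitrary illustrations; the paper's own construction combines the components more tightly than this):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.neural_network import MLPRegressor

def hybrid_forecast(y, order=(1, 1, 1), n_lags=4):
    """One-step-ahead hybrid forecast for a 1-D numpy series y."""
    # 1) Linear component: ARIMA captures the linear structure.
    arima = ARIMA(y, order=order).fit()
    resid = arima.resid

    # 2) Nonlinear component: ANN trained on lagged residuals.
    X = np.column_stack([resid[i:len(resid) - n_lags + i]
                         for i in range(n_lags)])
    t = resid[n_lags:]
    ann = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000).fit(X, t)

    # 3) Combine: linear forecast plus predicted residual.
    linear = arima.forecast(steps=1)[0]
    nonlinear = ann.predict(resid[-n_lags:].reshape(1, -1))[0]
    return linear + nonlinear
```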


Journal ArticleDOI
TL;DR: This article presents a comprehensive overview of the hardware realizations of artificial neural network models, known as hardware neural networks (HNN), appearing in academic studies as prototypes as well as in commercial use.

Journal ArticleDOI
TL;DR: A comprehensive comparison study on the application of different artificial neural networks in 1-h-ahead wind speed forecasting shows that even for the same wind dataset, no single neural network model outperforms others universally in terms of all evaluation metrics.

Book ChapterDOI
01 Jan 2010
TL;DR: For nonlinear regression, the assumption of Gaussian measurement error combined with the maximum likelihood principle promotes the least squares criterion; treating classification as a regression problem towards estimating class posterior probabilities, least squares has likewise been employed to train neural networks and other classifier topologies to approximate correct labels.
Abstract: INTRODUCTION Learning systems depend on three interrelated components: topologies, cost/performance functions, and learning algorithms. Topologies provide the constraints for the mapping, and the learning algorithms offer the means to find an optimal solution; but the solution is optimal with respect to what? Optimality is characterized by the criterion, and in the neural network literature this is the least addressed component, yet it has a decisive influence on generalization performance. Certainly, the assumptions behind the selection of a criterion should be better understood and investigated. Traditionally, least squares has been the benchmark criterion for regression problems; considering classification as a regression problem towards estimating class posterior probabilities, least squares has been employed to train neural network and other classifier topologies to approximate correct labels. The main motivation to utilize least squares in regression simply comes from the intellectual comfort this criterion provides due to its success in traditional linear least squares regression applications, which can be reduced to solving a system of linear equations. For nonlinear regression, the assumption of Gaussianity for the measurement error combined with the maximum likelihood principle can be invoked to promote this criterion. In nonparametric regression, the least squares principle leads to the conditional expectation solution, which is intuitively appealing. Although these are good reasons to use the mean squared error as the cost, it is inherently linked to the assumptions and habits stated above. Consequently, there is information in the error signal that is not captured during the training of nonlinear adaptive systems under non-Gaussian distribution conditions when one insists on second-order statistical criteria. This argument extends to other linear second-order techniques such as principal component analysis (PCA), linear discriminant analysis (LDA), and canonical correlation analysis (CCA). Recent work tries to generalize these techniques to nonlinear scenarios by utilizing kernel techniques or other heuristics. This begs the question: what other alternative cost functions could be used to train adaptive systems, and how could we establish rigorous techniques for extending useful concepts from linear and second-order statistical techniques to nonlinear and higher-order statistical learning methodologies?
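To make the chapter's starting point concrete, the least squares criterion and its two classical justifications can be written as

$$ J(f) = \mathbb{E}\big[(Y - f(X))^{2}\big], \qquad f^{*}(x) = \mathbb{E}[Y \mid X = x], $$

and, for i.i.d. Gaussian measurement errors $y_i = f_\theta(x_i) + \varepsilon_i$ with $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$,

$$ \arg\max_{\theta} \sum_i \log p(y_i \mid x_i, \theta) \;=\; \arg\min_{\theta} \sum_i \big(y_i - f_\theta(x_i)\big)^{2}, $$

so maximum likelihood under Gaussianity reduces to minimizing squared error. The chapter's argument is that once this choice is made, only the second-order statistics of the error signal influence learning.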

Journal ArticleDOI
TL;DR: The paper gives a brief introduction to multi-layer perceptrons and resilient backpropagation and demonstrates the application of neuralnet using the data set infert, which is contained in the R distribution.
Abstract: Artificial neural networks are applied in many situations. neuralnet is built to train multi-layer perceptrons in the context of regression analyses, i.e. to approximate functional relationships between covariates and response variables. Thus, neural networks are used as extensions of generalized linear models. neuralnet is a very flexible package. The backpropagation algorithm and three versions of resilient backpropagation are implemented and it provides a custom-choice of activation and error function. An arbitrary number of covariates and response variables as well as of hidden layers can theoretically be included. The paper gives a brief introduction to multi-layer perceptrons and resilient backpropagation and demonstrates the application of neuralnet using the data set infert, which is contained in the R distribution.

Book ChapterDOI
05 Sep 2010
TL;DR: This work proposes detecting roads using a neural network with millions of trainable weights which looks at a much larger context than was used in previous attempts at learning the task, and shows that the method works reliably on two challenging urban datasets that are an order of magnitude larger than what was used to evaluate previous approaches.
Abstract: Reliably extracting information from aerial imagery is a difficult problem with many practical applications. One specific case of this problem is the task of automatically detecting roads. This task is a difficult vision problem because of occlusions, shadows, and a wide variety of non-road objects. Despite 30 years of work on automatic road detection, no automatic or semi-automatic road detection system is currently on the market and no published method has been shown to work reliably on large datasets of urban imagery. We propose detecting roads using a neural network with millions of trainable weights which looks at a much larger context than was used in previous attempts at learning the task. The network is trained on massive amounts of data using a consumer GPU. We demonstrate that predictive performance can be substantially improved by initializing the feature detectors using recently developed unsupervised learning methods as well as by taking advantage of the local spatial coherence of the output labels. We show that our method works reliably on two challenging urban datasets that are an order of magnitude larger than what was used to evaluate previous approaches.

Journal ArticleDOI
TL;DR: A large scale comparison study for the major machine learning models for time series forecasting, applying the models on the monthly M3 time series competition data to reveal significant differences between the different methods.
Abstract: In this work we present a large scale comparison study for the major machine learning models for time series forecasting. Specifically, we apply the models on the monthly M3 time series competition data (around a thousand time series). There have been very few, if any, large scale comparison studies for machine learning models for the regression or the time series forecasting problems, so we hope this study will fill this gap. The models considered are multilayer perceptron, Bayesian neural networks, radial basis functions, generalized regression neural networks (also called kernel regression), K-nearest neighbor regression, CART regression trees, support vector regression, and Gaussian processes. The study reveals significant differences between the different methods. The best two methods turned out to be the multilayer perceptron and the Gaussian process regression. In addition to model comparisons, we have tested different preprocessing methods and have shown that they have different impacts on the performance.

Journal ArticleDOI
TL;DR: A hybrid ARIMA and neural network model is proposed that is capable of exploiting the strengths of traditional time series approaches and artificial neural networks to provide a robust modeling framework capable of capturing the nonlinear nature of the complex time series and thus producing more accurate predictions.

Journal ArticleDOI
TL;DR: A model of supervised learning for biologically plausible neurons is presented that enables spiking neurons to reproduce arbitrary template spike patterns in response to given synaptic stimuli even in the presence of various sources of noise and shows that the learning rule can also be used for decision-making tasks.
Abstract: Learning from instructions or demonstrations is a fundamental property of our brain necessary to acquire new knowledge and develop novel skills or behavioral patterns. This type of learning is thought to be involved in most of our daily routines. Although the concept of instruction-based learning has been studied for several decades, the exact neural mechanisms implementing this process remain unrevealed. One of the central questions in this regard is, How do neurons learn to reproduce template signals (instructions) encoded in precisely timed sequences of spikes? Here we present a model of supervised learning for biologically plausible neurons that addresses this question. In a set of experiments, we demonstrate that our approach enables us to train spiking neurons to reproduce arbitrary template spike patterns in response to given synaptic stimuli even in the presence of various sources of noise. We show that the learning rule can also be used for decision-making tasks. Neurons can be trained to classify categories of input signals based on only a temporal configuration of spikes. The decision is communicated by emitting precisely timed spike trains associated with given input categories. Trained neurons can perform the classification task correctly even if stimuli and corresponding decision times are temporally separated and the relevant information is consequently highly overlapped by the ongoing neural activity. Finally, we demonstrate that neurons can be trained to reproduce sequences of spikes with a controllable time shift with respect to target templates. A reproduced signal can follow or even precede the targets. This surprising result points out that spiking neurons can potentially be applied to forecast the behavior (firing times) of other reference neurons or networks.

Journal ArticleDOI
TL;DR: This paper achieves state-of-the-art performance on the MNIST handwritten digits benchmark by using a large number of hidden layers, many neurons per layer, numerous deformed training images, and graphics cards.
Abstract: Good old on-line back-propagation for plain multi-layer perceptrons yields a very low 0.35% error rate on the famous MNIST handwritten digits benchmark. All we need to achieve this best result so far are many hidden layers, many neurons per layer, numerous deformed training images, and graphics cards to greatly speed up learning.

Book
14 Oct 2010
TL;DR: This book covers neural network modeling methodology end to end, from model design, dimension reduction, and resampling to recurrent networks, closed-loop control learning, self-organizing maps, and neural networks without training for optimization.
Abstract: Neural Networks: An Overview; Modeling with Neural Networks: Principles and Model Design Methodology; Modeling Methodology: Dimension Reduction and Resampling Methods; Neural Identification of Controlled Dynamical Systems and Recurrent Networks; Closed-Loop Control Learning; Discrimination; Self-Organizing Maps and Unsupervised Classification; Neural Networks without Training for Optimization.

Journal ArticleDOI
TL;DR: A performance comparison of the classification techniques is provided in terms of their correct differentiation rates, confusion matrices, and computational cost; Bayesian decision making (BDM) yields the highest correct classification rate with relatively small computational cost.

Journal ArticleDOI
TL;DR: The improved computation presented in this paper optimizes the neural network learning process based on the Levenberg-Marquardt (LM) algorithm; the memory and time savings are especially pronounced for training with large-sized patterns.
Abstract: The improved computation presented in this paper aims to optimize the neural network learning process based on the Levenberg-Marquardt (LM) algorithm. The quasi-Hessian matrix and gradient vector are computed directly, without Jacobian matrix multiplication and storage, which solves the memory limitation problem of LM training. Considering the symmetry of the quasi-Hessian matrix, only the elements in its upper/lower triangular array need to be calculated. Therefore, training speed is improved significantly, not only because of the smaller array stored in memory, but also because of the reduced number of operations in the quasi-Hessian matrix calculation. The improved memory and time efficiencies are especially pronounced for training with large-sized patterns.
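The central trick is that the quasi-Hessian Q = JᵀJ and the gradient g = Jᵀe can be accumulated one training pattern at a time, so the full Jacobian J is never formed. A minimal sketch of that accumulation (the paper's contribution also covers computing each Jacobian row directly and exploiting symmetry; this shows only the storage-free accumulation):

```python
import numpy as np

def lm_terms(jacobian_rows, errors, n_params):
    """Accumulate the quasi-Hessian Q = J^T J and gradient g = J^T e
    pattern by pattern, without storing the full Jacobian J."""
    Q = np.zeros((n_params, n_params))
    g = np.zeros(n_params)
    for j_p, e_p in zip(jacobian_rows, errors):
        Q += np.outer(j_p, j_p)  # rank-one update; Q is symmetric, so
                                 # only one triangle actually needs computing
        g += j_p * e_p
    return Q, g

# LM step: solve (Q + mu * I) dw = g and update w <- w - dw.
```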

Journal Article
TL;DR: This paper presents a mature, flexible, and adaptive machine learning toolkit for regression modeling and active learning to tackle issues of computational cost and model accuracy.
Abstract: An exceedingly large number of scientific and engineering fields are confronted with the need for computer simulations to study complex, real world phenomena or solve challenging design problems. However, due to the computational cost of these high fidelity simulations, the use of neural networks, kernel methods, and other surrogate modeling techniques have become indispensable. Surrogate models are compact and cheap to evaluate, and have proven very useful for tasks such as optimization, design space exploration, prototyping, and sensitivity analysis. Consequently, in many fields there is great interest in tools and techniques that facilitate the construction of such regression models, while minimizing the computational cost and maximizing model accuracy. This paper presents a mature, flexible, and adaptive machine learning toolkit for regression modeling and active learning to tackle these issues. The toolkit brings together algorithms for data fitting, model selection, sample selection (active learning), hyperparameter optimization, and distributed computing in order to empower a domain expert to efficiently generate an accurate model for the problem or data at hand.

Journal ArticleDOI
TL;DR: Experimental results on the KDD CUP 1999 dataset show that the proposed new approach, FC-ANN, outperforms BPNN and other well-known methods such as decision trees and naive Bayes in terms of detection precision and detection stability.
Abstract: Many studies have argued that Artificial Neural Networks (ANNs) can improve the performance of intrusion detection systems (IDS) when compared with traditional methods. However, for ANN-based IDS, detection precision, especially for low-frequency attacks, and detection stability still need to be enhanced. In this paper, we propose a new approach, called FC-ANN, based on ANN and fuzzy clustering, to solve the problem and help IDS achieve a higher detection rate, a lower false-positive rate, and stronger stability. The general procedure of FC-ANN is as follows: first, a fuzzy clustering technique is used to generate different training subsets. Subsequently, based on the different training subsets, different ANN models are trained to formulate different base models. Finally, a meta-learner, the fuzzy aggregation module, is employed to aggregate these results. Experimental results on the KDD CUP 1999 dataset show that our proposed new approach, FC-ANN, outperforms BPNN and other well-known methods such as decision trees and naive Bayes in terms of detection precision and detection stability.

Journal ArticleDOI
TL;DR: The experimental results show that this chaotic image encryption system with a perceptron model has high security and strong resistance to the existing attack methods.
Abstract: Based on the high-dimensional Lorenz chaotic system and the perceptron model within a neural network, a chaotic image encryption system with a perceptron model is proposed. This paper describes the algorithm flow in detail and analyses the cryptographic security. The experimental results show that this algorithm has high security and strong resistance to the existing attack methods.
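As orientation, the chaotic half of such a scheme turns a secret-keyed chaotic trajectory into a pixel keystream, which the perceptron layer then mixes. A toy sketch of the keystream part only; the classical three-variable Lorenz system, Euler stepping, and the crude quantization below are my simplifications, not the paper's construction:

```python
import numpy as np

def lorenz_keystream(n, key=(1.0, 1.0, 1.0), dt=0.001,
                     sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Iterate the (classical) Lorenz system from a secret initial
    condition and quantize one coordinate into a byte stream."""
    x, y, z = key
    out = np.empty(n, dtype=np.uint8)
    for i in range(n):
        x, y, z = (x + dt * sigma * (y - x),
                   y + dt * (x * (rho - z) - y),
                   z + dt * (x * y - beta * z))
        out[i] = int(abs(x) * 1e6) % 256  # crude quantization to bytes
    return out

# Encryption idea: XOR the flattened image with the keystream, e.g.
# cipher = image.flatten() ^ lorenz_keystream(image.size, key=secret)
```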

Journal ArticleDOI
01 Feb 2010-Energy
TL;DR: The proposed approach can be useful in the effective implementation of energy policies, since accurate predictions of energy consumption affect the capital investment, the environmental quality, the revenue analysis, the market research management, while conserve at the same time the supply security.

Proceedings ArticleDOI
13 Jun 2010
TL;DR: Experiments show that the supervised dictionary improves the performance of the proposed model significantly over the unsupervised dictionary, leading to state-of-the-art performance on diverse image databases and implying its great potential in handling large scale datasets in real applications.
Abstract: In this paper, we propose a novel supervised hierarchical sparse coding model based on local image descriptors for classification tasks. The supervised dictionary training is performed via back-projection, by minimizing the training error of classifying the image level features, which are extracted by max pooling over the sparse codes within a spatial pyramid. Such a max pooling procedure across multiple spatial scales offers the model translation-invariant properties, similar to the Convolutional Neural Network (CNN). Experiments show that our supervised dictionary improves the performance of the proposed model significantly over the unsupervised dictionary, leading to state-of-the-art performance on diverse image databases. Furthermore, our supervised model targets learning linear features, implying its great potential in handling large scale datasets in real applications.
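To illustrate the pooling step: max pooling the sparse codes over the cells of a spatial pyramid yields a fixed-length, locally translation-invariant image feature. A minimal sketch (the array layout and the 1/2/4 pyramid levels are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

def pyramid_max_pool(codes, positions, image_size, levels=(1, 2, 4)):
    """codes: (N, K) sparse codes for N local descriptors.
    positions: (N, 2) descriptor (x, y) coordinates in [0, image_size].
    Max-pool the codes inside each cell of a spatial pyramid and
    concatenate, giving a K * (1 + 4 + 16) feature vector here."""
    feats = []
    for L in levels:
        cell = image_size / L
        ix = np.minimum((positions[:, 0] // cell).astype(int), L - 1)
        iy = np.minimum((positions[:, 1] // cell).astype(int), L - 1)
        for i in range(L):
            for j in range(L):
                inside = (ix == i) & (iy == j)
                feats.append(codes[inside].max(axis=0) if inside.any()
                             else np.zeros(codes.shape[1]))
    return np.concatenate(feats)
```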

Journal ArticleDOI
TL;DR: New delay-dependent stability criteria for RNNs with time-varying delay are derived by applying this weighting-delay method, which are less conservative than previous results.
Abstract: In this paper, a weighting-delay-based method is developed for the study of the stability problem of a class of recurrent neural networks (RNNs) with time-varying delay. Different from previous results, the delay interval [0, d(t)] is divided into some variable subintervals by employing weighting delays. Thus, new delay-dependent stability criteria for RNNs with time-varying delay are derived by applying this weighting-delay method, which are less conservative than previous results. The proposed stability criteria depend on the positions of weighting delays in the interval [0, d(t)], which can be denoted by the weighting-delay parameters. Different weighting-delay parameters lead to different stability margins for a given system. Thus, a solution based on optimization methods is further given to calculate the optimal weighting-delay parameters. Several examples are provided to verify the effectiveness of the proposed criteria.
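In symbols (the notation here is mine): choosing weighting-delay parameters

$$ 0 = \alpha_0 < \alpha_1 < \cdots < \alpha_m = 1 $$

divides the delay interval $[0, d(t)]$ into the subintervals $[\alpha_{i-1}\, d(t),\; \alpha_i\, d(t)]$, $i = 1, \dots, m$, each of which receives its own term in the Lyapunov-Krasovskii functional; optimizing over the $\alpha_i$ is what tightens the stability margin relative to treating $[0, d(t)]$ as a single interval.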

Proceedings Article
01 Sep 2010
TL;DR: This paper reports the recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms and shows that the binary codes learned produce a log-spectral distortion that is approximately 2 dB lower than a subband vector quantization technique over the entire frequency range of wide-band speech.
Abstract: This paper reports our recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms. The top layer of the generative model learns binary codes that can be used for efficient compression of speech and could also be used for scalable speech recognition or rapid speech content retrieval. Each layer of the generative model is fully connected to the layer below and the weights on these connections are pretrained efficiently by using the contrastive divergence approximation to the log likelihood gradient. After layer-by-layer pre-training we "unroll" the generative model to form a deep auto-encoder, whose parameters are then fine-tuned using back-propagation. To reconstruct the full-length speech spectrogram, individual spectrogram segments predicted by their respective binary codes are combined using an overlap-and-add method. Experimental results on speech spectrogram coding demonstrate that the binary codes produce a log-spectral distortion that is approximately 2 dB lower than a subband vector quantization technique over the entire frequency range of wide-band speech. Index Terms: deep learning, speech feature extraction, neural networks, auto-encoder, binary codes, Boltzmann machine
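The overlap-and-add reconstruction mentioned at the end is standard short-time processing: each decoded segment is added back into the output signal at its original offset, so overlapping regions sum. A minimal sketch (segment length and hop size are illustrative, not the paper's settings):

```python
import numpy as np

def overlap_add(segments, hop):
    """Reassemble a full-length signal from fixed-length segments taken
    every `hop` samples: each segment overlaps the next by
    (len(segment) - hop) samples, and overlapping regions are summed."""
    seg_len = len(segments[0])
    out = np.zeros(hop * (len(segments) - 1) + seg_len)
    for k, seg in enumerate(segments):
        out[k * hop : k * hop + seg_len] += seg
    return out
```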

Journal ArticleDOI
TL;DR: The results indicate that coupled wavelet-neural network models are a promising new method of short-term flow forecasting in non-perennial rivers in semi-arid watersheds such as those found in Cyprus.