Showing papers on "Recurrent neural network published in 2012"


Posted Content
TL;DR: This paper proposes a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem, and empirically validates the hypothesis and proposed solutions.
Abstract: There are two widely known issues with properly training Recurrent Neural Networks: the vanishing and the exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. Our analysis is used to justify a simple yet effective solution. We propose a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem. We empirically validate our hypothesis and proposed solutions in the experimental section.
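A minimal NumPy sketch of the clipping rule the abstract describes; the threshold value and the toy gradient are illustrative, not taken from the paper:

```python
import numpy as np

def clip_gradient_norm(grad, threshold=1.0):
    """Rescale the gradient so its L2 norm never exceeds `threshold`.

    If the norm explodes past the threshold, shrink the gradient back onto
    the threshold sphere while keeping its direction."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

# Example: a gradient whose norm (5.0) exceeds the threshold is rescaled.
g = np.array([3.0, 4.0])
print(clip_gradient_norm(g, threshold=1.0))  # direction kept, norm == 1.0
```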

3,549 citations


Book
09 Feb 2012
TL;DR: A new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and an extension of the long short-term memory network architecture to multidimensional data, such as images and video sequences.
Abstract: Recurrent neural networks are powerful sequence learners. They are able to incorporate context information in a flexible way, and are robust to localised distortions of the input data. These properties make them well suited to sequence labelling, where input sequences are transcribed with streams of labels. The aim of this thesis is to advance the state-of-the-art in supervised sequence labelling with recurrent networks. Its two main contributions are (1) a new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and (2) an extension of the long short-term memory network architecture to multidimensional data, such as images and video sequences.

2,101 citations


Proceedings ArticleDOI
01 Jan 2012
TL;DR: This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task and gains considerable improvements in WER on top of a state-of-the-art speech recognition system.
Abstract: Neural networks have become increasingly popular for the task of language modeling. Whereas feed-forward networks only exploit a fixed context length to predict the next word of a sequence, standard recurrent neural networks can, conceptually, take into account all of the predecessor words. On the other hand, it is well known that recurrent networks are difficult to train and are therefore unlikely to show the full potential of recurrent models. These problems are addressed by the Long Short-Term Memory neural network architecture. In this work, we analyze this type of network on an English and a large French language modeling task. Experiments show improvements of about 8% relative in perplexity over standard recurrent neural network LMs. In addition, we gain considerable improvements in WER on top of a state-of-the-art speech recognition system.
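A minimal sketch of the kind of LSTM language model the abstract analyzes, with perplexity computed as the exponential of the mean cross-entropy; it uses PyTorch, and all dimensions and data below are illustrative rather than the paper's setup:

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    """Minimal LSTM language model: embed, run LSTM, project to vocabulary."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):              # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))
        return self.proj(h)                 # logits: (batch, seq_len, vocab)

# Perplexity = exp(average cross-entropy over predicted next words).
model = LSTMLanguageModel()
tokens = torch.randint(0, 10000, (2, 20))          # toy batch of word ids
logits = model(tokens[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
print("perplexity:", torch.exp(loss).item())
```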

1,966 citations


Posted Content
TL;DR: This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence.
Abstract: Many machine learning tasks can be expressed as the transformation, or transduction, of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech, to name but a few. One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating. Recurrent neural networks (RNNs) are a powerful sequence learning architecture that has proven capable of learning such representations. However, RNNs traditionally require a pre-defined alignment between the input and output sequences to perform transduction. This is a severe limitation, since finding the alignment is the most difficult aspect of many sequence transduction problems. Indeed, even determining the length of the output sequence is often challenging. This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence. Experimental results for phoneme recognition are provided on the TIMIT speech corpus.
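A rough PyTorch sketch of the transducer structure the abstract outlines: a transcription network encodes the input, a prediction network encodes the output history, and a joint network scores every pairing over labels plus a blank symbol. The combination details of the original model are simplified and all sizes are invented for illustration:

```python
import torch
import torch.nn as nn

class TinyTransducer(nn.Module):
    """Simplified RNN-transducer sketch: encoder over input frames, prediction
    network over output history, joint network over every (t, u) pair."""
    def __init__(self, input_dim=40, num_labels=40, hidden=128):
        super().__init__()
        self.encoder = nn.LSTM(input_dim, hidden, batch_first=True)
        self.predictor = nn.LSTM(num_labels + 1, hidden, batch_first=True)
        self.joint = nn.Linear(2 * hidden, num_labels + 1)  # +1 for blank

    def forward(self, features, label_history):
        f, _ = self.encoder(features)         # (B, T, H)
        g, _ = self.predictor(label_history)  # (B, U, H)
        # Combine every (t, u) pair of encoder / prediction states.
        f = f.unsqueeze(2).expand(-1, -1, g.size(1), -1)
        g = g.unsqueeze(1).expand(-1, f.size(1), -1, -1)
        return self.joint(torch.cat([f, g], dim=-1))  # (B, T, U, labels+1)

model = TinyTransducer()
feats = torch.randn(2, 50, 40)                       # toy acoustic features
labels = torch.nn.functional.one_hot(torch.randint(0, 41, (2, 10)), 41).float()
print(model(feats, labels).shape)                    # torch.Size([2, 50, 10, 41])
```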

1,448 citations


Book ChapterDOI
01 Jan 2012
TL;DR: Practical techniques and recommendations for successfully applying Echo State Networks, as well as some more advanced application-specific modifications, are presented.
Abstract: Reservoir computing has emerged in the last decade as an alternative to gradient descent methods for training recurrent neural networks. Echo State Network (ESN) is one of the key reservoir computing “flavors”. While being practical, conceptually simple, and easy to implement, ESNs require some experience and insight to achieve the hailed good performance in many tasks. Here we present practical techniques and recommendations for successfully applying ESNs, as well as some more advanced application-specific modifications.
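A minimal NumPy sketch of an ESN of the kind the chapter gives recommendations for: a fixed random reservoir with rescaled spectral radius and leaky integration, and a ridge-regression readout. All hyperparameter values below are illustrative defaults, not the chapter's recommendations:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Reservoir setup (the practical knobs the chapter discusses) ---
n_in, n_res = 1, 200
spectral_radius, leak_rate, ridge = 0.9, 0.3, 1e-6

W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # rescale spectral radius

def run_reservoir(u_seq):
    """Collect leaky-integrated reservoir states for an input sequence."""
    x = np.zeros(n_res)
    states = []
    for u in u_seq:
        pre = np.tanh(W_in @ np.atleast_1d(u) + W @ x)
        x = (1 - leak_rate) * x + leak_rate * pre
        states.append(x.copy())
    return np.array(states)

# --- Train only the linear readout with ridge regression ---
u_train = np.sin(np.arange(1000) * 0.1)          # toy input signal
y_train = np.roll(u_train, -1)                   # target: predict the next value
X = run_reservoir(u_train)
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y_train)
print("train MSE:", np.mean((X @ W_out - y_train) ** 2))
```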

653 citations


Proceedings ArticleDOI
01 Dec 2012
TL;DR: This paper improves recurrent neural network language model performance by providing, with each word, a contextual real-valued input vector that conveys information about the sentence being modeled; the vector is obtained by performing Latent Dirichlet Allocation on a block of preceding text, yielding a topic-conditioned RNNLM.
Abstract: Recurrent neural network language models (RNNLMs) have recently demonstrated state-of-the-art performance across a variety of tasks. In this paper, we improve their performance by providing a contextual real-valued input vector in association with each word. This vector is used to convey contextual information about the sentence being modeled. By performing Latent Dirichlet Allocation using a block of preceding text, we achieve a topic-conditioned RNNLM. This approach has the key advantage of avoiding the data fragmentation associated with building multiple topic models on different data subsets. We report perplexity results on the Penn Treebank data, where we achieve a new state-of-the-art. We further apply the model to the Wall Street Journal speech recognition task, where we observe improvements in word-error-rate.
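A hedged sketch of the idea: infer an LDA topic posterior from a block of preceding text and concatenate it with each word's input to the RNNLM. It uses scikit-learn's LatentDirichletAllocation on a toy corpus and is not the authors' pipeline:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Fit LDA on training text (a toy corpus here) so we can infer a topic
# posterior for any block of preceding context.
corpus = ["the market fell sharply today",
          "the team won the game last night",
          "stocks and bonds rallied after the report",
          "the coach praised the players after the match"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

def topic_context_vector(preceding_text):
    """Topic posterior for a block of preceding text; this real-valued vector
    is what gets concatenated with each word's input to the RNNLM."""
    return lda.transform(vectorizer.transform([preceding_text]))[0]

context = topic_context_vector("stocks rallied and the market rose")
word_embedding = np.random.randn(50)             # stand-in for the word input
rnn_input = np.concatenate([word_embedding, context])
print(rnn_input.shape)                           # (52,) = word + topic features
```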

644 citations


Posted Content
TL;DR: In this paper, a probabilistic model based on distribution estimators conditioned on a recurrent neural network is proposed to discover temporal dependencies in high-dimensional sequences of polyphonic music.
Abstract: We investigate the problem of modeling symbolic sequences of polyphonic music in a completely general piano-roll representation. We introduce a probabilistic model based on distribution estimators conditioned on a recurrent neural network that is able to discover temporal dependencies in high-dimensional sequences. Our approach outperforms many traditional models of polyphonic music on a variety of realistic datasets. We show how our musical language model can serve as a symbolic prior to improve the accuracy of polyphonic transcription.
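A simplified sketch of the "distribution estimator conditioned on an RNN" structure: an RNN hidden state conditions a per-timestep distribution over the 88-key piano roll. The paper's richer conditional estimators (RBM/NADE-style) are replaced here by independent Bernoulli outputs purely for illustration:

```python
import torch
import torch.nn as nn

class PianoRollRNN(nn.Module):
    """An RNN whose hidden state conditions the distribution over the next
    piano-roll frame; each key is modeled as an independent Bernoulli here."""
    def __init__(self, n_keys=88, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(n_keys, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_keys)

    def forward(self, roll):                 # roll: (B, T, 88) in {0, 1}
        h, _ = self.rnn(roll)
        return self.out(h)                   # logits for the *next* frame

model = PianoRollRNN()
roll = (torch.rand(4, 32, 88) > 0.95).float()           # toy piano roll
logits = model(roll[:, :-1])
loss = nn.functional.binary_cross_entropy_with_logits(logits, roll[:, 1:])
loss.backward()
```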

615 citations


Book ChapterDOI
01 Jan 2012
TL;DR: This chapter discusses recurrent neural networks' ability to use contextual information when mapping between input and output sequences, and the vanishing gradient problem, which limits this ability.
Abstract: As discussed in the previous chapter, an important benefit of recurrent neural networks is their ability to use contextual information when mapping between input and output sequences. Unfortunately, for standard RNN architectures, the range of context that can in practice be accessed is quite limited. The problem is that the influence of a given input on the hidden layer, and therefore on the network output, either decays or blows up exponentially as it cycles around the network's recurrent connections. This effect is often referred to in the literature as the vanishing gradient problem (Hochreiter, 1991; Hochreiter et al., 2001a; Bengio et al., 1994). The vanishing gradient problem is illustrated schematically in Figure 4.1.
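A small NumPy experiment illustrating the effect described above: a backpropagated error vector repeatedly multiplied by the recurrent Jacobian either decays or blows up exponentially, depending on the scale of the recurrent weights (the weight scales below are arbitrary):

```python
import numpy as np

# Error backpropagated through a linear recurrent path shrinks or grows
# roughly like the largest singular value of W_rec raised to the number of
# timesteps -- the effect the chapter illustrates in Figure 4.1.
rng = np.random.default_rng(0)
n = 50
for scale in (0.05, 0.25):                  # small vs. large recurrent weights
    W_rec = rng.normal(0, scale, (n, n))
    grad = np.ones(n)
    norms = []
    for t in range(50):
        grad = W_rec.T @ grad               # one step of backprop through time
        norms.append(np.linalg.norm(grad))
    print(f"scale={scale}: norm after 10 steps {norms[9]:.2e}, "
          f"after 50 steps {norms[49]:.2e}")
```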

529 citations


Posted Content
21 Nov 2012
TL;DR: The analysis is used to justify the simple yet effective solution of clipping the norm of the exploding gradient, and the comparison between this heuristic solution and standard SGD provides empirical evidence that such a heuristic is required to reach state-of-the-art results on a character prediction task and a polyphonic music prediction task.
Abstract: Training Recurrent Neural Networks is more troublesome than training feedforward ones because of the vanishing and exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to understand the fundamental issues underlying the exploding gradient problem by exploring it from an analytical, a geometric and a dynamical systems perspective. Our analysis is used to justify the simple yet effective solution of clipping the norm of the exploded gradient. In the experimental section, the comparison between this heuristic solution and standard SGD provides empirical evidence for our hypothesis and shows that such a heuristic is required to reach state-of-the-art results on a character prediction task and a polyphonic music prediction task.

483 citations


Posted Content
TL;DR: Experiments reported here evaluate clipping gradients, spanning longer time ranges with leaky integration, advanced momentum techniques, more powerful output probability models, and encouraging sparser gradients to help symmetry breaking and credit assignment.
Abstract: After a more than decade-long period of relatively little research activity in the area of recurrent neural networks, several new developments will be reviewed here that have allowed substantial progress both in understanding and in technical solutions towards more efficient training of recurrent networks. These advances have been motivated by and related to the optimization issues surrounding deep learning. Although recurrent networks are extremely powerful in what they can in principle represent in terms of modelling sequences, their training is plagued by two aspects of the same issue regarding the learning of long-term dependencies. Experiments reported here evaluate the use of clipping gradients, spanning longer time ranges with leaky integration, advanced momentum techniques, using more powerful output probability models, and encouraging sparser gradients to help symmetry breaking and credit assignment. The experiments are performed on text and music data and show the combined effects of these techniques in generally improving both training and test error.
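One of the tricks listed above, leaky integration, can be sketched in a few lines: the hidden state is a convex mix of its previous value and the standard update, so a small mixing rate lets information persist over longer time ranges. The dimensions and the mixing rate below are illustrative:

```python
import numpy as np

def leaky_rnn_step(x, u, W, W_in, alpha=0.1):
    """One hidden-state update with leaky integration: the new state is a
    convex mix of the old state and the standard tanh update."""
    candidate = np.tanh(W @ x + W_in @ u)
    return (1.0 - alpha) * x + alpha * candidate

rng = np.random.default_rng(0)
W, W_in = rng.normal(0, 0.1, (20, 20)), rng.normal(0, 0.1, (20, 5))
x = np.zeros(20)
for t in range(100):
    u = rng.normal(size=5)
    x = leaky_rnn_step(x, u, W, W_in, alpha=0.1)
print("state norm after 100 steps:", np.linalg.norm(x))
```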

394 citations


Journal ArticleDOI
TL;DR: Some sufficient conditions are obtained to guarantee the exponential synchronization of the coupled networks based on the drive-response concept, differential inclusion theory, and the Lyapunov functional method.

Journal ArticleDOI
TL;DR: A brief introduction to basic concepts, methods, insights, current developments, and some applications of RC is given.
Abstract: Reservoir Computing (RC) is a paradigm of understanding and training Recurrent Neural Networks (RNNs) based on treating the recurrent part (the reservoir) differently than the readouts from it. It started ten years ago and is currently a prolific research area, giving important insights into RNNs, providing practical machine learning tools, and enabling computation with non-conventional hardware. Here we give a brief introduction to basic concepts, methods, insights, and current developments, and highlight some applications of RC.

Proceedings Article
01 Jan 2012
TL;DR: This work introduces a model which uses a deep recurrent autoencoder neural network to denoise input features for robust ASR, demonstrates that the model is competitive with existing feature denoising approaches on the Aurora2 task, and shows that it outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.
Abstract: Recent work on deep neural networks as acoustic models for automatic speech recognition (ASR) has demonstrated substantial performance improvements. We introduce a model which uses a deep recurrent autoencoder neural network to denoise input features for robust ASR. The model is trained on stereo (noisy and clean) audio features to predict clean features given noisy input. The model makes no assumptions about how noise affects the signal, nor about the existence of distinct noise environments. Instead, the model can learn to model any type of distortion or additive noise given sufficient training data. We demonstrate that the model is competitive with existing feature denoising approaches on the Aurora2 task, and outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.
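A hedged sketch of the training setup the abstract describes, with a recurrent encoder mapping noisy frames to clean frames under a mean-squared-error loss; the layer sizes, feature dimension and toy data are invented for illustration, and the paper's exact architecture may differ:

```python
import torch
import torch.nn as nn

class RecurrentDenoiser(nn.Module):
    """Recurrent autoencoder trained on stereo (noisy, clean) feature pairs
    so it maps noisy frames to clean frames."""
    def __init__(self, feat_dim=39, hidden=256):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, feat_dim)

    def forward(self, noisy):                  # (B, T, feat_dim)
        h, _ = self.encoder(noisy)
        return self.decoder(h)                 # predicted clean features

model = RecurrentDenoiser()
noisy = torch.randn(8, 100, 39)                # toy noisy MFCC-like features
clean = noisy - 0.5 * torch.randn_like(noisy)  # toy "clean" targets
loss = nn.functional.mse_loss(model(noisy), clean)
loss.backward()
```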

Journal ArticleDOI
TL;DR: Several sufficient conditions are derived to ascertain the existence of a unique equilibrium, global asymptotic stability, and global exponential stability of delayed complex-valued recurrent neural networks with two classes of complex-valued activation functions.
Abstract: Over the last decade, several complex-valued neural networks have been developed and applied in various research areas. As an extension of real-valued recurrent neural networks, complex-valued recurrent neural networks use complex-valued states, connection weights, or activation functions with much more complicated properties than real-valued ones. This paper presents several sufficient conditions derived to ascertain the existence of a unique equilibrium, global asymptotic stability, and global exponential stability of delayed complex-valued recurrent neural networks with two classes of complex-valued activation functions. Simulation results of three numerical examples are also delineated to substantiate the effectiveness of the theoretical results.

Journal ArticleDOI
TL;DR: The theory combines concepts from machine learning (reservoir computing), system modeling, stochastic processes, and functional analysis to define the computational capacity of a dynamical system.
Abstract: Many dynamical systems, both natural and artificial, are stimulated by time dependent external signals, somehow processing the information contained therein. We demonstrate how to quantify the different modes in which information can be processed by such systems and combine them to define the computational capacity of a dynamical system. This is bounded by the number of linearly independent state variables of the dynamical system, equaling it if the system obeys the fading memory condition. It can be interpreted as the total number of linearly independent functions of its stimuli the system can compute. Our theory combines concepts from machine learning (reservoir computing), system modeling, stochastic processes, and functional analysis. We illustrate our theory by numerical simulations for the logistic map, a recurrent neural network, and a two-dimensional reaction diffusion system, uncovering universal trade-offs between the non-linearity of the computation and the system's short-term memory.

Journal ArticleDOI
TL;DR: A robust recurrent neural network based on echo state mechanisms is presented in a Bayesian framework; it is robust in the presence of outliers and superior to existing methods.
Abstract: In this paper, a robust recurrent neural network is presented in a Bayesian framework based on echo state mechanisms. Since the new model is capable of handling outliers in the training data set, it is termed a robust echo state network (RESN). The RESN inherits the basic idea of ESN learning in a Bayesian framework, but replaces the commonly used Gaussian distribution with a Laplace one, which is more robust to outliers, as the likelihood function of the model output. Moreover, the training of the RESN is facilitated by employing a bound optimization algorithm, based on which a proper surrogate function is derived and the Laplace likelihood function is approximated by a Gaussian one while remaining robust to outliers. This leads to an efficient method for estimating model parameters, which can be solved by using a Bayesian evidence procedure in a fully autonomous way. Experimental results show that the proposed method is robust in the presence of outliers and is superior to existing methods.
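Only the core intuition, that a Laplace (L1) likelihood downweights outliers in the readout fit, is sketched below via iteratively reweighted least squares; the paper's full Bayesian evidence procedure and bound optimization are not reproduced, and the data are synthetic:

```python
import numpy as np

def robust_readout(X, y, n_iter=30, eps=1e-6, ridge=1e-6):
    """Fit a linear readout under a Laplace (L1) likelihood via iteratively
    reweighted least squares: samples with large residuals get small weights,
    so outliers barely influence the solution."""
    n, d = X.shape
    w = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)  # least-squares init
    for _ in range(n_iter):
        r = np.abs(y - X @ w) + eps
        Xw = X * (1.0 / r)[:, None]              # per-sample weights 1/|residual|
        w = np.linalg.solve(Xw.T @ X + ridge * np.eye(d), Xw.T @ y)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.01 * rng.normal(size=200)
y[:10] += 20.0                                   # inject gross outliers
print(robust_readout(X, y).round(2))             # close to the true weights
```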

Journal ArticleDOI
TL;DR: A novel keyword spotting method for handwritten documents is described, derived from a neural network-based system for unconstrained handwriting recognition, that performs template-free spotting, i.e., it is not necessary for a keyword to appear in the training set.
Abstract: Keyword spotting refers to the process of retrieving all instances of a given keyword from a document. In the present paper, a novel keyword spotting method for handwritten documents is described. It is derived from a neural network-based system for unconstrained handwriting recognition. As such it performs template-free spotting, i.e., it is not necessary for a keyword to appear in the training set. The keyword spotting is done using a modification of the CTC Token Passing algorithm in conjunction with a recurrent neural network. We demonstrate that the proposed systems outperform not only a classical dynamic time warping-based approach but also a modern keyword spotting system, based on hidden Markov models. Furthermore, we analyze the performance of the underlying neural networks when using them in a recognition task followed by keyword spotting on the produced transcription. We point out the advantages of keyword spotting when compared to classic text line recognition.

Journal ArticleDOI
TL;DR: Evidence is presented that both information transfer and storage in the recurrent layer are maximized close to the phase transition between ordered and chaotic dynamics, providing an explanation for why guiding recurrent layers toward the edge of chaos is computationally useful.

Book ChapterDOI
01 Jan 2012
TL;DR: This chapter describes the basic HF approach and examines well-known performance-improving techniques, such as preconditioning, that have been beneficial for neural network training, as well as others of a more heuristic nature which are harder to justify but have been found to work well in practice.
Abstract: In this chapter we will first describe the basic HF approach, and then examine well-known performance-improving techniques such as preconditioning which we have found to be beneficial for neural network training, as well as others of a more heuristic nature which are harder to justify, but which we have found to work well in practice. We will also provide practical tips for creating efficient and bug-free implementations and discuss various pitfalls which may arise when designing and using an HF-type approach in a particular application.
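The central mechanism of the HF approach, curvature-vector products fed into conjugate gradient, can be sketched briefly; the damping, preconditioning and other refinements the chapter covers are omitted, and the toy quadratic loss below is only for illustration:

```python
import torch

# Curvature-vector products via double backprop, used by conjugate gradient to
# solve H p = -g approximately without ever forming the Hessian.
torch.manual_seed(0)
A = torch.randn(20, 10)
target = torch.randn(20)
w = torch.randn(10, requires_grad=True)

def loss_fn(w):
    return torch.mean((A @ w - target) ** 2)     # toy least-squares loss

def hvp(v):
    """Hessian-vector product H @ v computed with two backward passes."""
    g = torch.autograd.grad(loss_fn(w), w, create_graph=True)[0]
    return torch.autograd.grad(torch.dot(g, v), w)[0]

def conjugate_gradient(b, n_steps=20):
    """Solve H x = b using only Hessian-vector products."""
    x = torch.zeros_like(b)
    r = b.clone()
    p = r.clone()
    for _ in range(n_steps):
        Hp = hvp(p)
        alpha = torch.dot(r, r) / torch.dot(p, Hp)
        x = x + alpha * p
        r_new = r - alpha * Hp
        if torch.dot(r_new, r_new) < 1e-10:      # residual small enough: stop
            break
        p = r_new + (torch.dot(r_new, r_new) / torch.dot(r, r)) * p
        r = r_new
    return x

grad = torch.autograd.grad(loss_fn(w), w)[0]
step = conjugate_gradient(-grad)                 # approximate Newton step
with torch.no_grad():
    print("loss before:", loss_fn(w).item())
    print("loss after :", loss_fn(w + step).item())
```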

Journal ArticleDOI
TL;DR: It is shown that the RNN-based nonlinear MPC scheme is effective and potentially suitable for real-time MPC implementation in many applications.
Abstract: In this paper, we present a neurodynamic approach to model predictive control (MPC) of unknown nonlinear dynamical systems based on two recurrent neural networks (RNNs). The echo state network (ESN) and simplified dual network (SDN) are adopted for system identification and dynamic optimization, respectively. First, the unknown nonlinear system is identified based on the ESN with input-output training and testing samples. Then, the resulting nonconvex optimization problem associated with nonlinear MPC is decomposed via Taylor expansion. To estimate the higher-order unknown term resulting from the decomposition, an online supervised learning algorithm is developed. Next, the SDN is applied for solving the relaxed convex optimization problem to compute the optimal control actions over the predicted horizon. Simulation results are provided to demonstrate the effectiveness and characteristics of the proposed approach. The proposed RNN-based approach has many desirable properties such as global convergence and low complexity. It is shown that the RNN-based nonlinear MPC scheme is effective and potentially suitable for real-time MPC implementation in many applications.

Journal ArticleDOI
TL;DR: This work employs two problem decomposition methods for training Elman recurrent neural networks on chaotic time series problems and shows improved accuracy compared to some of the methods from the literature.

Journal ArticleDOI
TL;DR: This research presents a comparative analysis of the wind speed forecasting accuracy of univariate and multivariate ARIMA models and their recurrent neural network counterparts, and indicates that multivariate models perform better than univariate models and that the recurrent neural network models outperform the ARIMA models.

Journal ArticleDOI
TL;DR: For a general class of memristor-based recurrent neural networks with time-varying delays, conditions on nondivergence, global attractivity, and exponential convergence are established by using local inhibition.

Journal ArticleDOI
TL;DR: These results ensure global exponential stability of memristor-based neural networks in the sense of Filippov solutions and make it convenient to estimate the exponential convergence rates of such networks.

Journal ArticleDOI
TL;DR: This work explores the ability of a simplified type of RNN, one with limited modifications to the internal weights called an echo state network, to effectively and continuously decode monkey reaches during a standard center-out reach task using a cortical brain-machine interface (BMI) in a closed loop.
Abstract: Recurrent neural networks (RNNs) are useful tools for learning nonlinear relationships in time series data with complex temporal dependences. In this paper, we explore the ability of a simplified type of RNN, one with limited modifications to the internal weights called an echo state network (ESN), to effectively and continuously decode monkey reaches during a standard center-out reach task using a cortical brain–machine interface (BMI) in a closed loop. We demonstrate that the RNN, an ESN implementation termed a FORCE decoder (from first-order reduced and controlled error learning), learns the task quickly and significantly outperforms the current state-of-the-art method, the velocity Kalman filter (VKF), using the measure of target acquire time. We also demonstrate that the FORCE decoder generalizes to a more difficult task by successfully operating the BMI in a randomized point-to-point task. The FORCE decoder is also robust as measured by the success rate over extended sessions. Finally, we show that decoded cursor dynamics are more like naturalistic hand movements than those of the VKF. Taken together, these results suggest that RNNs in general, and the FORCE decoder in particular, are powerful tools for BMI decoder applications.
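A compact NumPy sketch of FORCE learning itself, recursive least squares applied online to the readout of a chaotic reservoir with output feedback, in the spirit of the decoder described above; the network size, gain and target signal are illustrative, and this is not the closed-loop BMI setup:

```python
import numpy as np

rng = np.random.default_rng(0)
N, alpha, dt = 300, 1.0, 0.1
W = rng.normal(0, 1.5 / np.sqrt(N), (N, N))       # chaotic recurrent weights
w_fb = rng.uniform(-1, 1, N)                      # feedback of the readout
w_out = np.zeros(N)
P = np.eye(N) / alpha                             # RLS inverse-correlation matrix
x = rng.normal(0, 0.5, N)
r = np.tanh(x)
z = 0.0

# FORCE learning: at every step, update the readout with recursive least
# squares so the fed-back output z tracks the target from the start.
t_axis = np.arange(0, 200, dt)
target = np.sin(0.3 * t_axis)                     # toy target trajectory
for t, f in zip(t_axis, target):
    x = (1 - dt) * x + dt * (W @ r + w_fb * z)    # leaky reservoir dynamics
    r = np.tanh(x)
    z = w_out @ r                                 # current readout
    Pr = P @ r                                    # RLS update of P and weights
    k = Pr / (1.0 + r @ Pr)
    P -= np.outer(k, Pr)
    w_out -= (z - f) * k
print("final tracking error:", abs(z - target[-1]))
```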

Journal ArticleDOI
TL;DR: A 2-day forecast is obtained by using novel wavelet recurrent neural networks (WRNNs) that perform the prediction in the wavelet domain and, in addition, also perform the inverse wavelet transform, giving the predicted signal as output.
Abstract: Solar radiation prediction is an important challenge for the electrical engineer because it is used to estimate the power developed by commercial photovoltaic modules. This paper deals with the problem of solar radiation prediction based on observed meteorological data. A 2-day forecast is obtained by using novel wavelet recurrent neural networks (WRNNs). In fact, these WRNNs are used to exploit the correlation between solar radiation and timescale-related variations of wind speed, humidity, and temperature. The input to the selected WRNN is provided by timescale-related bands of wavelet coefficients obtained from meteorological time series. The experimental setup available at the University of Catania, Italy, provided this information. The novelty of this approach is that the proposed WRNN performs the prediction in the wavelet domain and, in addition, also performs the inverse wavelet transform, giving the predicted signal as output. The obtained simulation results show a very low root-mean-square error compared to the results of the solar radiation prediction approaches obtained by hybrid neural networks reported in the recent literature.
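A small sketch of the wavelet step, using the PyWavelets package (an assumption; the paper does not name a library): decompose a toy meteorological series into timescale-related coefficient bands that would form the WRNN's input, and reconstruct with the inverse transform:

```python
import numpy as np
import pywt  # PyWavelets, used here only to illustrate the wavelet step

# Toy series standing in for wind speed / humidity / temperature measurements.
t = np.arange(1024)
series = np.sin(2 * np.pi * t / 96) + 0.3 * np.random.default_rng(0).normal(size=t.size)

# Timescale-related bands of wavelet coefficients, as in the WRNN input:
# each level captures variations at a different timescale.
coeffs = pywt.wavedec(series, "db4", level=4)
for i, c in enumerate(coeffs):
    print(f"band {i}: {c.shape[0]} coefficients")

# In the WRNN idea, per-band coefficients (for several meteorological
# variables) form the recurrent network's input, the network predicts future
# coefficients, and an inverse transform returns the forecast to the time domain.
reconstructed = pywt.waverec(coeffs, "db4")
print("reconstruction error:", float(np.max(np.abs(reconstructed - series))))
```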

Proceedings ArticleDOI
25 Mar 2012
TL;DR: The Recurrent Neural Network is revisited, which explicitly models the Markovian dynamics of a set of observations through a non-linear function with a much larger hidden state space than traditional sequence models such as an HMM.
Abstract: In this paper, we show how new training principles and optimization techniques for neural networks can be used for different network structures. In particular, we revisit the Recurrent Neural Network (RNN), which explicitly models the Markovian dynamics of a set of observations through a non-linear function with a much larger hidden state space than traditional sequence models such as an HMM. We apply pretraining principles used for Deep Neural Networks (DNNs) and second-order optimization techniques to train an RNN. Moreover, we explore its application in the Aurora2 speech recognition task under mismatched noise conditions using a Tandem approach. We observe top performance on clean speech, and under high noise conditions, compared to multi-layer perceptrons (MLPs) and DNNs, with the added benefit of being a “deeper” model than an MLP but more compact than a DNN.

Journal ArticleDOI
TL;DR: Several succinct criteria are given to ascertain multistability of cellular neural networks and some sufficient conditions are obtained to ensure that an n-neuron neural network with concave-convex characteristics can have a fixed point located in the appointed region.
Abstract: In this paper, stability of multiple equilibria of neural networks with time-varying delays and concave-convex characteristics is formulated and studied. Some sufficient conditions are obtained to ensure that an n-neuron neural network with concave-convex characteristics can have a fixed point located in the appointed region. By means of an appropriate partition of the n-dimensional state space, when nonlinear activation functions of an n-neuron neural network are concave or convex in 2k+2m-1 intervals, this neural network can have (2k+2m-1)^n equilibrium points. This result can be applied to the multiobjective optimal control and associative memory. In particular, several succinct criteria are given to ascertain multistability of cellular neural networks. These stability conditions are the improvement and extension of the existing stability results in the literature. A numerical example is given to illustrate the theoretical findings via computer simulations.

Book ChapterDOI
01 Jan 2012
TL;DR: Experiments on speech and handwriting recognition show that a BLSTM network with a CTC output layer is an effective sequence labeller, generally outperforming standard HMMs and HMM-neural network hybrids, as well as more recent sequence labelling algorithms such as large margin HMMs and conditional random fields.
Abstract: This chapter introduces the connectionist temporal classification (CTC) output layer for recurrent neural networks (Graves et al., 2006). As its name suggests, CTC was specifically designed for temporal classification tasks; that is, for sequence labelling problems where the alignment between the inputs and the target labels is unknown. Unlike the hybrid approach described in the previous chapter, CTC models all aspects of the sequence with a single neural network, and does not require the network to be combined with a hidden Markov model. It also does not require presegmented training data, or external postprocessing to extract the label sequence from the network outputs. Experiments on speech and handwriting recognition show that a BLSTM network with a CTC output layer is an effective sequence labeller, generally outperforming standard HMMs and HMM-neural network hybrids, as well as more recent sequence labelling algorithms such as large margin HMMs (Sha and Saul, 2006) and conditional random fields (Lafferty et al., 2001).
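A short sketch of a BLSTM with a CTC output layer using PyTorch's nn.CTCLoss, which handles the unknown alignment between input frames and label sequences; the feature dimension, label set and lengths below are illustrative:

```python
import torch
import torch.nn as nn

# Bidirectional LSTM with a CTC output layer, in the spirit of the BLSTM-CTC
# sequence labeller described in the chapter (blank index 0 here).
num_classes = 28                                  # e.g. 27 labels + blank
blstm = nn.LSTM(input_size=40, hidden_size=128, bidirectional=True, batch_first=True)
proj = nn.Linear(2 * 128, num_classes)
ctc = nn.CTCLoss(blank=0)

features = torch.randn(4, 100, 40)                # (batch, frames, feat_dim)
targets = torch.randint(1, num_classes, (4, 12))  # label sequences (no blanks)
input_lengths = torch.full((4,), 100, dtype=torch.long)
target_lengths = torch.full((4,), 12, dtype=torch.long)

h, _ = blstm(features)
log_probs = proj(h).log_softmax(dim=-1).transpose(0, 1)  # (frames, batch, classes)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
print("CTC loss:", loss.item())
```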

Journal ArticleDOI
TL;DR: It is shown analytically and numerically that recurrent neural networks can robustly generate internal noise optimal for spike transmission between neurons with the help of a long-tailed distribution in the weights of recurrent connections.
Abstract: The connectivity of complex networks and its functional implications have been attracting much interest in many physical, biological and social systems. However, the significance of the weight distributions of network links remains largely unknown, except for uniformly- or Gaussian-weighted links. Here we show, analytically and numerically, that recurrent neural networks can robustly generate internal noise optimal for spike transmission between neurons with the help of a long-tailed distribution in the weights of recurrent connections. The structure of spontaneous activity in such networks involves weak-dense connections that redistribute excitatory activity over the network as noise sources to optimally enhance the responses of individual neurons to input at sparse-strong connections, thus opening multiple signal transmission pathways. Electrophysiological experiments confirm the importance of a highly broad connectivity spectrum supported by the model. Our results identify a simple network mechanism for internal noise generation by highly inhomogeneous connection strengths supporting both stability and optimal communication.
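A small numerical illustration of the structural point only, not of the spiking analysis itself: long-tailed (lognormal) weights concentrate much of the total synaptic weight in a few strong connections, unlike Gaussian weights with the same mean (the parameters below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n_syn = 100_000

# Gaussian-weighted vs. long-tailed (lognormal) excitatory weights with the
# same mean; the lognormal case mirrors the sparse-strong / weak-dense
# structure the paper analyzes.
gaussian = np.abs(rng.normal(1.0, 0.2, n_syn))
lognormal = rng.lognormal(mean=-0.5, sigma=1.0, size=n_syn)
lognormal *= gaussian.mean() / lognormal.mean()   # match the mean weight

for name, w in [("gaussian", gaussian), ("lognormal", lognormal)]:
    top1 = np.sort(w)[-n_syn // 100:].sum() / w.sum()
    print(f"{name}: strongest 1% of synapses carry {100 * top1:.1f}% of total weight")
```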