Showing papers on "Recurrent neural network published in 2012"


Posted Content
TL;DR: This paper proposes a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem, and empirically validates the hypothesis and proposed solutions.
Abstract: There are two widely known issues with properly training Recurrent Neural Networks: the vanishing and the exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to improve the understanding of the underlying issues by exploring these problems from an analytical, a geometric and a dynamical systems perspective. Our analysis is used to justify a simple yet effective solution. We propose a gradient norm clipping strategy to deal with exploding gradients and a soft constraint for the vanishing gradients problem. We empirically validate our hypothesis and proposed solutions in the experimental section.
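A minimal NumPy sketch of the clipping rule the abstract describes; the threshold value and the toy gradient are illustrative, not taken from the paper:

```python
import numpy as np

def clip_gradient_norm(grad, threshold=1.0):
    """Rescale the gradient so its L2 norm never exceeds `threshold`.

    If the norm explodes past the threshold, shrink the gradient back onto
    the threshold sphere while keeping its direction."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

# Example: a gradient whose norm (5.0) exceeds the threshold is rescaled.
g = np.array([3.0, 4.0])
print(clip_gradient_norm(g, threshold=1.0))  # direction kept, norm == 1.0
```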

3,549 citations


Book
09 Feb 2012
TL;DR: A new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and an extension of the long short-term memory network architecture to multidimensional data, such as images and video sequences.
Abstract: Recurrent neural networks are powerful sequence learners. They are able to incorporate context information in a flexible way, and are robust to localised distortions of the input data. These properties make them well suited to sequence labelling, where input sequences are transcribed with streams of labels. The aim of this thesis is to advance the state-of-the-art in supervised sequence labelling with recurrent networks. Its two main contributions are (1) a new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and (2) an extension of the long short-term memory network architecture to multidimensional data, such as images and video sequences.

2,101 citations


Proceedings ArticleDOI
01 Jan 2012
TL;DR: This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task and gains considerable improvements in WER on top of a state-of-the-art speech recognition system.
Abstract: Neural networks have become increasingly popular for the task of language modeling. Whereas feed-forward networks only exploit a fixed context length to predict the next word of a sequence, standard recurrent neural networks can, conceptually, take into account all of the predecessor words. On the other hand, it is well known that recurrent networks are difficult to train and are therefore unlikely to show the full potential of recurrent models. These problems are addressed by the Long Short-Term Memory neural network architecture. In this work, we analyze this type of network on an English and a large French language modeling task. Experiments show improvements of about 8% relative in perplexity over standard recurrent neural network LMs. In addition, we gain considerable improvements in WER on top of a state-of-the-art speech recognition system.
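A minimal sketch of the kind of LSTM language model the abstract analyzes, with perplexity computed as the exponential of the mean cross-entropy; it uses PyTorch, and all dimensions and data below are illustrative rather than the paper's setup:

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    """Minimal LSTM language model: embed, run LSTM, project to vocabulary."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):              # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))
        return self.proj(h)                 # logits: (batch, seq_len, vocab)

# Perplexity = exp(average cross-entropy over predicted next words).
model = LSTMLanguageModel()
tokens = torch.randint(0, 10000, (2, 20))          # toy batch of word ids
logits = model(tokens[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
print("perplexity:", torch.exp(loss).item())
```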

1,966 citations


Posted Content
TL;DR: This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence.
Abstract: Many machine learning tasks can be expressed as the transformation, or transduction, of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech, to name but a few. One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating. Recurrent neural networks (RNNs) are a powerful sequence learning architecture that has proven capable of learning such representations. However, RNNs traditionally require a pre-defined alignment between the input and output sequences to perform transduction. This is a severe limitation, since finding the alignment is the most difficult aspect of many sequence transduction problems. Indeed, even determining the length of the output sequence is often challenging. This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence. Experimental results for phoneme recognition are provided on the TIMIT speech corpus.
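A rough PyTorch sketch of the transducer structure the abstract outlines: a transcription network encodes the input, a prediction network encodes the output history, and a joint network scores every pairing over labels plus a blank symbol. The combination details of the original model are simplified and all sizes are invented for illustration:

```python
import torch
import torch.nn as nn

class TinyTransducer(nn.Module):
    """Simplified RNN-transducer sketch: encoder over input frames, prediction
    network over output history, joint network over every (t, u) pair."""
    def __init__(self, input_dim=40, num_labels=40, hidden=128):
        super().__init__()
        self.encoder = nn.LSTM(input_dim, hidden, batch_first=True)
        self.predictor = nn.LSTM(num_labels + 1, hidden, batch_first=True)
        self.joint = nn.Linear(2 * hidden, num_labels + 1)  # +1 for blank

    def forward(self, features, label_history):
        f, _ = self.encoder(features)         # (B, T, H)
        g, _ = self.predictor(label_history)  # (B, U, H)
        # Combine every (t, u) pair of encoder / prediction states.
        f = f.unsqueeze(2).expand(-1, -1, g.size(1), -1)
        g = g.unsqueeze(1).expand(-1, f.size(1), -1, -1)
        return self.joint(torch.cat([f, g], dim=-1))  # (B, T, U, labels+1)

model = TinyTransducer()
feats = torch.randn(2, 50, 40)                       # toy acoustic features
labels = torch.nn.functional.one_hot(torch.randint(0, 41, (2, 10)), 41).float()
print(model(feats, labels).shape)                    # torch.Size([2, 50, 10, 41])
```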

1,448 citations


Book ChapterDOI
01 Jan 2012
TL;DR: Practical techniques and recommendations for successfully applying Echo State Networks, as well as some more advanced application-specific modifications, are presented.
Abstract: Reservoir computing has emerged in the last decade as an alternative to gradient descent methods for training recurrent neural networks. Echo State Network (ESN) is one of the key reservoir computing “flavors”. While being practical, conceptually simple, and easy to implement, ESNs require some experience and insight to achieve the hailed good performance in many tasks. Here we present practical techniques and recommendations for successfully applying ESNs, as well as some more advanced application-specific modifications.
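A minimal NumPy sketch of an ESN of the kind the chapter gives recommendations for: a fixed random reservoir with rescaled spectral radius and leaky integration, and a ridge-regression readout. All hyperparameter values below are illustrative defaults, not the chapter's recommendations:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Reservoir setup (the practical knobs the chapter discusses) ---
n_in, n_res = 1, 200
spectral_radius, leak_rate, ridge = 0.9, 0.3, 1e-6

W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # rescale spectral radius

def run_reservoir(u_seq):
    """Collect leaky-integrated reservoir states for an input sequence."""
    x = np.zeros(n_res)
    states = []
    for u in u_seq:
        pre = np.tanh(W_in @ np.atleast_1d(u) + W @ x)
        x = (1 - leak_rate) * x + leak_rate * pre
        states.append(x.copy())
    return np.array(states)

# --- Train only the linear readout with ridge regression ---
u_train = np.sin(np.arange(1000) * 0.1)          # toy input signal
y_train = np.roll(u_train, -1)                   # target: predict the next value
X = run_reservoir(u_train)
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y_train)
print("train MSE:", np.mean((X @ W_out - y_train) ** 2))
```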

653 citations


Proceedings ArticleDOI
01 Dec 2012
TL;DR: This paper improves recurrent neural network language model performance by providing, with each word, a contextual real-valued input vector that conveys information about the sentence being modeled; the vector is obtained by performing Latent Dirichlet Allocation on a block of preceding text, yielding a topic-conditioned RNNLM.
Abstract: Recurrent neural network language models (RNNLMs) have recently demonstrated state-of-the-art performance across a variety of tasks. In this paper, we improve their performance by providing a contextual real-valued input vector in association with each word. This vector is used to convey contextual information about the sentence being modeled. By performing Latent Dirichlet Allocation using a block of preceding text, we achieve a topic-conditioned RNNLM. This approach has the key advantage of avoiding the data fragmentation associated with building multiple topic models on different data subsets. We report perplexity results on the Penn Treebank data, where we achieve a new state-of-the-art. We further apply the model to the Wall Street Journal speech recognition task, where we observe improvements in word-error-rate.
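A hedged sketch of the idea: infer an LDA topic posterior from a block of preceding text and concatenate it with each word's input to the RNNLM. It uses scikit-learn's LatentDirichletAllocation on a toy corpus and is not the authors' pipeline:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Fit LDA on training text (a toy corpus here) so we can infer a topic
# posterior for any block of preceding context.
corpus = ["the market fell sharply today",
          "the team won the game last night",
          "stocks and bonds rallied after the report",
          "the coach praised the players after the match"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

def topic_context_vector(preceding_text):
    """Topic posterior for a block of preceding text; this real-valued vector
    is what gets concatenated with each word's input to the RNNLM."""
    return lda.transform(vectorizer.transform([preceding_text]))[0]

context = topic_context_vector("stocks rallied and the market rose")
word_embedding = np.random.randn(50)             # stand-in for the word input
rnn_input = np.concatenate([word_embedding, context])
print(rnn_input.shape)                           # (52,) = word + topic features
```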

644 citations


Posted Content
TL;DR: In this paper, a probabilistic model based on distribution estimators conditioned on a recurrent neural network is proposed to discover temporal dependencies in high-dimensional sequences of polyphonic music.
Abstract: We investigate the problem of modeling symbolic sequences of polyphonic music in a completely general piano-roll representation. We introduce a probabilistic model based on distribution estimators conditioned on a recurrent neural network that is able to discover temporal dependencies in high-dimensional sequences. Our approach outperforms many traditional models of polyphonic music on a variety of realistic datasets. We show how our musical language model can serve as a symbolic prior to improve the accuracy of polyphonic transcription.
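A simplified sketch of the "distribution estimator conditioned on an RNN" structure: an RNN hidden state conditions a per-timestep distribution over the 88-key piano roll. The paper's richer conditional estimators (RBM/NADE-style) are replaced here by independent Bernoulli outputs purely for illustration:

```python
import torch
import torch.nn as nn

class PianoRollRNN(nn.Module):
    """An RNN whose hidden state conditions the distribution over the next
    piano-roll frame; each key is modeled as an independent Bernoulli here."""
    def __init__(self, n_keys=88, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(n_keys, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_keys)

    def forward(self, roll):                 # roll: (B, T, 88) in {0, 1}
        h, _ = self.rnn(roll)
        return self.out(h)                   # logits for the *next* frame

model = PianoRollRNN()
roll = (torch.rand(4, 32, 88) > 0.95).float()           # toy piano roll
logits = model(roll[:, :-1])
loss = nn.functional.binary_cross_entropy_with_logits(logits, roll[:, 1:])
loss.backward()
```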

615 citations


Book ChapterDOI
01 Jan 2012
TL;DR: This chapter discusses recurrent neural networks' ability to use contextual information when mapping between input and output sequences, and the vanishing gradient problem, which limits this ability.
Abstract: As discussed in the previous chapter, an important benefit of recurrent neural networks is their ability to use contextual information when mapping between input and output sequences. Unfortunately, for standard RNN architectures, the range of context that can in practice be accessed is quite limited. The problem is that the influence of a given input on the hidden layer, and therefore on the network output, either decays or blows up exponentially as it cycles around the network's recurrent connections. This effect is often referred to in the literature as the vanishing gradient problem (Hochreiter, 1991; Hochreiter et al., 2001a; Bengio et al., 1994). The vanishing gradient problem is illustrated schematically in Figure 4.1.
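A small NumPy experiment illustrating the effect described above: a backpropagated error vector repeatedly multiplied by the recurrent Jacobian either decays or blows up exponentially, depending on the scale of the recurrent weights (the weight scales below are arbitrary):

```python
import numpy as np

# Error backpropagated through a linear recurrent path shrinks or grows
# roughly like the largest singular value of W_rec raised to the number of
# timesteps -- the effect the chapter illustrates in Figure 4.1.
rng = np.random.default_rng(0)
n = 50
for scale in (0.05, 0.25):                  # small vs. large recurrent weights
    W_rec = rng.normal(0, scale, (n, n))
    grad = np.ones(n)
    norms = []
    for t in range(50):
        grad = W_rec.T @ grad               # one step of backprop through time
        norms.append(np.linalg.norm(grad))
    print(f"scale={scale}: norm after 10 steps {norms[9]:.2e}, "
          f"after 50 steps {norms[49]:.2e}")
```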

529 citations


Posted Content
21 Nov 2012
TL;DR: The analysis is used to justify the simple yet effective solution of clipping the norm of the exploding gradient, and the comparison between this heuristic solution and standard SGD provides empirical evidence that such a heuristic is required to reach state-of-the-art results on a character prediction task and a polyphonic music prediction task.
Abstract: Training Recurrent Neural Networks is more troublesome than training feedforward ones because of the vanishing and exploding gradient problems detailed in Bengio et al. (1994). In this paper we attempt to understand the fundamental issues underlying the exploding gradient problem by exploring it from an analytical, a geometric and a dynamical systems perspective. Our analysis is used to justify the simple yet effective solution of clipping the norm of the exploded gradient. In the experimental section, the comparison between this heuristic solution and standard SGD provides empirical evidence for our hypothesis and shows that such a heuristic is required to reach state-of-the-art results on a character prediction task and a polyphonic music prediction task.

483 citations


Posted Content
TL;DR: Experiments reported here evaluate clipping gradients, spanning longer time ranges with leaky integration, advanced momentum techniques, more powerful output probability models, and encouraging sparser gradients to help symmetry breaking and credit assignment.
Abstract: After a more than decade-long period of relatively little research activity in the area of recurrent neural networks, several new developments will be reviewed here that have allowed substantial progress both in understanding and in technical solutions towards more efficient training of recurrent networks. These advances have been motivated by and related to the optimization issues surrounding deep learning. Although recurrent networks are extremely powerful in what they can in principle represent in terms of modelling sequences, their training is plagued by two aspects of the same issue regarding the learning of long-term dependencies. Experiments reported here evaluate the use of clipping gradients, spanning longer time ranges with leaky integration, advanced momentum techniques, using more powerful output probability models, and encouraging sparser gradients to help symmetry breaking and credit assignment. The experiments are performed on text and music data and show the combined effects of these techniques in generally improving both training and test error.
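One of the tricks listed above, leaky integration, can be sketched in a few lines: the hidden state is a convex mix of its previous value and the standard update, so a small mixing rate lets information persist over longer time ranges. The dimensions and the mixing rate below are illustrative:

```python
import numpy as np

def leaky_rnn_step(x, u, W, W_in, alpha=0.1):
    """One hidden-state update with leaky integration: the new state is a
    convex mix of the old state and the standard tanh update."""
    candidate = np.tanh(W @ x + W_in @ u)
    return (1.0 - alpha) * x + alpha * candidate

rng = np.random.default_rng(0)
W, W_in = rng.normal(0, 0.1, (20, 20)), rng.normal(0, 0.1, (20, 5))
x = np.zeros(20)
for t in range(100):
    u = rng.normal(size=5)
    x = leaky_rnn_step(x, u, W, W_in, alpha=0.1)
print("state norm after 100 steps:", np.linalg.norm(x))
```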

394 citations


Journal ArticleDOI
TL;DR: Some sufficient conditions are obtained to guarantee the exponential synchronization of the coupled networks based on the drive-response concept, differential inclusion theory, and the Lyapunov functional method.

Journal ArticleDOI
TL;DR: A brief introduction to basic concepts, methods, insights, current developments, and some applications of RC is given.
Abstract: Reservoir Computing (RC) is a paradigm of understanding and training Recurrent Neural Networks (RNNs) based on treating the recurrent part (the reservoir) differently than the readouts from it. It started ten years ago and is currently a prolific research area, giving important insights into RNNs, providing practical machine learning tools, and enabling computation with non-conventional hardware. Here we give a brief introduction to basic concepts, methods, insights, and current developments, and highlight some applications of RC.

Proceedings Article
01 Jan 2012
TL;DR: This work introduces a model which uses a deep recurrent autoencoder neural network to denoise input features for robust ASR, demonstrates that the model is competitive with existing feature denoising approaches on the Aurora2 task, and shows that it outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.
Abstract: Recent work on deep neural networks as acoustic models for automatic speech recognition (ASR) has demonstrated substantial performance improvements. We introduce a model which uses a deep recurrent autoencoder neural network to denoise input features for robust ASR. The model is trained on stereo (noisy and clean) audio features to predict clean features given noisy input. The model makes no assumptions about how noise affects the signal, nor about the existence of distinct noise environments. Instead, the model can learn to model any type of distortion or additive noise given sufficient training data. We demonstrate that the model is competitive with existing feature denoising approaches on the Aurora2 task, and outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.
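A hedged sketch of the training setup the abstract describes, with a recurrent encoder mapping noisy frames to clean frames under a mean-squared-error loss; the layer sizes, feature dimension and toy data are invented for illustration, and the paper's exact architecture may differ:

```python
import torch
import torch.nn as nn

class RecurrentDenoiser(nn.Module):
    """Recurrent autoencoder trained on stereo (noisy, clean) feature pairs
    so it maps noisy frames to clean frames."""
    def __init__(self, feat_dim=39, hidden=256):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, feat_dim)

    def forward(self, noisy):                  # (B, T, feat_dim)
        h, _ = self.encoder(noisy)
        return self.decoder(h)                 # predicted clean features

model = RecurrentDenoiser()
noisy = torch.randn(8, 100, 39)                # toy noisy MFCC-like features
clean = noisy - 0.5 * torch.randn_like(noisy)  # toy "clean" targets
loss = nn.functional.mse_loss(model(noisy), clean)
loss.backward()
```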

Journal ArticleDOI
TL;DR: Several sufficient conditions are derived to ascertain the existence of a unique equilibrium, global asymptotic stability, and global exponential stability of delayed complex-valued recurrent neural networks with two classes of complex-valued activation functions.
Abstract: Over the last decade, several complex-valued neural networks have been developed and applied in various research areas. As an extension of real-valued recurrent neural networks, complex-valued recurrent neural networks use complex-valued states, connection weights, or activation functions with much more complicated properties than real-valued ones. This paper presents several sufficient conditions derived to ascertain the existence of a unique equilibrium, global asymptotic stability, and global exponential stability of delayed complex-valued recurrent neural networks with two classes of complex-valued activation functions. Simulation results of three numerical examples are also delineated to substantiate the effectiveness of the theoretical results.

Journal ArticleDOI
TL;DR: The theory combines concepts from machine learning (reservoir computing), system modeling, stochastic processes, and functional analysis to define the computational capacity of a dynamical system.
Abstract: Many dynamical systems, both natural and artificial, are stimulated by time dependent external signals, somehow processing the information contained therein. We demonstrate how to quantify the different modes in which information can be processed by such systems and combine them to define the computational capacity of a dynamical system. This is bounded by the number of linearly independent state variables of the dynamical system, equaling it if the system obeys the fading memory condition. It can be interpreted as the total number of linearly independent functions of its stimuli the system can compute. Our theory combines concepts from machine learning (reservoir computing), system modeling, stochastic processes, and functional analysis. We illustrate our theory by numerical simulations for the logistic map, a recurrent neural network, and a two-dimensional reaction diffusion system, uncovering universal trade-offs between the non-linearity of the computation and the system's short-term memory.

Journal ArticleDOI
TL;DR: A robust recurrent neural network based on echo state mechanisms is presented in a Bayesian framework; it is robust in the presence of outliers and superior to existing methods.
Abstract: In this paper, a robust recurrent neural network is presented in a Bayesian framework based on echo state mechanisms. Since the new model is capable of handling outliers in the training data set, it is termed a robust echo state network (RESN). The RESN inherits the basic idea of ESN learning in a Bayesian framework, but replaces the commonly used Gaussian distribution with a Laplace one, which is more robust to outliers, as the likelihood function of the model output. Moreover, the training of the RESN is facilitated by employing a bound optimization algorithm, based on which a proper surrogate function is derived and the Laplace likelihood function is approximated by a Gaussian one while remaining robust to outliers. This leads to an efficient method for estimating model parameters, which can be solved by using a Bayesian evidence procedure in a fully autonomous way. Experimental results show that the proposed method is robust in the presence of outliers and is superior to existing methods.
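Only the core intuition, that a Laplace (L1) likelihood downweights outliers in the readout fit, is sketched below via iteratively reweighted least squares; the paper's full Bayesian evidence procedure and bound optimization are not reproduced, and the data are synthetic:

```python
import numpy as np

def robust_readout(X, y, n_iter=30, eps=1e-6, ridge=1e-6):
    """Fit a linear readout under a Laplace (L1) likelihood via iteratively
    reweighted least squares: samples with large residuals get small weights,
    so outliers barely influence the solution."""
    n, d = X.shape
    w = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)  # least-squares init
    for _ in range(n_iter):
        r = np.abs(y - X @ w) + eps
        Xw = X * (1.0 / r)[:, None]              # per-sample weights 1/|residual|
        w = np.linalg.solve(Xw.T @ X + ridge * np.eye(d), Xw.T @ y)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.01 * rng.normal(size=200)
y[:10] += 20.0                                   # inject gross outliers
print(robust_readout(X, y).round(2))             # close to the true weights
```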

Journal ArticleDOI
TL;DR: A novel keyword spotting method for handwritten documents is described, derived from a neural network-based system for unconstrained handwriting recognition, that performs template-free spotting, i.e., it is not necessary for a keyword to appear in the training set.
Abstract: Keyword spotting refers to the process of retrieving all instances of a given keyword from a document. In the present paper, a novel keyword spotting method for handwritten documents is described. It is derived from a neural network-based system for unconstrained handwriting recognition. As such it performs template-free spotting, i.e., it is not necessary for a keyword to appear in the training set. The keyword spotting is done using a modification of the CTC Token Passing algorithm in conjunction with a recurrent neural network. We demonstrate that the proposed systems outperform not only a classical dynamic time warping-based approach but also a modern keyword spotting system, based on hidden Markov models. Furthermore, we analyze the performance of the underlying neural networks when using them in a recognition task followed by keyword spotting on the produced transcription. We point out the advantages of keyword spotting when compared to classic text line recognition.

Journal ArticleDOI
TL;DR: Evidence is presented that both information transfer and storage in the recurrent layer are maximized close to the phase transition between ordered and chaotic dynamics, providing an explanation for why guiding recurrent layers toward the edge of chaos is computationally useful.

Book ChapterDOI
01 Jan 2012
TL;DR: This chapter describes the basic HF approach and examines well-known performance-improving techniques, such as preconditioning, that have been beneficial for neural network training, as well as others of a more heuristic nature which are harder to justify but have been found to work well in practice.
Abstract: In this chapter we will first describe the basic HF approach, and then examine well-known performance-improving techniques such as preconditioning which we have found to be beneficial for neural network training, as well as others of a more heuristic nature which are harder to justify, but which we have found to work well in practice. We will also provide practical tips for creating efficient and bug-free implementations and discuss various pitfalls which may arise when designing and using an HF-type approach in a particular application.
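The central mechanism of the HF approach, curvature-vector products fed into conjugate gradient, can be sketched briefly; the damping, preconditioning and other refinements the chapter covers are omitted, and the toy quadratic loss below is only for illustration:

```python
import torch

# Curvature-vector products via double backprop, used by conjugate gradient to
# solve H p = -g approximately without ever forming the Hessian.
torch.manual_seed(0)
A = torch.randn(20, 10)
target = torch.randn(20)
w = torch.randn(10, requires_grad=True)

def loss_fn(w):
    return torch.mean((A @ w - target) ** 2)     # toy least-squares loss

def hvp(v):
    """Hessian-vector product H @ v computed with two backward passes."""
    g = torch.autograd.grad(loss_fn(w), w, create_graph=True)[0]
    return torch.autograd.grad(torch.dot(g, v), w)[0]

def conjugate_gradient(b, n_steps=20):
    """Solve H x = b using only Hessian-vector products."""
    x = torch.zeros_like(b)
    r = b.clone()
    p = r.clone()
    for _ in range(n_steps):
        Hp = hvp(p)
        alpha = torch.dot(r, r) / torch.dot(p, Hp)
        x = x + alpha * p
        r_new = r - alpha * Hp
        if torch.dot(r_new, r_new) < 1e-10:      # residual small enough: stop
            break
        p = r_new + (torch.dot(r_new, r_new) / torch.dot(r, r)) * p
        r = r_new
    return x

grad = torch.autograd.grad(loss_fn(w), w)[0]
step = conjugate_gradient(-grad)                 # approximate Newton step
with torch.no_grad():
    print("loss before:", loss_fn(w).item())
    print("loss after :", loss_fn(w + step).item())
```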

Journal ArticleDOI
TL;DR: It is shown that the RNN-based nonlinear MPC scheme is effective and potentially suitable for real-time MPC implementation in many applications.
Abstract: In this paper, we present a neurodynamic approach to model predictive control (MPC) of unknown nonlinear dynamical systems based on two recurrent neural networks (RNNs). The echo state network (ESN) and simplified dual network (SDN) are adopted for system identification and dynamic optimization, respectively. First, the unknown nonlinear system is identified based on the ESN with input-output training and testing samples. Then, the resulting nonconvex optimization problem associated with nonlinear MPC is decomposed via Taylor expansion. To estimate the higher-order unknown term resulting from the decomposition, an online supervised learning algorithm is developed. Next, the SDN is applied for solving the relaxed convex optimization problem to compute the optimal control actions over the predicted horizon. Simulation results are provided to demonstrate the effectiveness and characteristics of the proposed approach. The proposed RNN-based approach has many desirable properties such as global convergence and low complexity. It is shown that the RNN-based nonlinear MPC scheme is effective and potentially suitable for real-time MPC implementation in many applications.

Journal ArticleDOI
TL;DR: This work employs two problem decomposition methods for training Elman recurrent neural networks on chaotic time series problems and shows improved accuracy compared to some of the methods from the literature.

Journal ArticleDOI
TL;DR: This research presents a comparative analysis of the wind speed forecasting accuracy of univariate and multivariate ARIMA models and their recurrent neural network counterparts, and indicates that multivariate models perform better than univariate models and that the recurrent neural network models outperform the ARIMA models.

Journal ArticleDOI
TL;DR: For a general class of memristor-based recurrent neural networks with time-varying delays, conditions on nondivergence, global attractivity, and exponential convergence are established by using local inhibition.

Journal ArticleDOI
TL;DR: These results ensure global exponential stability of memristor-based neural networks in the sense of Filippov solutions and make it convenient to estimate the exponential convergence rates of such networks.

Journal ArticleDOI
TL;DR: This work explores the ability of a simplified type of RNN, one with limited modifications to the internal weights called an echo state network, to effectively and continuously decode monkey reaches during a standard center-out reach task using a cortical brain-machine interface (BMI) in a closed loop.
Abstract: Recurrent neural networks (RNNs) are useful tools for learning nonlinear relationships in time series data with complex temporal dependences. In this paper, we explore the ability of a simplified type of RNN, one with limited modifications to the internal weights called an echo state network (ESN), to effectively and continuously decode monkey reaches during a standard center-out reach task using a cortical brain–machine interface (BMI) in a closed loop. We demonstrate that the RNN, an ESN implementation termed a FORCE decoder (from first-order reduced and controlled error learning), learns the task quickly and significantly outperforms the current state-of-the-art method, the velocity Kalman filter (VKF), using the measure of target acquire time. We also demonstrate that the FORCE decoder generalizes to a more difficult task by successfully operating the BMI in a randomized point-to-point task. The FORCE decoder is also robust as measured by the success rate over extended sessions. Finally, we show that decoded cursor dynamics are more like naturalistic hand movements than those of the VKF. Taken together, these results suggest that RNNs in general, and the FORCE decoder in particular, are powerful tools for BMI decoder applications.
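A compact NumPy sketch of FORCE learning itself, recursive least squares applied online to the readout of a chaotic reservoir with output feedback, in the spirit of the decoder described above; the network size, gain and target signal are illustrative, and this is not the closed-loop BMI setup:

```python
import numpy as np

rng = np.random.default_rng(0)
N, alpha, dt = 300, 1.0, 0.1
W = rng.normal(0, 1.5 / np.sqrt(N), (N, N))       # chaotic recurrent weights
w_fb = rng.uniform(-1, 1, N)                      # feedback of the readout
w_out = np.zeros(N)
P = np.eye(N) / alpha                             # RLS inverse-correlation matrix
x = rng.normal(0, 0.5, N)
r = np.tanh(x)
z = 0.0

# FORCE learning: at every step, update the readout with recursive least
# squares so the fed-back output z tracks the target from the start.
t_axis = np.arange(0, 200, dt)
target = np.sin(0.3 * t_axis)                     # toy target trajectory
for t, f in zip(t_axis, target):
    x = (1 - dt) * x + dt * (W @ r + w_fb * z)    # leaky reservoir dynamics
    r = np.tanh(x)
    z = w_out @ r                                 # current readout
    Pr = P @ r                                    # RLS update of P and weights
    k = Pr / (1.0 + r @ Pr)
    P -= np.outer(k, Pr)
    w_out -= (z - f) * k
print("final tracking error:", abs(z - target[-1]))
```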

Journal ArticleDOI
TL;DR: A 2-day forecast is obtained by using novel wavelet recurrent neural networks (WRNNs) that perform the prediction in the wavelet domain and, in addition, also perform the inverse wavelet transform, giving the predicted signal as output.
Abstract: Solar radiation prediction is an important challenge for the electrical engineer because it is used to estimate the power developed by commercial photovoltaic modules. This paper deals with the problem of solar radiation prediction based on observed meteorological data. A 2-day forecast is obtained by using novel wavelet recurrent neural networks (WRNNs). In fact, these WRNNs are used to exploit the correlation between solar radiation and timescale-related variations of wind speed, humidity, and temperature. The input to the selected WRNN is provided by timescale-related bands of wavelet coefficients obtained from meteorological time series. The experimental setup available at the University of Catania, Italy, provided this information. The novelty of this approach is that the proposed WRNN performs the prediction in the wavelet domain and, in addition, also performs the inverse wavelet transform, giving the predicted signal as output. The obtained simulation results show a very low root-mean-square error compared to the results of the solar radiation prediction approaches obtained by hybrid neural networks reported in the recent literature.
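A small sketch of the wavelet step, using the PyWavelets package (an assumption; the paper does not name a library): decompose a toy meteorological series into timescale-related coefficient bands that would form the WRNN's input, and reconstruct with the inverse transform:

```python
import numpy as np
import pywt  # PyWavelets, used here only to illustrate the wavelet step

# Toy series standing in for wind speed / humidity / temperature measurements.
t = np.arange(1024)
series = np.sin(2 * np.pi * t / 96) + 0.3 * np.random.default_rng(0).normal(size=t.size)

# Timescale-related bands of wavelet coefficients, as in the WRNN input:
# each level captures variations at a different timescale.
coeffs = pywt.wavedec(series, "db4", level=4)
for i, c in enumerate(coeffs):
    print(f"band {i}: {c.shape[0]} coefficients")

# In the WRNN idea, per-band coefficients (for several meteorological
# variables) form the recurrent network's input, the network predicts future
# coefficients, and an inverse transform returns the forecast to the time domain.
reconstructed = pywt.waverec(coeffs, "db4")
print("reconstruction error:", float(np.max(np.abs(reconstructed - series))))
```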

Proceedings ArticleDOI
25 Mar 2012
TL;DR: The Recurrent Neural Network is revisited, which explicitly models the Markovian dynamics of a set of observations through a non-linear function with a much larger hidden state space than traditional sequence models such as an HMM.
Abstract: In this paper, we show how new training principles and optimization techniques for neural networks can be used for different network structures. In particular, we revisit the Recurrent Neural Network (RNN), which explicitly models the Markovian dynamics of a set of observations through a non-linear function with a much larger hidden state space than traditional sequence models such as an HMM. We apply pretraining principles used for Deep Neural Networks (DNNs) and second-order optimization techniques to train an RNN. Moreover, we explore its application in the Aurora2 speech recognition task under mismatched noise conditions using a Tandem approach. We observe top performance on clean speech, and under high noise conditions, compared to multi-layer perceptrons (MLPs) and DNNs, with the added benefit of being a “deeper” model than an MLP but more compact than a DNN.

Journal ArticleDOI
TL;DR: Several succinct criteria are given to ascertain multistability of cellular neural networks and some sufficient conditions are obtained to ensure that an n-neuron neural network with concave-convex characteristics can have a fixed point located in the appointed region.
Abstract: In this paper, stability of multiple equilibria of neural networks with time-varying delays and concave-convex characteristics is formulated and studied. Some sufficient conditions are obtained to ensure that an n-neuron neural network with concave-convex characteristics can have a fixed point located in the appointed region. By means of an appropriate partition of the n-dimensional state space, when nonlinear activation functions of an n-neuron neural network are concave or convex in 2k+2m-1 intervals, this neural network can have (2k+2m-1)^n equilibrium points. This result can be applied to the multiobjective optimal control and associative memory. In particular, several succinct criteria are given to ascertain multistability of cellular neural networks. These stability conditions are the improvement and extension of the existing stability results in the literature. A numerical example is given to illustrate the theoretical findings via computer simulations.

Book ChapterDOI
01 Jan 2012
TL;DR: Experiments on speech and handwriting recognition show that a BLSTM network with a CTC output layer is an effective sequence labeller, generally outperforming standard HMMs and HMM-neural network hybrids, as well as more recent sequence labelling algorithms such as large margin HMMs and conditional random fields.
Abstract: This chapter introduces the connectionist temporal classification (CTC) output layer for recurrent neural networks (Graves et al., 2006). As its name suggests, CTC was specifically designed for temporal classification tasks; that is, for sequence labelling problems where the alignment between the inputs and the target labels is unknown. Unlike the hybrid approach described in the previous chapter, CTC models all aspects of the sequence with a single neural network, and does not require the network to be combined with a hidden Markov model. It also does not require presegmented training data, or external postprocessing to extract the label sequence from the network outputs. Experiments on speech and handwriting recognition show that a BLSTM network with a CTC output layer is an effective sequence labeller, generally outperforming standard HMMs and HMM-neural network hybrids, as well as more recent sequence labelling algorithms such as large margin HMMs (Sha and Saul, 2006) and conditional random fields (Lafferty et al., 2001).
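A short sketch of a BLSTM with a CTC output layer using PyTorch's nn.CTCLoss, which handles the unknown alignment between input frames and label sequences; the feature dimension, label set and lengths below are illustrative:

```python
import torch
import torch.nn as nn

# Bidirectional LSTM with a CTC output layer, in the spirit of the BLSTM-CTC
# sequence labeller described in the chapter (blank index 0 here).
num_classes = 28                                  # e.g. 27 labels + blank
blstm = nn.LSTM(input_size=40, hidden_size=128, bidirectional=True, batch_first=True)
proj = nn.Linear(2 * 128, num_classes)
ctc = nn.CTCLoss(blank=0)

features = torch.randn(4, 100, 40)                # (batch, frames, feat_dim)
targets = torch.randint(1, num_classes, (4, 12))  # label sequences (no blanks)
input_lengths = torch.full((4,), 100, dtype=torch.long)
target_lengths = torch.full((4,), 12, dtype=torch.long)

h, _ = blstm(features)
log_probs = proj(h).log_softmax(dim=-1).transpose(0, 1)  # (frames, batch, classes)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
print("CTC loss:", loss.item())
```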

Journal ArticleDOI
TL;DR: It is shown analytically and numerically that recurrent neural networks can robustly generate internal noise optimal for spike transmission between neurons with the help of a long-tailed distribution in the weights of recurrent connections.
Abstract: The connectivity of complex networks and its functional implications have been attracting much interest in many physical, biological and social systems. However, the significance of the weight distributions of network links remains largely unknown, except for uniformly- or Gaussian-weighted links. Here we show, analytically and numerically, that recurrent neural networks can robustly generate internal noise optimal for spike transmission between neurons with the help of a long-tailed distribution in the weights of recurrent connections. The structure of spontaneous activity in such networks involves weak-dense connections that redistribute excitatory activity over the network as noise sources to optimally enhance the responses of individual neurons to input at sparse-strong connections, thus opening multiple signal transmission pathways. Electrophysiological experiments confirm the importance of a highly broad connectivity spectrum supported by the model. Our results identify a simple network mechanism for internal noise generation by highly inhomogeneous connection strengths supporting both stability and optimal communication.
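A small numerical illustration of the structural point only, not of the spiking analysis itself: long-tailed (lognormal) weights concentrate much of the total synaptic weight in a few strong connections, unlike Gaussian weights with the same mean (the parameters below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n_syn = 100_000

# Gaussian-weighted vs. long-tailed (lognormal) excitatory weights with the
# same mean; the lognormal case mirrors the sparse-strong / weak-dense
# structure the paper analyzes.
gaussian = np.abs(rng.normal(1.0, 0.2, n_syn))
lognormal = rng.lognormal(mean=-0.5, sigma=1.0, size=n_syn)
lognormal *= gaussian.mean() / lognormal.mean()   # match the mean weight

for name, w in [("gaussian", gaussian), ("lognormal", lognormal)]:
    top1 = np.sort(w)[-n_syn // 100:].sum() / w.sum()
    print(f"{name}: strongest 1% of synapses carry {100 * top1:.1f}% of total weight")
```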