
Showing papers on "Activation function published in 2014"


Proceedings Article
04 Mar 2014
TL;DR: In this paper, a Network in Network (NIN) architecture is proposed to enhance model discriminability for local patches within the receptive field, where the feature maps are obtained by sliding the micro networks over the input in a similar manner as CNN, and then fed into the next layer.
Abstract: We propose a novel deep network structure called "Network In Network" (NIN) to enhance model discriminability for local patches within the receptive field. The conventional convolutional layer uses linear filters followed by a nonlinear activation function to scan the input. Instead, we build micro neural networks with more complex structures to abstract the data within the receptive field. We instantiate the micro neural network with a multilayer perceptron, which is a potent function approximator. The feature maps are obtained by sliding the micro networks over the input in a similar manner to a CNN; they are then fed into the next layer. Deep NIN can be implemented by stacking multiple of the above-described structures. With enhanced local modeling via the micro network, we are able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers. We demonstrate state-of-the-art classification performance with NIN on CIFAR-10 and CIFAR-100, and reasonable performance on the SVHN and MNIST datasets.
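The mlpconv idea is equivalent to following a conventional convolution with 1×1 convolutions (the micro MLP) and replacing the fully connected classifier with global average pooling over per-class feature maps. Below is a minimal sketch of that structure, assuming PyTorch; the layer sizes and the class names MLPConvBlock and TinyNIN are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MLPConvBlock(nn.Module):
    """One 'mlpconv' block: a conventional convolution followed by a micro MLP,
    realised as 1x1 convolutions that slide over the feature map with it."""
    def __init__(self, in_ch, mid_ch, out_ch, kernel_size, padding):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, kernel_size, padding=padding), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 1), nn.ReLU(inplace=True),   # micro MLP layer 1
            nn.Conv2d(mid_ch, out_ch, 1), nn.ReLU(inplace=True),   # micro MLP layer 2
        )

    def forward(self, x):
        return self.block(x)

class TinyNIN(nn.Module):
    """Stacked mlpconv blocks with global average pooling in place of
    fully connected classification layers (sizes are illustrative)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            MLPConvBlock(3, 96, 96, kernel_size=5, padding=2),
            nn.MaxPool2d(3, stride=2, padding=1),
            MLPConvBlock(96, 192, 192, kernel_size=5, padding=2),
            nn.MaxPool2d(3, stride=2, padding=1),
            # the last block produces one feature map per class
            MLPConvBlock(192, 192, num_classes, kernel_size=3, padding=1),
        )
        self.gap = nn.AdaptiveAvgPool2d(1)   # global average pooling

    def forward(self, x):
        return self.gap(self.features(x)).flatten(1)   # (batch, num_classes)

logits = TinyNIN()(torch.randn(2, 3, 32, 32))
print(logits.shape)   # torch.Size([2, 10])
```

Because global average pooling leaves exactly one number per class, the spatial maps feeding it can be read directly as class confidence maps, which is the interpretability argument the abstract makes.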

1,973 citations


Proceedings Article
01 Jan 2014
TL;DR: It is found that it is always best to train using the dropout algorithm--the dropout algorithm is consistently best at adapting to the new task and remembering the old task, and has the best tradeoff curve between these two extremes.
Abstract: Catastrophic forgetting is a problem faced by many machine learning models and algorithms. When trained on one task, then trained on a second task, many machine learning models "forget" how to perform the first task. This is widely believed to be a serious problem for neural networks. Here, we investigate the extent to which the catastrophic forgetting problem occurs for modern neural networks, comparing both established and recent gradient-based training algorithms and activation functions. We also examine the effect of the relationship between the first task and the second task on catastrophic forgetting. We find that it is always best to train using the dropout algorithm--the dropout algorithm is consistently best at adapting to the new task, remembering the old task, and has the best tradeoff curve between these two extremes. We find that different tasks and relationships between tasks result in very different rankings of activation function performance. This suggests the choice of activation function should always be cross-validated.
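The protocol behind these comparisons is easy to restate in code: train on one task, keep training the same weights on a second task, and watch accuracy on the first task decay. Here is a minimal sketch of that loop, assuming PyTorch; the dropout network, optimizer settings, and the synthetic stand-in tasks are placeholders rather than the paper's setup.

```python
import torch
import torch.nn as nn

def evaluate(model, batches):
    """Classification accuracy of `model` over a list of (x, y) batches."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in batches:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

def train_sequentially(model, task_a, task_b, epochs=5, lr=0.01):
    """Train on task A, then keep training on task B, reporting accuracy on
    task A after every task-B epoch to expose catastrophic forgetting."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for task in (task_a, task_b):
        for _ in range(epochs):
            model.train()
            for x, y in task:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
            if task is task_b:
                print("old-task accuracy:", evaluate(model, task_a))

# a dropout network of the general kind compared in the paper (sizes are placeholders)
model = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(512, 512), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(512, 10),
)
# synthetic stand-ins for two related tasks (e.g., permuted-input variants)
task_a = [(torch.randn(32, 784), torch.randint(0, 10, (32,))) for _ in range(10)]
task_b = [(torch.randn(32, 784), torch.randint(0, 10, (32,))) for _ in range(10)]
train_sequentially(model, task_a, task_b)
```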

507 citations


Journal ArticleDOI
TL;DR: A new measure based on topological concepts is introduced, aimed at evaluating the complexity of the function implemented by a neural network used for classification purposes, and the results seem to support the idea that deep networks actually implement functions of higher complexity, so that they are able, with the same number of resources, to address more difficult problems.
Abstract: Recently, researchers in the artificial neural network field have focused their attention on connectionist models composed of several hidden layers. In fact, experimental results and heuristic considerations suggest that deep architectures are more suitable than shallow ones for modern applications, facing very complex problems, e.g., vision and human language understanding. However, the actual theoretical results supporting such a claim are still few and incomplete. In this paper, we propose a new approach to study how the depth of feedforward neural networks impacts their ability to implement high-complexity functions. First, a new measure based on topological concepts is introduced, aimed at evaluating the complexity of the function implemented by a neural network used for classification purposes. Then, deep and shallow neural architectures with common sigmoidal activation functions are compared, by deriving upper and lower bounds on their complexity, and studying how the complexity depends on the number of hidden units and the used activation function. The obtained results seem to support the idea that deep networks actually implement functions of higher complexity, so that they are able, with the same number of resources, to address more difficult problems.

455 citations


Posted Content
TL;DR: A novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent is designed, achieving state-of-the-art performance on CIFAR-10, CIFAR-100, and a benchmark from high-energy physics involving Higgs boson decay modes.
Abstract: Artificial neural networks typically have a fixed, non-linear activation function at each neuron. We have designed a novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent. With this adaptive activation function, we are able to improve upon deep neural network architectures composed of static rectified linear units, achieving state-of-the-art performance on CIFAR-10 (7.51%), CIFAR-100 (30.83%), and a benchmark from high-energy physics involving Higgs boson decay modes.
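The adaptive activation described here can be sketched as a per-neuron sum of learned hinge functions trained jointly with the weights. The sketch below assumes PyTorch; the specific parameterization h(x) = max(0, x) + Σ_s a_s · max(0, −x + b_s) and the number of hinges are assumptions made for illustration, not necessarily the paper's exact form.

```python
import torch
import torch.nn as nn

class AdaptivePiecewiseLinear(nn.Module):
    """Per-neuron learnable piecewise linear activation (a sketch).

    Each neuron gets its own hinge parameters, trained by gradient descent
    together with the network weights. The parameterization
    h(x) = max(0, x) + sum_s a_s * max(0, -x + b_s), with S hinges per neuron,
    is an assumption for illustration.
    """
    def __init__(self, num_features, num_hinges=2):
        super().__init__()
        self.a = nn.Parameter(0.01 * torch.randn(num_hinges, num_features))
        self.b = nn.Parameter(torch.randn(num_hinges, num_features))

    def forward(self, x):                       # x: (batch, num_features)
        out = torch.relu(x)
        for s in range(self.a.shape[0]):        # add one learned hinge at a time
            out = out + self.a[s] * torch.relu(-x + self.b[s])
        return out

layer = nn.Sequential(nn.Linear(64, 128), AdaptivePiecewiseLinear(128))
print(layer(torch.randn(8, 64)).shape)          # torch.Size([8, 128])
```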

404 citations


Journal ArticleDOI
TL;DR: The global convergence of the neural network is proven with the proposed nonlinear complex-valued activation functions, and a special type of activation function with a core function, called the sign-bi-power function, is proven to enable the ZNN to converge in finite time, which further enhances its advantage in online processing.
Abstract: The Sylvester equation is often encountered in mathematics and control theory. For the general time-invariant Sylvester equation problem, which is defined in the domain of complex numbers, the Bartels-Stewart algorithm and its extensions are effective and widely used with an O(n³) time complexity. When applied to solving the time-varying Sylvester equation, the computational burden increases sharply as the sampling period decreases and cannot satisfy continuous real-time calculation requirements. For the special case of the general Sylvester equation problem defined in the domain of real numbers, gradient-based recurrent neural networks are able to solve the time-varying Sylvester equation in real time, but there always exists an estimation error, whereas a recurrent neural network recently proposed by Zhang et al. [this type of neural network is called the Zhang neural network (ZNN)] converges to the solution ideally. Advancements in complex-valued neural networks make it natural to extend the existing real-valued ZNN for solving the time-varying real-valued Sylvester equation to its counterpart in the domain of complex numbers. In this paper, a complex-valued ZNN for solving the complex-valued Sylvester equation problem is investigated, and the global convergence of the neural network is proven with the proposed nonlinear complex-valued activation functions. Moreover, a special type of activation function with a core function, called the sign-bi-power function, is proven to enable the ZNN to converge in finite time, which further enhances its advantage in online processing. In this case, the upper bound of the convergence time is also derived analytically. Simulations are performed to evaluate and compare the performance of the neural network with different parameters and activation functions. Both theoretical analysis and numerical simulations validate the effectiveness of the proposed method.
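The finite-time property comes from the sign-bi-power activation. The sketch below assumes the commonly used form φ(e) = ½(|e|^r + |e|^(1/r))·sign(e) with 0 < r < 1 and applies it only to the scalar ZNN error dynamics ė = −γφ(e), not to the full complex-valued Sylvester model, to contrast finite-time with merely exponential convergence.

```python
import numpy as np

def sign_bi_power(e, r=0.5):
    """Sign-bi-power activation, in the commonly used (assumed) form
    phi(e) = 0.5 * (|e|**r + |e|**(1/r)) * sign(e), with 0 < r < 1."""
    return 0.5 * (np.abs(e) ** r + np.abs(e) ** (1.0 / r)) * np.sign(e)

def simulate_error(phi, e0=1.0, gamma=5.0, dt=1e-4, t_end=2.0):
    """Euler-integrate the scalar ZNN error dynamics de/dt = -gamma * phi(e)."""
    e, history = e0, []
    for _ in range(int(round(t_end / dt))):
        e = e - dt * gamma * phi(e)
        history.append(e)
    return np.array(history)

linear = simulate_error(lambda e: e)     # linear activation: exponential decay only
finite = simulate_error(sign_bi_power)   # sign-bi-power: error reaches ~0 in finite time
# (the tiny chatter of the sign-bi-power error around zero is an Euler artifact)
print("error at t = 1 s   linear: %.2e   sign-bi-power: %.2e"
      % (abs(linear[9999]), abs(finite[9999])))
```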

192 citations


Journal ArticleDOI
TL;DR: The concept of the extended dissipativity can be used to solve for the H∞, L2-L∞, passive, and dissipative performance by adjusting the weighting matrices in a new performance index.
Abstract: In this brief, an extended dissipativity analysis was conducted for a neural network with time-varying delays. The concept of the extended dissipativity can be used to solve for the H∞, L2-L∞, passive, and dissipative performance by adjusting the weighting matrices in a new performance index. In addition, the activation function dividing method is modified by introducing a tuning parameter. Examples are provided to show the effectiveness and less conservatism of the proposed method.

170 citations


Book ChapterDOI
15 Sep 2014
TL;DR: It is shown that multilayer perceptrons (MLP) consisting of the Lp units achieve the state-of-the-art results on a number of benchmark datasets and the proposed Lp unit is evaluated on the recently proposed deep recurrent neural networks (RNN).
Abstract: In this paper we propose and investigate a novel nonlinear unit, called Lp unit, for deep neural networks. The proposed Lp unit receives signals from several projections of a subset of units in the layer below and computes a normalized Lp norm. We notice two interesting interpretations of the Lp unit. First, the proposed unit can be understood as a generalization of a number of conventional pooling operators such as average, root-mean-square and max pooling widely used in, for instance, convolutional neural networks (CNN), HMAX models and neocognitrons. Furthermore, the Lp unit is, to a certain degree, similar to the recently proposed maxout unit [13] which achieved the state-of-the-art object recognition results on a number of benchmark datasets. Secondly, we provide a geometrical interpretation of the activation function based on which we argue that the Lp unit is more efficient at representing complex, nonlinear separating boundaries. Each Lp unit defines a superelliptic boundary, with its exact shape defined by the order p. We claim that this makes it possible to model arbitrarily shaped, curved boundaries more efficiently by combining a few Lp units of different orders. This insight justifies the need for learning different orders for each unit in the model. We empirically evaluate the proposed Lp units on a number of datasets and show that multilayer perceptrons (MLP) consisting of the Lp units achieve the state-of-the-art results on a number of benchmark datasets. Furthermore, we evaluate the proposed Lp unit on the recently proposed deep recurrent neural networks (RNN).
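An Lp unit can be sketched as a normalized Lp norm over a small group of learned projections, with the order p itself a trainable parameter. The sketch below assumes PyTorch; the softplus reparameterization keeping p ≥ 1 and the group size are my assumptions, not details given in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LpUnitLayer(nn.Module):
    """A layer of Lp units (a sketch, not the paper's code).

    Each output unit receives `group_size` linear projections of the input
    and returns their normalized Lp norm; the order p is learned per unit,
    kept >= 1 here via a softplus reparameterization (an assumption).
    """
    def __init__(self, in_features, num_units, group_size=4):
        super().__init__()
        self.num_units, self.group_size = num_units, group_size
        self.proj = nn.Linear(in_features, num_units * group_size)
        self.rho = nn.Parameter(torch.zeros(num_units))   # p = 1 + softplus(rho)

    def forward(self, x):                                  # x: (batch, in_features)
        z = self.proj(x).view(-1, self.num_units, self.group_size)
        p = (1.0 + F.softplus(self.rho)).view(1, -1, 1)    # broadcast over batch/group
        lp = z.abs().pow(p).mean(dim=-1)                   # normalized sum of |.|^p
        return lp.pow(1.0 / p.squeeze(-1))                 # ...raised to the power 1/p

out = LpUnitLayer(100, num_units=32)(torch.randn(8, 100))
print(out.shape)   # torch.Size([8, 32])
```

Over non-negative inputs, p = 1 recovers average pooling, p = 2 root-mean-square pooling, and p → ∞ max pooling, which is the generalization of pooling operators the abstract refers to.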

160 citations


Journal ArticleDOI
TL;DR: An efficient approximation scheme for the hyperbolic tangent function is proposed, based on a mathematical analysis considering the maximum allowable error as a design parameter, which results in reduction in area, delay, and power in VLSI implementation of artificial neural networks with a hyperbolic tangent activation function.
Abstract: Nonlinear activation function is one of the main building blocks of artificial neural networks. Hyperbolic tangent and sigmoid are the most used nonlinear activation functions. Accurate implementation of these transfer functions in digital networks faces certain challenges. In this paper, an efficient approximation scheme for hyperbolic tangent function is proposed. The approximation is based on a mathematical analysis considering the maximum allowable error as design parameter. Hardware implementation of the proposed approximation scheme is presented, which shows that the proposed structure compares favorably with previous architectures in terms of area and delay. The proposed structure requires less output bits for the same maximum allowable error when compared to the state-of-the-art. The number of output bits of the activation function determines the bit width of multipliers and adders in the network. Therefore, the proposed activation function results in reduction in area, delay, and power in VLSI implementation of artificial neural networks with hyperbolic tangent activation function.
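As a software analogue of the error-driven design described here, the sketch below builds a piecewise-linear tanh approximation on a uniform breakpoint grid and refines it until a specified maximum allowable error is met; it illustrates only the design loop and is not the paper's hardware architecture.

```python
import numpy as np

def make_pwl_tanh(max_error=1e-3, x_max=4.0):
    """Piecewise-linear tanh approximation driven by a maximum allowable error.

    Breakpoints are placed on a uniform grid over [-x_max, x_max] (tanh is
    treated as saturated outside), and the number of segments is doubled
    until the worst-case absolute error is below `max_error`."""
    segments = 2
    while True:
        xs = np.linspace(-x_max, x_max, segments + 1)          # breakpoints
        ys = np.tanh(xs)
        def approx(x, xs=xs, ys=ys):
            return np.interp(np.clip(x, -x_max, x_max), xs, ys)
        grid = np.linspace(-x_max - 1.0, x_max + 1.0, 20001)   # includes saturation region
        err = np.max(np.abs(approx(grid) - np.tanh(grid)))
        if err <= max_error:
            return approx, segments, err
        segments *= 2

approx_tanh, segments, err = make_pwl_tanh(max_error=1e-3)
print(f"{segments} segments give a maximum absolute error of {err:.2e}")
```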

131 citations


Journal ArticleDOI
Tao Fang, Jitao Sun
TL;DR: In this paper, a new condition for the complex-valued activation function is presented, which is less conservative than the Lipschitz condition that is widely assumed in the literature. Based on the new condition and linear matrix inequalities, some new criteria to ensure the existence, uniqueness, and global asymptotic stability of the equilibrium point of RNNs with time delays are established.
Abstract: This brief points out two mistakes in a recently published paper on complex-valued recurrent neural networks (RNNs). Moreover, a new condition for the complex-valued activation function is presented, which is less conservative than the Lipschitz condition that is widely assumed in the literature. Based on the new condition and linear matrix inequalities, some new criteria to ensure the existence, uniqueness, and global asymptotic stability of the equilibrium point of complex-valued RNNs with time delays are established. A numerical example is given to illustrate the effectiveness of the theoretical results.

116 citations


Journal ArticleDOI
TL;DR: It is shown that a two hidden layer neural network with d inputs, d neurons in the first hidden layer, 2d + 2 neurons in the second hidden layer, and with a specifically constructed sigmoidal and infinitely differentiable activation function can approximate any continuous multivariate function with arbitrary accuracy.

85 citations


Proceedings Article
21 Dec 2014
TL;DR: In this paper, a piecewise linear activation function is learned independently for each neuron using gradient descent, achieving state-of-the-art performance on CIFAR-10 (7.51%), CIFAR-100 (30.83%), and a benchmark from high-energy physics involving Higgs boson decay modes.
Abstract: Artificial neural networks typically have a fixed, non-linear activation function at each neuron. We have designed a novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent. With this adaptive activation function, we are able to improve upon deep neural network architectures composed of static rectified linear units, achieving state-of-the-art performance on CIFAR-10 (7.51%), CIFAR-100 (30.83%), and a benchmark from high-energy physics involving Higgs boson decay modes.

Journal ArticleDOI
TL;DR: A newly augmented Lyapunov-Krasovskii functional is introduced, and a sufficient condition for state estimation of neural networks with time-varying delays is presented and proved by using a convex polyhedron method and novel activation function conditions.

Journal ArticleDOI
TL;DR: It is theoretically proved that such two ZNN models globally and exponentially converge to the theoretical solution of time-varying linear matrix equations when using linear activation functions, and the upper bound of the convergence time is derived analytically via Lyapunov theory.
Abstract: In addition to their parallel-distributed nature, recurrent neural networks can be implemented physically by designated hardware and thus have found broad applications in many fields. In this paper, a special class of recurrent neural network named the Zhang neural network (ZNN), together with its electronic realization, is investigated and exploited for the online solution of time-varying linear matrix equations. By following the idea of the Zhang function (i.e., error function), two ZNN models are proposed and studied, which allow a wide choice of activation functions (e.g., any monotonically increasing odd activation function). It is theoretically proved that these two ZNN models globally and exponentially converge to the theoretical solution of time-varying linear matrix equations when using linear activation functions. In addition, a new activation function, named the Li activation function, is exploited. It is theoretically proved that, when using the Li activation function, the two ZNN models can be further accelerated to finite-time convergence to the time-varying theoretical solution. Furthermore, the upper bound of the convergence time is derived analytically via Lyapunov theory. Then, we conduct extensive simulations using the two ZNN models. The results substantiate the theoretical analysis and the efficacy of the proposed ZNN models for solving time-varying linear matrix equations.
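For a representative special case A(t)X(t) = B(t), the ZNN recipe defines the error E(t) = A(t)X(t) − B(t) and imposes Ė = −γΦ(E); with a linear Φ this yields Ẋ = A⁻¹(Ḃ − ȦX − γ(AX − B)). Below is a numerical sketch with numpy; the example coefficients, the Euler integration, and the restriction to this simple equation form are my own choices rather than the paper's models.

```python
import numpy as np

def znn_solve(A, dA, B, dB, gamma=100.0, dt=1e-4, t_end=2.0):
    """ZNN sketch for the time-varying equation A(t) X(t) = B(t).

    With the error function E = A X - B and the design formula
    dE/dt = -gamma * Phi(E), a linear activation Phi gives
    dX/dt = A^{-1} (dB/dt - dA/dt X - gamma (A X - B)),
    integrated here with explicit Euler (my choice, not the paper's circuit).
    """
    t, X = 0.0, np.zeros((2, 2))
    while t < t_end:
        At = A(t)
        Xdot = np.linalg.solve(At, dB(t) - dA(t) @ X - gamma * (At @ X - B(t)))
        X, t = X + dt * Xdot, t + dt
    return X, np.linalg.norm(A(t) @ X - B(t))   # final state and residual

# a small time-varying example with known derivatives
A  = lambda t: np.array([[3 + np.sin(t), 1.0], [0.5, 4 + np.cos(t)]])
dA = lambda t: np.array([[np.cos(t), 0.0], [0.0, -np.sin(t)]])
B  = lambda t: np.array([[np.sin(t), np.cos(t)], [np.cos(t), -np.sin(t)]])
dB = lambda t: np.array([[np.cos(t), -np.sin(t)], [-np.sin(t), -np.cos(t)]])

X, residual = znn_solve(A, dA, B, dB)
print("residual ||A(t)X - B(t)|| at t = 2 s:", residual)
```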

Journal ArticleDOI
TL;DR: The problem of robust exponential stability analysis for a class of Takagi–Sugeno (TS) fuzzy Cohen–Grossberg neural networks with uncertainties and time-varying delays is investigated and a generalized activation function is used.

Journal ArticleDOI
TL;DR: A newly augmented stochastic Lyapunov–Krasovskii functional and novel activation function conditions are introduced, and a sufficient condition for Markovian jumping neural networks is presented under which the state trajectory remains in a bounded region over a pre-specified finite-time interval.

Journal ArticleDOI
TL;DR: Finite-time dual neural networks with a new activation function that has two tunable parameters are presented to solve quadratic programming problems and are applied to estimating parameters for an energy model of belt conveyors.

Journal ArticleDOI
TL;DR: This paper provides some initial theoretical evidence of when and how depth can be extremely effective and shows that functions that exhibit repeating patterns can be encoded much more efficiently in the deep representation, resulting in significant reduction in complexity.
Abstract: We present a comparative theoretical analysis of representation in artificial neural networks with two extreme architectures, a shallow wide network and a deep narrow network, devised to maximally decouple their representative power due to layer width and network depth. We show that, given a specific activation function, models with comparable VC-dimension are required to guarantee zero error modeling of real functions over a binary input. However, functions that exhibit repeating patterns can be encoded much more efficiently in the deep representation, resulting in significant reduction in complexity. This paper provides some initial theoretical evidence of when and how depth can be extremely effective.

Proceedings ArticleDOI
14 Sep 2014
TL;DR: Phone recognition tests on the TIMIT database show that switching to maxout units from rectifier units decreases the phone error rate for each network configuration studied, and yields relative error rate reductions of between 2% and 6%.
Abstract: Convolutional neural networks have recently been shown to outperform fully connected deep neural networks on several speech recognition tasks. Their superior performance is due to their convolutional structure that processes several, slightly shifted versions of the input window using the same weights, and then pools the resulting neural activations. This pooling operation makes the network less sensitive to translations. The convolutional network results published up till now used sigmoid or rectified linear neurons. However, quite recently a new type of activation function called the maxout activation has been proposed. Its operation is closely related to convolutional networks, as it applies a similar pooling step, but over different neurons evaluated on the same input. Here, we combine the two technologies, and experiment with deep convolutional neural networks built from maxout neurons. Phone recognition tests on the TIMIT database show that switching to maxout units from rectifier units decreases the phone error rate for each network configuration studied, and yields relative error rate reductions of between 2% and 6%.
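A maxout unit outputs the maximum over a small group of affine responses computed on the same input, which is the "pooling over different neurons" the abstract refers to. Here is a minimal PyTorch sketch of a fully connected maxout layer; the group size and dimensions are illustrative, not those of the TIMIT systems.

```python
import torch
import torch.nn as nn

class Maxout(nn.Module):
    """Fully connected maxout layer: each output is the maximum over
    `pool_size` affine pieces evaluated on the same input (pooling across
    neurons rather than across time or frequency positions)."""
    def __init__(self, in_features, out_features, pool_size=3):
        super().__init__()
        self.out_features, self.pool_size = out_features, pool_size
        self.linear = nn.Linear(in_features, out_features * pool_size)

    def forward(self, x):
        z = self.linear(x).view(-1, self.out_features, self.pool_size)
        return z.max(dim=-1).values

# e.g. a frame of stacked filterbank features mapped to 1024 maxout units
h = Maxout(440, 1024)(torch.randn(16, 440))
print(h.shape)   # torch.Size([16, 1024])
```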

Posted Content
TL;DR: This work considers neural networks with a single hidden layer and non-decreasing homogeneous activation functions like the rectified linear units and shows that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace.
Abstract: We consider neural networks with a single hidden layer and non-decreasing homogeneous activation functions like the rectified linear units. By letting the number of hidden units grow unbounded and using classical non-Euclidean regularization tools on the output weights, we provide a detailed theoretical analysis of their generalization performance, with a study of both the approximation and the estimation errors. We show in particular that they are adaptive to unknown underlying linear structures, such as the dependence on the projection of the input variables onto a low-dimensional subspace. Moreover, when using sparsity-inducing norms on the input weights, we show that high-dimensional non-linear variable selection may be achieved, without any strong assumption regarding the data and with a total number of variables potentially exponential in the number of observations. In addition, we provide a simple geometric interpretation to the non-convex problem of addition of a new unit, which is the core potentially hard computational element in the framework of learning from continuously many basis functions. We provide simple conditions for convex relaxations to achieve the same generalization error bounds, even when constant-factor approximations cannot be found (e.g., because it is NP-hard such as for the zero-homogeneous activation function). We were not able to find strong enough convex relaxations and leave open the existence or non-existence of polynomial-time algorithms.

Journal ArticleDOI
TL;DR: Because the wavelet frame is highly redundant, the time–frequency localization and matching pursuit algorithm are respectively utilized to eliminate the superfluous wavelets, so that the obtained wavelet frame neural network can be implemented efficiently.
Abstract: The artificial neural network (ANN) method is widely used in reliability analysis. However, the performance of ANNs cannot be guaranteed due to fitting problems, because there is no efficient constructive method for choosing the structure and the learning parameters of the network. To mitigate these difficulties, this article presents a new adaptive wavelet frame neural network method for reliability analysis of structures. The new method uses the single-scaling multidimensional wavelet frame as the activation function in the network to deal with the multidimensional problems in reliability analysis. Because the wavelet frame is highly redundant, the time–frequency localization and matching pursuit algorithm are respectively utilized to eliminate the superfluous wavelets, so that the obtained wavelet frame neural network can be implemented efficiently. Five examples are given to demonstrate the application and effectiveness of the proposed method. Comparisons of the new method and the classical radial basis function network method are made.

Journal ArticleDOI
TL;DR: Artificial neural networks (ANNs) are one of the most powerful and popular tools for black-box modeling, and they are designed and inspired by real biological neural networks.
Abstract: Modeling real-world systems plays a pivotal role in their analysis and contributes to a better understanding of their behavior and performance. Classification, optimization, control, and pattern recognition problems rely heavily on modeling techniques. Such models can be categorized into three classes: white-box, black-box, and gray-box (Nelles, 2001). White-box models are fully derived from first principles, i.e., physical, chemical, biological, economical, etc. laws. All equations and parameters are determined from theory. Black-box models are based solely on experimental data, and their structure and parameters are determined by experimental modeling. Building black-box models requires little or no prior knowledge of the system. Gray-box models represent a compromise or combination of white-box and black-box models (Nelles, 2001). In the modeling of highly nonlinear and complex phenomena, we may not have a good understanding of the processes, and thus black-box models may be our best (or even our only) choice. Artificial neural networks (ANNs) are one of the most powerful and popular tools for black-box modeling and are designed and inspired by real biological neural networks. There has been an increasing interest in analyzing neurophysiology from a nonlinear and chaotic systems viewpoint in recent years (Christini and Collins, 1995; Sarbadhikari and Chakrabarty, 2001; Korn and Faure, 2003; Hadaeghi et al., 2013; Jafari et al., 2013; Mattei, 2013). For example, although the famous Hodgkin and Huxley model (Hodgkin and Huxley, 1952) has been the basis of almost all of the proposed models for neural firing, the Rose-Hindmarsh model (Hindmarsh and Rose, 1984) is known to be a more refined model since it can show different firing patterns, especially chaotic bursts of action potential, which enable a proper matching between this model behavior and experimental data. Another example of the observation of chaotic behavior in the nervous system is the period-doubling route to chaos in flicker vision (Crevier and Meister, 1998), which is the focus of this letter. Stimulation with periodic flashes of light is useful for distinguishing some disorders of the human visual system (Crevier and Meister, 1998). It has been shown by Crevier and Meister (1998) that during electroretinogram (ERG) recordings of the visual system, period-doubling can occur. It is well-known that period-doubling occurs in nonlinear dynamical systems, and it is often associated with the onset of chaos. In one study (Crevier and Meister, 1998) the retina of a salamander was stimulated with periodic square-wave flashes, and the ERG was recorded. The flash frequency was changed between zero and 30 Hz, while the contrast was constant. In another record, the contrast was changed while the frequency was fixed at 16 Hz. All the ERG signals were filtered at 1–1000 Hz. Using a common approach to obtain a discrete time series from a continuous recorded signal, successive local maxima of the signal were extracted as a time series (Figure 1A). As shown in Figures 1B,C, both the parameters (flash frequency and contrast) have a great effect on the recorded ERG signals and cause bifurcations resulting in a period-doubling route to chaos.
Figure 1: (A) one example of the local maxima of the ERG signals; (B) real bifurcation diagram from varying flash frequency; (C) real bifurcation diagram from varying contrast; (D) the structure of the ANNs that were used; (E) artificial bifurcation ...
However, it is difficult to understand the exact relations between the parameters and their effects. In other words, it is not easy to build a white-box model that can regenerate the signals and diagrams accurately. That may be because of the highly complex and nonlinear dynamics involved. We have used the ability of an ANN in learning highly nonlinear dynamics as a black-box model of this system. We used a four hidden layer feed-forward neural network with (7/4/8/5) neurons in the layers (Figure 1D) and hyperbolic tangent transfer functions in the hidden layers, which help the network learn the complex relationships between input and output. The activation function of the last layer of the network is a linear transfer function. We used two parameters (contrast and frequency) and three time delays (xn−1, xn−2, and xn−3) as the inputs of the ANN to fit each data point of the time series (xn) as the output of the network. As shown in Figures 1E,F, this model can generate bifurcation diagrams similar to those obtained from real data. As a result, we believe that ANNs are powerful tools for modeling highly nonlinear behavior in the nervous system. We plan to construct ANN models in future work including extension to more cases and details, extension of the ideas in Hadaeghi et al. (2013) to patients with bipolar disorder, and extension of the ideas in Jafari et al. (2013) to patients with attention deficit hyperactivity disorder (ADHD).
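The network described above is small enough to write down directly: five inputs (contrast, flash frequency, and the three delayed samples), hidden layers of 7/4/8/5 tanh units, and a linear output predicting the next local maximum. A sketch assuming PyTorch; data scaling and training details are not given in the text and are omitted.

```python
import torch
import torch.nn as nn

# Inputs: contrast, flash frequency, and the delayed samples x_{n-1}, x_{n-2},
# x_{n-3}; output: the next local maximum x_n. Hidden sizes 7/4/8/5 with tanh
# and a linear output layer, as described in the text; everything else
# (scaling, optimizer, training loop) is left out.
erg_model = nn.Sequential(
    nn.Linear(5, 7), nn.Tanh(),
    nn.Linear(7, 4), nn.Tanh(),
    nn.Linear(4, 8), nn.Tanh(),
    nn.Linear(8, 5), nn.Tanh(),
    nn.Linear(5, 1),                # linear output layer
)

batch = torch.randn(64, 5)          # 64 rows of (contrast, frequency, x_{n-1..n-3})
print(erg_model(batch).shape)       # torch.Size([64, 1])
```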

Journal ArticleDOI
TL;DR: A connectionist model is presented that exploits adaptive activation functions, resulting implicitly in a co-trained coupled model and, in turn, a flexible, non-standard neural architecture.

Journal ArticleDOI
TL;DR: The use of the proposed algorithms leads to better generalisation results, similar to the results for the ten function approximation tasks in which the networks were trained using the resilient backpropagation algorithm.

Proceedings ArticleDOI
01 Dec 2014
TL;DR: A detailed analysis of the FPGA implementations of the Sigmoid and Exponential functions is carried out, in an approach combining a lookup table with a linear interpolation procedure, showing a clear improvement in relation to previously published works.
Abstract: The efficient implementation of artificial neural networks in FPGA boards requires tackling several issues that strongly affect the final result. One of these issues is the computation of the neuron's activation function. In this work, a detailed analysis of the FPGA implementations of the Sigmoid and Exponential functions is carried out, in an approach combining a lookup table with a linear interpolation procedure. Further, to optimize board resource utilization, a time division multiplexing of the multiplier attached to the neurons was used. The results are evaluated in terms of the absolute and relative errors obtained and also through measuring a quality factor and the resource utilization, showing a clear improvement in relation to previously published works.
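A software analogue of the lookup-table-plus-interpolation scheme is a table of sigmoid samples with linear interpolation between entries, evaluated with the same absolute and relative error criteria mentioned above. A numpy sketch; the table size and input range are arbitrary choices, not the values used on the FPGA.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_lut_sigmoid(n_entries=64, x_min=-8.0, x_max=8.0):
    """Lookup-table sigmoid with linear interpolation between entries
    (table size and range are arbitrary, software-only choices)."""
    xs = np.linspace(x_min, x_max, n_entries)
    ys = sigmoid(xs)
    return lambda x: np.interp(np.clip(x, x_min, x_max), xs, ys)

lut_sigmoid = make_lut_sigmoid()
grid = np.linspace(-8.0, 8.0, 100001)
abs_err = np.abs(lut_sigmoid(grid) - sigmoid(grid))
rel_err = abs_err / sigmoid(grid)
print(f"max absolute error: {abs_err.max():.2e}   max relative error: {rel_err.max():.2e}")
```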

Journal ArticleDOI
TL;DR: The hardware implementation of an evolvable block-based neural network that utilizes a novel and cost-efficient sigmoid-like activation function is presented, and the neuron blocks are very cost-efficient in terms of logic utilization when compared with previous work.

Journal ArticleDOI
TL;DR: A Hill-Climbing Neural Network (HillClimbNet) is proposed that controls the movement of the Ms. Pac-man agent to travel around the maze, gobble all of the pills and escape from the ghosts in the maze.
Abstract: The creation of intelligent video game controllers has recently become one of the greatest challenges in game artificial intelligence research, and it is arguably one of the fastest-growing areas in game design and development. The learning process, a very important feature of intelligent methods, is what enables an intelligent game controller to determine and control the game objects' behaviors or actions autonomously. Our approach is to use a more efficient learning model in the form of artificial neural networks for training the controllers. We propose a Hill-Climbing Neural Network (HillClimbNet) that controls the movement of the Ms. Pac-man agent to travel around the maze, gobble all of the pills and escape from the ghosts in the maze. HillClimbNet combines the hill-climbing strategy with a simple, feed-forward artificial neural network architecture. The aim of this study is to analyze the performance of various activation functions for the purpose of generating neural-based controllers to play a video game. Each non-linear activation function is applied identically to all the nodes in the network, namely log-sigmoid, logarithmic, hyperbolic tangent-sigmoid and Gaussian. In general, the results show that an optimal configuration is achieved by using log-sigmoid, while Gaussian is the worst activation function.
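The four candidate activations can be written down compactly for reference. A numpy sketch; the "logarithmic" activation has no single standard definition, so the signed-log form below is an assumption, as is the unit-width Gaussian.

```python
import numpy as np

# The four node activations compared in the study; the exact "logarithmic"
# form and the Gaussian width are not given in the abstract, so the
# definitions below are assumptions.
def log_sigmoid(x):        # logistic (log-sigmoid)
    return 1.0 / (1.0 + np.exp(-x))

def logarithmic(x):        # signed logarithmic squashing (assumed form)
    return np.sign(x) * np.log1p(np.abs(x))

def tansig(x):             # hyperbolic tangent-sigmoid
    return np.tanh(x)

def gaussian(x):           # Gaussian bump of unit width (assumed width)
    return np.exp(-x ** 2)

x = np.linspace(-3.0, 3.0, 7)
for f in (log_sigmoid, logarithmic, tansig, gaussian):
    print(f"{f.__name__:>12}:", np.round(f(x), 3))
```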

Proceedings ArticleDOI
14 Sep 2014
TL;DR: With AdaBoost the authors achieved competitive results, whereas with the neural networks they were able to outperform baseline SVM scores in both Sub-Challenges.
Abstract: The Interspeech ComParE 2014 Challenge consists of two machine learning tasks, which have quite a small number of examples. Due to our good results in ComParE 2013, we considered AdaBoost a suitable machine learning meta-algorithm for these tasks; besides, we also experimented with Deep Rectifier Neural Networks. These differ from traditional neural networks in that the former have several hidden layers, and use rectifier neurons as hidden units. With AdaBoost we achieved competitive results, whereas with the neural networks we were able to outperform baseline SVM scores in both Sub-Challenges. Index Terms: speech technology, AdaBoost, deep neural networks, rectifier activation function

Journal ArticleDOI
TL;DR: It is shown that the proposed approach can be applied to represent the dynamics of any batch/semibatch process, and it is found that the third configuration exhibits comparable or better performance than the other two configurations while requiring a much smaller number of parameters.
Abstract: A neural network architecture incorporating time dependency explicitly, proposed recently for modeling nonlinear nonstationary dynamic systems, is further developed in this paper, and three alternate configurations are proposed to represent the dynamics of batch chemical processes. The first configuration consists of L subnets, each having M inputs representing the past samples of process inputs and output; each subnet has a hidden layer with a polynomial activation function; the outputs of the hidden layer are combined and acted upon by an explicitly time-dependent modulation function. The outputs of all the subnets are summed to obtain the output prediction. In the second configuration, additional weights are incorporated to obtain a more generalized model. In the third configuration, the subnets are eliminated by incorporating an additional hidden layer consisting of L nodes. A backpropagation learning algorithm is formulated for each of the proposed neural network configurations to determine the weights, the polynomial coefficients, and the modulation function parameters. The modeling capability of the proposed neural network configurations is evaluated by employing them to represent the dynamics of a batch reactor in which a consecutive reaction takes place. The results show that all three time-varying neural network configurations are able to represent the batch reactor dynamics accurately, and it is found that the third configuration exhibits comparable or better performance than the other two configurations while requiring a much smaller number of parameters. The modeling ability of the third configuration is further validated by applying it to a semibatch polymerization reactor modeling challenge problem. This paper illustrates that the proposed approach can be applied to represent the dynamics of any batch/semibatch process.

Journal ArticleDOI
TL;DR: In this article, Radial Basis Function (RBF) networks were used for the induction motor fault detection and the rotor faults were analyzed and fault symptoms were described, and the main stages of the design methodology of the RBF-based neural detectors were described.
Abstract: This paper deals with the application of Radial Basis Function (RBF) networks for induction motor fault detection. The rotor faults are analysed and fault symptoms are described. Next, the main stages of the design methodology of the RBF-based neural detectors are described. These networks are trained and tested using measurement data of the stator current (MCSA). The efficiency of the developed RBF-NN detectors is evaluated. Furthermore, the influence of neural network complexity and the parameters of the RBF activation function on the quality of data classification is shown. The presented neural detectors are tested with measurement data obtained in a laboratory setup containing a converter-fed induction motor (IM) and changeable rotors with different degrees of damage.

01 Jan 2014
TL;DR: To improve deep learning architectures, an analysis is given of the activation values in four different architectures using various activation functions, and an overview of the components in deep convolutional neural networks is given.
Abstract: In computer vision many tasks are solved using machine learning. In the past few years, state-of-the-art results in computer vision have been achieved using deep learning. Deeper machine learning architectures are better capable of handling complex recognition tasks than previous, shallower models. Many architectures for computer vision make use of convolutional neural networks, which were modeled after the visual cortex. Currently deep convolutional neural networks are the state of the art in computer vision. Through a literature survey an overview is given of the components in deep convolutional neural networks. The role and design decisions for each of the components are presented, and the difficulties involved in training deep neural networks are given. To improve deep learning architectures an analysis is given of the activation values in four different architectures using various activation functions. Current state-of-the-art classifiers use dropout, max-pooling as well as the maxout activation function. New components may further improve the architecture by providing a better solution for the diminishing gradient problem.