
Showing papers on "Activation function published in 2003"


Journal ArticleDOI
TL;DR: These results include several sufficient conditions for the global exponential stability of general neural networks with time-varying delays and without requiring monotone, bounded, or continuously differentiable activation functions.
Abstract: This brief presents new theoretical results on the global exponential stability of neural networks with time-varying delays and Lipschitz continuous activation functions. These results include several sufficient conditions for the global exponential stability of general neural networks with time-varying delays and without requiring monotone, bounded, or continuously differentiable activation functions. In addition to providing new criteria for neural networks with time-varying delays, these stability conditions also improve upon the existing ones with constant time delays and without time delays. Furthermore, the results make it convenient to estimate the exponential convergence rates of the neural networks.
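For context, the class of delayed network models that such stability results typically address can be written as follows; this is a standard formulation sketched here for illustration, not necessarily the paper's exact notation:

```latex
% Delayed recurrent network model commonly analyzed in this literature
% (illustrative standard form; the paper's precise model may differ).
\dot{x}_i(t) = -c_i x_i(t) + \sum_{j=1}^{n} a_{ij} f_j\bigl(x_j(t)\bigr)
             + \sum_{j=1}^{n} b_{ij} f_j\bigl(x_j(t-\tau_{ij}(t))\bigr) + I_i,
\qquad
|f_j(u) - f_j(v)| \le L_j\,|u - v|,
```

with Lipschitz continuity as the only assumption on the activation functions (no monotonicity, boundedness, or differentiability required).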

252 citations


Journal ArticleDOI
TL;DR: Cosine radial basis functions are shown to be strong competitors to existing reformulated radial basis function models trained by gradient descent and to feedforward neural networks with sigmoid hidden units.
Abstract: This paper presents a systematic approach for constructing reformulated radial basis function (RBF) neural networks, developed to facilitate their training by supervised learning algorithms based on gradient descent. This approach reduces the construction of radial basis function models to the selection of admissible generator functions. The selection of generator functions relies on the concept of the blind spot, which is introduced in the paper. The paper also introduces a new family of reformulated radial basis function neural networks, referred to as cosine radial basis functions. Cosine radial basis functions are constructed by linear generator functions of a special form, and their use as similarity measures in radial basis function models is justified by their geometric interpretation. A set of experiments on a variety of datasets indicates that cosine radial basis functions considerably outperform conventional radial basis function neural networks with Gaussian radial basis functions. Cosine radial basis functions are also strong competitors to existing reformulated radial basis function models trained by gradient descent and to feedforward neural networks with sigmoid hidden units.
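For illustration, a minimal numerical sketch contrasting a Gaussian RBF unit with a cosine-style radial unit; the cosine formula below is one form commonly associated with this line of work and should be treated as an assumption rather than the paper's exact definition:

```python
import numpy as np

def gaussian_rbf(x, center, width):
    """Conventional Gaussian radial basis unit."""
    return np.exp(-np.sum((x - center) ** 2) / (2.0 * width ** 2))

def cosine_rbf(x, center, a):
    """Cosine-style radial unit: the cosine of the angle between the augmented
    vectors (x - center, a) and (0, ..., 0, 1). Assumed parameterization."""
    return a / np.sqrt(np.sum((x - center) ** 2) + a ** 2)

x = np.array([0.2, 0.8])
print(gaussian_rbf(x, np.zeros(2), width=1.0))
print(cosine_rbf(x, np.zeros(2), a=1.0))
```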

130 citations


Journal ArticleDOI
TL;DR: A new three-term backpropagation algorithm is proposed in order to speed up the weight-adjusting process; it generally outperforms the conventional algorithm in terms of convergence speed and the ability to escape from local minima.
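The TL;DR does not spell out the third term; in widely cited three-term formulations it is a term proportional to the current output error, added to the usual gradient and momentum terms. A minimal sketch under that assumption:

```python
import numpy as np

def three_term_update(w, grad, prev_dw, error,
                      lr=0.1, momentum=0.9, prop_factor=0.05):
    """One step of a three-term backpropagation update (sketch):
    -lr*grad (gradient term) + momentum*prev_dw (momentum term)
    + prop_factor*error (assumed error-proportional third term)."""
    dw = -lr * grad + momentum * prev_dw + prop_factor * error
    return w + dw, dw

# Toy usage on a single scalar weight with placeholder signals.
w, prev_dw = 0.5, 0.0
for step in range(3):
    grad, error = 2.0 * w, -w
    w, prev_dw = three_term_update(w, grad, prev_dw, error)
    print(step, round(w, 4))
```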

118 citations


Journal ArticleDOI
TL;DR: Conditions based on local inhibition are derived that guarantee boundedness of some multistable networks, conditions are established for global attractivity, bounds on global attractive sets are obtained, and complete convergence conditions for the network are developed using novel energy-like functions.
Abstract: Multistability is a property necessary in neural networks in order to enable certain applications (e.g., decision making), where monostable networks can be computationally restrictive. This article focuses on the analysis of multistability for a class of recurrent neural networks with unsaturating piecewise linear transfer functions. It deals fully with the three basic properties of a multistable network: boundedness, global attractivity, and complete convergence. This article makes the following contributions: conditions based on local inhibition are derived that guarantee boundedness of some multistable networks, conditions are established for global attractivity, bounds on global attractive sets are obtained, complete convergence conditions for the network are developed using novel energy-like functions, and simulation examples are employed to illustrate the theory thus developed.
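A sketch of the kind of recurrent model with an unsaturating piecewise-linear (linear-threshold) transfer function that such multistability analyses consider, written in a standard form that may differ in detail from the article's notation:

```latex
% Linear-threshold recurrent network (illustrative standard form).
\dot{x}(t) = -x(t) + W\,\sigma\bigl(x(t)\bigr) + h,
\qquad
\sigma(u) = \max(0, u) \ \text{applied componentwise (unsaturating).}
```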

89 citations


Journal ArticleDOI
TL;DR: It is shown that radial basis functions are not required to be integrable for RBF networks to be universal approximators; such networks can uniformly approximate any continuous function on a compact set provided that the radial basis activation function is continuous almost everywhere, locally essentially bounded, and not a polynomial.

86 citations


Journal ArticleDOI
TL;DR: By constructing a new Lyapunov functional, and using M-matrix and topological degree tool, problem of the global asymptotic stability (GAS) is discussed for a class of recurrent neural networks with time-varying delays and some simple and new sufficient conditions are obtained ensuring existence, uniqueness of the equilibrium point and its GAS of the neural networks.

83 citations


Journal ArticleDOI
TL;DR: A modified hyperbolic tangent function is used as the activation function of an auto-tuning neuron; it provides two adjustable parameters to flexibly determine the magnitude and the shape of the function.
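The exact parameterization is not given in the TL;DR; a plausible two-parameter form, in which one parameter sets the output magnitude and the other the slope (shape), is sketched below as an assumption:

```python
import numpy as np

def modified_tanh(x, magnitude=1.0, shape=1.0):
    """Hypothetical two-parameter hyperbolic tangent: `magnitude` scales the
    saturation level, `shape` controls the steepness of the transition."""
    return magnitude * np.tanh(shape * x)

xs = np.linspace(-3, 3, 7)
print(modified_tanh(xs, magnitude=2.0, shape=0.5))
```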

82 citations


Journal ArticleDOI
TL;DR: The main objective of this study is to compare sigmoid, tangent hyperbolic, and linear activation functions using one- and two-hidden-layer MLP neural network structures trained with the scaled conjugate gradient learning algorithm, and to evaluate their performance on the multispectral Landsat TM imagery classification problem.
Abstract: Neural networks, recently applied to a number of image classification problems, are computational systems consisting of neurons or nodes arranged in layers with interconnecting links. Although there is a wide range of network types and possible applications in remote sensing, most attention has focused on the use of MultiLayer Perceptron (MLP) or FeedForward (FF) networks trained with a backpropagation-learning algorithm for supervised classification. One of the main characteristic elements of an artificial neural network (ANN) is the activation function. Nonlinear logistic (sigmoid and tangent hyperbolic) and linear activation functions have been used effectively with MLP networks for various purposes. The main objective of this study is to compare sigmoid, tangent hyperbolic, and linear activation functions using one- and two-hidden-layer MLP neural network structures trained with the scaled conjugate gradient learning algorithm, and to evaluate their performance on the multispectral Landsat TM imagery classification problem.
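For reference, the three activation functions being compared, in their standard definitions:

```python
import numpy as np

def sigmoid(x):
    """Logistic activation, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Hyperbolic tangent activation, output in (-1, 1)."""
    return np.tanh(x)

def linear(x):
    """Identity (linear) activation."""
    return x

x = np.linspace(-4, 4, 9)
for f in (sigmoid, tanh, linear):
    print(f.__name__, np.round(f(x), 3))
```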

73 citations


Book ChapterDOI
26 Jun 2003
TL;DR: The properties of activation functions are discussed from the standpoint of the existence of an energy function for complex-valued neural networks, together with how to find complex functions that satisfy these properties.
Abstract: Recently, models of neural networks that can directly deal with complex numbers, known as complex-valued neural networks, have been proposed, and several studies of their information-processing abilities have been carried out. One of the important factors characterizing the behavior of a complex-valued neural network is its activation function, which is a nonlinear complex function. This paper discusses the properties of activation functions from the standpoint of the existence of an energy function for complex-valued neural networks. Two classes of complex functions that are widely used as activation functions in models of complex-valued neural networks are considered. We investigate the properties of activation functions that assure the existence of energy functions and discuss how to find complex functions that satisfy these properties.
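The abstract does not name the two classes; the two types most commonly used in the complex-valued neural network literature are the split type (a real nonlinearity applied separately to the real and imaginary parts) and the amplitude-phase type (squash the modulus, keep the phase). A sketch of both, offered as background rather than as the paper's definitions:

```python
import numpy as np

def split_tanh(z):
    """'Split' complex activation: tanh applied separately to Re(z) and Im(z)."""
    return np.tanh(z.real) + 1j * np.tanh(z.imag)

def amplitude_phase_tanh(z):
    """'Amplitude-phase' activation: squash the modulus, preserve the phase."""
    return np.tanh(np.abs(z)) * np.exp(1j * np.angle(z))

z = 0.5 + 1.2j
print(split_tanh(z), amplitude_phase_tanh(z))
```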

73 citations


Journal ArticleDOI
Hiroomi Hikawa1
TL;DR: Experimental results show that the piecewise-linear function of the proposed neuron is programmable and robust against the change in the number of input signals, and the convergence rate of the learning and generalization capability are improved.
Abstract: This paper proposes a new type of digital pulse-mode neuron that employs a piecewise-linear function as its activation function. The neuron is implemented on a field programmable gate array (FPGA) and tested by experiments. Both theoretical analysis and experimental results show that the piecewise-linear function of the proposed neuron is programmable and robust against changes in the number of input signals. To demonstrate the effect of the piecewise-linear activation function, a pulse-mode multilayer neural network with on-chip learning is implemented on FPGA with the proposed neuron, and its learning performance is verified by experiments. By approximating the sigmoid function with the piecewise-linear function, the convergence rate of learning and the generalization capability are improved.
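A floating-point sketch of approximating the sigmoid with a programmable piecewise-linear function; the actual design is a fixed-point pulse-mode FPGA circuit, so this only illustrates the approximation idea:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Breakpoints of the piecewise-linear approximation; a hardware implementation
# would store these as programmable parameters.
breakpoints = np.linspace(-4.0, 4.0, 9)
values = sigmoid(breakpoints)

def pwl_sigmoid(x):
    """Piecewise-linear sigmoid approximation by linear interpolation,
    clamped to the outermost breakpoint values."""
    return np.interp(x, breakpoints, values)

xs = np.linspace(-6, 6, 121)
print("max abs error:", np.max(np.abs(pwl_sigmoid(xs) - sigmoid(xs))))
```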

73 citations


Journal ArticleDOI
TL;DR: It is proved that recurrent networks whose contractive transition function has a fixed contraction parameter fulfill the so-called distribution independent uniform convergence of empirical distances property and hence, unlike general recurrent networks, are distribution independent PAC learnable.
Abstract: Recent experimental studies indicate that recurrent neural networks initialized with "small" weights are inherently biased toward definite memory machines (Tino, Cernansky, & Benuskova, 2002a, 2002b). This article establishes a theoretical counterpart: the transition function of a recurrent network with small weights and a squashing activation function is a contraction. We prove that recurrent networks with contractive transition function can be approximated arbitrarily well on input sequences of unbounded length by a definite memory machine. Conversely, every definite memory machine can be simulated by a recurrent network with contractive transition function. Hence, initialization with small weights induces an architectural bias into learning with recurrent neural networks. This bias might have benefits from the point of view of statistical learning theory: it emphasizes one possible region of the weight space where generalization ability can be formally proved. It is well known that standard recurrent neural networks are not distribution independent learnable in the probably approximately correct (PAC) sense if arbitrary precision and inputs are considered. We prove that recurrent networks whose contractive transition function has a fixed contraction parameter fulfill the so-called distribution independent uniform convergence of empirical distances property and hence, unlike general recurrent networks, are distribution independent PAC learnable.
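A quick numerical illustration of the "small weights imply contraction" observation: with a squashing activation such as tanh (derivative bounded by 1), the transition map is Lipschitz in the state with constant at most the norm of the recurrent weight matrix, so a small spectral norm suffices for contraction. This is a sketch; the paper's formal bound may use a different norm or constant:

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_input = 8, 3

# "Small" recurrent weights for the transition map f(x, u) = tanh(W x + V u + b).
W = 0.05 * rng.standard_normal((n_hidden, n_hidden))
V = rng.standard_normal((n_hidden, n_input))
b = np.zeros(n_hidden)

def transition(x, u):
    return np.tanh(W @ x + V @ u + b)

x = transition(np.zeros(n_hidden), rng.standard_normal(n_input))  # one step

# Since |tanh'| <= 1, the Lipschitz constant of f in x is at most ||W||_2.
spectral_norm = np.linalg.norm(W, 2)
print("contractive in the state:", spectral_norm < 1.0,
      "(||W||_2 =", round(spectral_norm, 3), ")")
```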

Journal ArticleDOI
TL;DR: The proposed method of power transformer protection is evaluated using simulations performed with the EMTP package; the proposed model requires less training time and is more accurate in prediction than an FFNN.

Journal ArticleDOI
TL;DR: A novel fuzzy-based activation function for artificial neural networks is proposed that provides easy hardware implementation and straightforward interpretability on the basis of IF-THEN rules.
Abstract: A novel fuzzy-based activation function for artificial neural networks is proposed. This approach provides easy hardware implementation and straightforward interpretability on the basis of IF-THEN rules. Backpropagation learning with the new activation function also has low computational complexity. Several application examples (XOR gate, chaotic time-series prediction, channel equalization, and independent component analysis) support the potential of the proposed scheme.

Journal ArticleDOI
TL;DR: A new Expectation-Maximization (EM) algorithm is presented that speeds up the training of feedforward networks with local activation functions, such as the Radial Basis Function (RBF) network, by applying a soft decomposition of the residual among the units.

Journal ArticleDOI
Hiroomi Hikawa1
TL;DR: A new pulse-mode digital neuron based on a voting neuron is described; it provides an adjustable nonlinear function that resembles the sigmoid function.
Abstract: This paper describes a new pulse-mode digital neuron based on a voting neuron. The signal level of the neuron is represented by the frequency of pulse signals. The proposed neuron provides an adjustable nonlinear function that resembles the sigmoid function. The proposed neuron and an experimental multilayer neural network (MNN) are implemented on a field programmable gate array (FPGA), and various experiments are conducted to test the performance of the proposed system. The experimental results show that the proposed neuron has a rigid, adjustable nonlinear function.

Journal ArticleDOI
TL;DR: The functions defined are shown to satisfy the requirements of the universal approximation theorem(s) and the envelope of the derivatives of the members of the defined class is shown to be sigmoidal.

Proceedings ArticleDOI
15 Oct 2003
TL;DR: An improved version of the normal k-means clustering algorithm to select the hidden layer neurons of a radial basis function (RBF) neural network that has been modified to capture more knowledge about the distribution of input patterns and to take care of hyper-ellipsoidal shaped clusters.
Abstract: We propose an improved version of the normal k-means clustering algorithm to select the hidden layer neurons of a radial basis function (RBF) neural network. The normal k-means algorithm has been modified to capture more knowledge about the distribution of input patterns and to take care of hyper-ellipsoidal shaped clusters. The RBF neural network with the proposed algorithm has been tested with three different machine-learning data sets. The average recognition rate of an RBF neural network over these data sets has been found to be 93.70% using the proposed improved k-means algorithm, whereas in the method using the normal k-means algorithm, the corresponding value is found to be 88.12%. Clearly, the results show that the performance of the RBF neural network using the proposed modified k-means algorithm has been improved.
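A sketch of the idea of adapting k-means to hyper-ellipsoidal clusters by tracking a covariance per cluster and assigning points with the Mahalanobis distance; this illustrates the concept and is not the authors' exact modification:

```python
import numpy as np

def ellipsoidal_kmeans(X, k, iters=20, reg=1e-3, seed=0):
    """k-means variant with per-cluster covariances: points are assigned by
    Mahalanobis distance, so elongated (hyper-ellipsoidal) clusters can be
    captured. Illustrative only."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    covs = [np.eye(X.shape[1]) for _ in range(k)]
    for _ in range(iters):
        # Assign each point to the nearest center under Mahalanobis distance.
        d = np.empty((len(X), k))
        for j in range(k):
            inv = np.linalg.inv(covs[j] + reg * np.eye(X.shape[1]))
            diff = X - centers[j]
            d[:, j] = np.einsum("ni,ij,nj->n", diff, inv, diff)
        labels = d.argmin(axis=1)
        # Re-estimate centers and covariances from the assigned points.
        for j in range(k):
            pts = X[labels == j]
            if len(pts) > 1:
                centers[j] = pts.mean(axis=0)
                covs[j] = np.cov(pts, rowvar=False) + reg * np.eye(X.shape[1])
    return centers, covs, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], [1.0, 0.2], (100, 2)),
               rng.normal([4, 4], [0.2, 1.0], (100, 2))])
centers, covs, labels = ellipsoidal_kmeans(X, k=2)
print(np.round(centers, 2))
```

The resulting cluster centers and covariances would then serve as the centers and (ellipsoidal) widths of the RBF hidden units.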

Journal ArticleDOI
TL;DR: The MLP model is reformulated with the original perceptron in mind so that each node in the “hidden layers” can be considered as a latent Bernoulli random variable, and the likelihood for the reformulated latent variable model is constructed by standard finite mixture ML methods using an EM algorithm.
Abstract: Multi-layer perceptrons (MLPs), a common type of artificial neural networks (ANNs), are widely used in computer science and engineering for object recognition, discrimination and classification, and have more recently found use in process monitoring and control. “Training” such networks is not a straightforward optimisation problem, and we examine features of these networks which contribute to the optimisation difficulty. Although the original “perceptron”, developed in the late 1950s (Rosenblatt 1958, Widrow and Hoff 1960), had a binary output from each “node”, this was not compatible with back-propagation and similar training methods for the MLP. Hence the output of each node (and the final network output) was made a differentiable function of the network inputs. We reformulate the MLP model with the original perceptron in mind so that each node in the “hidden layers” can be considered as a latent (that is, unobserved) Bernoulli random variable. This maintains the property of binary output from the nodes, and with an imposed logistic regression of the hidden layer nodes on the inputs, the expected output of our model is identical to the MLP output with a logistic sigmoid activation function (for the case of one hidden layer). We examine the usual MLP objective function—the sum of squares—and show its multi-modal form and the corresponding optimisation difficulty. We also construct the likelihood for the reformulated latent variable model and maximise it by standard finite mixture ML methods using an EM algorithm, which provides stable ML estimates from random starting positions without the need for regularisation or cross-validation. Over-fitting of the number of nodes does not affect this stability. This algorithm is closely related to the EM algorithm of Jordan and Jacobs (1994) for the Mixture of Experts model. We conclude with some general comments on the relation between the MLP and latent variable models.
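A one-line sketch of the expectation identity the abstract refers to, with notation assumed here (one hidden layer, output taken to be linear in the hidden states):

```latex
h_j \mid x \sim \mathrm{Bernoulli}\bigl(\sigma(w_j^{\top}x)\bigr),
\qquad
y = \sum_j v_j h_j + b
\;\;\Longrightarrow\;\;
\mathbb{E}[\,y \mid x\,] = \sum_j v_j\,\sigma(w_j^{\top}x) + b,
```

which matches the response of an MLP with a logistic sigmoid hidden layer, as the abstract describes.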

Journal ArticleDOI
TL;DR: The proposed method effectively enhances the learning abilities of multilayer feedforward neural networks by introducing a sum-of-squares weight term into the networks’ error functions and appropriately enlarging the variable components with the help of SVM theory.

Patent
Meng Zhuo1, Pao Yoh-Han1
15 Apr 2003
TL;DR: In this article, a function approximation node is incrementally added to the neural net model, and function parameters of other nodes in the neural network model are updated by using the function parameters of the other nodes prior to the addition of the function approximation node.
Abstract: A method of incrementally forming and adaptively updating a neural net model is provided. A function approximation node is incrementally added to the neural net model. Function parameters for the function approximation node are determined, and function parameters of other nodes in the neural network model are updated by using the function parameters of the other nodes prior to addition of the function approximation node to the neural network model.

Journal ArticleDOI
TL;DR: The proposed method improves considerably on the performance of the general backpropagation algorithm, including when using momentum, by means of a fuzzy logic system for automatically tuning the activation function gain.

Journal ArticleDOI
TL;DR: The πt-neuron solution to the N-bit parity problem has the lowest computational cost among the neural solutions presented to date.
Abstract: A solution to the N-bit parity problem employing a single multiplicative neuron model, called translated multiplicative neuron (πt-neuron), is proposed. The πt-neuron presents the following advantages: (a) ∀N≥1, only 1 πt-neuron is necessary, with a threshold activation function and parameters defined within a specific interval; (b) no learning procedures are required; and (c) the computational cost is the same as the one associated with a simple McCulloch-Pitts neuron. Therefore, the πt-neuron solution to the N-bit parity problem has the lowest computational cost among the neural solutions presented to date.
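A small sketch showing how a single multiplicative neuron with a threshold can solve N-bit parity; the particular translation and threshold used here are illustrative and not necessarily the paper's exact πt-neuron parameters:

```python
import numpy as np
from itertools import product

def parity_multiplicative_neuron(bits):
    """Single multiplicative neuron for N-bit parity (illustrative parameters):
    translate each bit to +/-1, multiply, and threshold. The sign of the
    product encodes the number of zero bits, from which parity follows."""
    n = len(bits)
    p = np.prod(2 * np.asarray(bits) - 1)   # equals (-1)^(number of zero bits)
    return 1 if (-1) ** (n + 1) * p > 0 else 0

for bits in product([0, 1], repeat=4):
    assert parity_multiplicative_neuron(bits) == sum(bits) % 2
print("4-bit parity reproduced for all 16 input patterns")
```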

Journal ArticleDOI
TL;DR: It is shown that the general approximation property of feed-forward multilayer perceptron networks can be achieved in networks where the number of nodes in each layer is bounded but the number of layers grows to infinity.

Journal ArticleDOI
TL;DR: It is demonstrated that all members of the proposed class(es) satisfy the requirements to act as an activation function in feedforward artificial neural networks.
Abstract: The role of activation functions in feedforward artificial neural networks has not been investigated to the desired extent. The commonly used sigmoidal functions appear as discrete points in the sigmoidal functional space. This makes comparison difficult. Moreover, these functions can be interpreted as the (suitably scaled) integral of some probability density function (generally taken to be symmetric/bell shaped). Two parameterization methods are proposed that allow us to construct classes of sigmoidal functions based on any given sigmoidal function. The suitability of the members of the proposed class is investigated. It is demonstrated that all members of the proposed class(es) satisfy the requirements to act as an activation function in feedforward artificial neural networks.
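The "sigmoid as a scaled integral of a probability density" interpretation can be checked numerically for the logistic case; the sketch below verifies it and is not one of the paper's two parameterization methods:

```python
import numpy as np

def logistic_sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logistic_density(t):
    """Derivative of the logistic sigmoid: a symmetric, bell-shaped density."""
    s = logistic_sigmoid(t)
    return s * (1.0 - s)

# Check sigma(x) = integral_{-inf}^{x} p(t) dt, approximating -inf by -20
# and using the midpoint rule.
edges = np.linspace(-20.0, 3.0, 200001)
mids = 0.5 * (edges[:-1] + edges[1:])
integral = np.sum(logistic_density(mids)) * (edges[1] - edges[0])
print("difference:", abs(integral - logistic_sigmoid(3.0)))  # ~0 up to quadrature error
```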

Proceedings ArticleDOI
01 Jan 2003
TL;DR: Based on a general class of activation functions, new results guaranteeing the global exponential stability of the equilibrium for a class of recurrent neural networks with variable delays are obtained.
Abstract: Based on a general class of activation functions, new results guaranteeing the global exponential stability of the equilibrium for a class of recurrent neural networks with variable delays are obtained. The delayed Hopfield neural network, bidirectional associative memory network, and cellular neural networks are special cases of the network model considered in this paper. In addition, we do not require the activation functions to be differentiable, bounded, or monotone nondecreasing, so this work improves upon previous results.

Journal ArticleDOI
TL;DR: A new parallel hardware architecture dedicated to compute the Gaussian potential function is proposed, which reduces computational delay in the output function and also computes the exponential function and its exponent at the same time.

Journal ArticleDOI
TL;DR: Neural networks based on an adaptive nonlinear function, suitable for both blind complex time-domain signal separation and blind frequency-domain signal deconvolution, are presented.
Abstract: In this paper, neural networks based on an adaptive nonlinear function, suitable for both blind complex time-domain signal separation and blind frequency-domain signal deconvolution, are presented. This activation function, whose shape is modified during learning, is based on a pair of spline functions, one for the real and one for the imaginary part of the input. The shape control points are adaptively changed using gradient-based techniques. B-splines are used because they make it possible to impose only simple constraints on the control parameters in order to ensure a monotonically increasing characteristic. This new adaptive function is then applied to the outputs of a one-layer neural network in order to separate complex signals from mixtures by maximizing the entropy of the function outputs. We derive a simple form of the adaptation algorithm and present experimental results that demonstrate the effectiveness of the proposed method.
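A simplified sketch of an activation defined by adaptive control points, using a piecewise-linear stand-in for the paper's B-spline construction; monotonicity is kept by parameterizing the control ordinates as a cumulative sum of non-negative increments, mirroring the "simple constraints" mentioned above. The variable names are hypothetical:

```python
import numpy as np

knots = np.linspace(-3.0, 3.0, 9)      # fixed abscissae of the control points
increments = np.full(8, 0.25)          # learnable, kept non-negative

def adaptive_activation(x, increments):
    """Monotone piecewise-linear activation through adaptive control points."""
    values = np.concatenate(([-1.0], -1.0 + np.cumsum(np.abs(increments))))
    return np.interp(x, knots, values)

def complex_activation(z, inc_re, inc_im):
    """One adaptive function for the real part and one for the imaginary part,
    as in the paper's split treatment of complex inputs."""
    return adaptive_activation(z.real, inc_re) + 1j * adaptive_activation(z.imag, inc_im)

print(complex_activation(0.5 + 0.5j, increments, increments))
```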

Book ChapterDOI
01 Jan 2003
TL;DR: Radial basis function (RBF) neural networks are reviewed together with their applications in fields such as signal modeling, nonlinear time series prediction, identification of dynamic systems, pattern recognition, and knowledge discovery.
Abstract: Essential theory and the main applications of feed-forward connectionist structures termed radial basis function (RBF) neural networks are given. The universal approximation and Cover’s theorems are outlined, which justify the powerful capabilities of RBF networks in function approximation and data classification tasks. Methods for regularising RBF-generated mappings are also addressed. Links of these networks to kernel regression methods, density estimation, and nonlinear principal component analysis are pointed out. Particular attention is given to different RBF network training schemes, e.g. the constructive method incorporating orthogonalisation of RBF kernels. Numerous successful applications of RBF networks in diverse fields such as signal modelling, non-linear time series prediction, identification of dynamic systems, pattern recognition, and knowledge discovery are outlined.
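For concreteness, the standard Gaussian RBF network mapping that such a chapter builds on:

```python
import numpy as np

def rbf_forward(X, centers, widths, weights, bias=0.0):
    """Gaussian RBF network output (standard form):
    y(x) = sum_j w_j * exp(-||x - c_j||^2 / (2 * s_j^2)) + bias."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * widths ** 2))
    return phi @ weights + bias

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))
centers = rng.standard_normal((3, 2))
print(rbf_forward(X, centers, widths=np.ones(3),
                  weights=np.array([0.5, -1.0, 2.0])))
```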

Proceedings ArticleDOI
20 Jul 2003
TL;DR: A comparison indicates that both neural network models exhibit comparable performance when tested on the training data, but cosine RBF neural networks generalize better, considerably outperforming FFNNs on the testing data.
Abstract: This paper presents the results of a study aimed at the development of a system for short-term electric power load forecasting. This was attempted by training feedforward neural networks (FFNNs) and cosine radial basis function (RBF) neural networks to predict future power demand based on past power load data and weather conditions. The comparison indicates that both neural network models exhibit comparable performance when tested on the training data, but cosine RBF neural networks generalize better, considerably outperforming FFNNs on the testing data.

01 May 2003
TL;DR: RBFNs' interpretations can suggest applications that are particularly interesting in medical domains, and a survey of their interpretations and of their corresponding learning algorithms is provided.
Abstract: Medical applications have usually used Radial Basis Function Networks just as Artificial Neural Networks. However, RBFNs are Knowledge-Based Networks that can be interpreted in several ways: Artificial Neural Networks, Regularization Networks, Support Vector Machines, Wavelet Networks, Fuzzy Controllers, Kernel Estimators, Instance-Based Learners. A survey of their interpretations and of their corresponding learning algorithms is provided, as well as a brief survey of dynamic learning algorithms. RBFNs' interpretations can suggest applications that are particularly interesting in medical domains.