
Showing papers on "Activation function published in 1992"


Posted Content
TL;DR: In this article, it was shown that a standard multilayer feedforward network with a locally bounded piecewise continuous activation function can approximate any continuous function to any degree of accuracy if and only if the network's activation function is not a polynomial.
Abstract: Several researchers characterized the activation function under which multilayer feedforward networks can act as universal approximators. We show that all the characterizations that were reported thus far in the literature are special cases of the following general result: a standard multilayer feedforward network with a locally bounded piecewise continuous activation function can approximate any continuous function to any degree of accuracy if and only if the network's activation function is not a polynomial. We also emphasize the important role of the threshold, asserting that without it the last theorem does not hold.
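A compact restatement of the result in generic notation (a paraphrase, not a quotation from the paper): for a locally bounded, piecewise continuous activation function σ and inputs x in R^n,

\[
\operatorname{span}\{\,x \mapsto \sigma(w\cdot x+\theta)\;:\;w\in\mathbb{R}^{n},\ \theta\in\mathbb{R}\,\}
\ \text{is dense in}\ C(K)\ \text{for every compact}\ K\subset\mathbb{R}^{n}
\;\Longleftrightarrow\;
\sigma\ \text{is not a polynomial.}
\]

The remark about the threshold can be seen from a simple example: with σ = sin and no threshold θ, every network output is an odd function of x, so even the constant function 1 cannot be approximated.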

216 citations


Proceedings ArticleDOI
16 Dec 1992
TL;DR: Simulation results are presented to demonstrate that the methods presented can be used for the effective control of complex nonlinear systems and it is shown that globally stable adaptive controllers can be determined.
Abstract: Some of the problems that arise in the control of nonlinear systems in the presence of uncertainty are considered. Multilayer neural networks and radial basis function networks are used in the design of identifiers and controllers, and gradient methods are used to adjust their parameters. For a restricted class of nonlinear systems, it is shown that globally stable adaptive controllers can be determined. Simulation results are presented to demonstrate that the methods presented can be used for the effective control of complex nonlinear systems.

197 citations


Proceedings ArticleDOI
Sherif Hashem1
07 Jun 1992
TL;DR: A method for computing the network output sensitivities with respect to variations in the inputs for multilayer feedforward artificial neural networks with differentiable activation functions is presented.
Abstract: A method for computing the network output sensitivities with respect to variations in the inputs for multilayer feedforward artificial neural networks with differentiable activation functions is presented. It is applied to obtain expressions for the first- and second-order sensitivities. An example is introduced along with a discussion to illustrate how the sensitivities are calculated and to show how they compare to the actual derivatives of the function being modeled by the neural network.
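The first-order sensitivities of a single-hidden-layer network follow from the chain rule applied layer by layer. The sketch below is a minimal illustration in generic notation (tanh hidden units and variable names chosen here, not the paper's), checked against a finite difference.

import numpy as np

def mlp_input_sensitivities(x, W1, b1, W2, b2):
    """First-order sensitivities dy/dx of y = W2 @ tanh(W1 @ x + b1) + b2.
    Returns the Jacobian with shape (n_outputs, n_inputs)."""
    h = np.tanh(W1 @ x + b1)            # hidden-layer activations
    dact = 1.0 - h ** 2                 # derivative of tanh at the hidden net inputs
    return W2 @ (dact[:, None] * W1)    # chain rule: W2 * diag(tanh') * W1

# Quick check against a central finite difference on one input component.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)
x, eps = rng.normal(size=3), 1e-6
J = mlp_input_sensitivities(x, W1, b1, W2, b2)
f = lambda z: W2 @ np.tanh(W1 @ z + b1) + b2
fd = (f(x + eps * np.eye(3)[0]) - f(x - eps * np.eye(3)[0])) / (2 * eps)
assert np.allclose(J[:, 0], fd, atol=1e-5)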

118 citations


Journal ArticleDOI
TL;DR: It is shown how to construct a perceptron with two hidden layers for multivariate function approximation, which can perform function approximation in the same manner as networks based on Gaussian potential functions, by linear combination of local functions.
Abstract: Mathematical theorems establish the existence of feedforward multilayered neural networks, based on neurons with sigmoidal transfer functions, that approximate arbitrarily well any continuous multivariate function. However, these theorems do not provide any hint on how to find the network parameters in practice. It is shown how to construct a perceptron with two hidden layers for multivariate function approximation. Such a network can perform function approximation in the same manner as networks based on Gaussian potential functions, by linear combination of local functions.

100 citations


Journal ArticleDOI
TL;DR: In this article, a stepwise regression algorithm based on orthogonalization and a series of statistical tests is employed for designing and training RBF networks; this yields non-linear models that are stable and linear in the model parameters.

94 citations


Journal ArticleDOI
TL;DR: Simulation results on the use of the second-order function and the bipolar sigmoid function for training multilayer feedforward networks using the backpropagation algorithm show that they have similar generalisation properties, while the second-order function has a slight advantage in convergence speed.
Abstract: A simple sigmoid-like second-order piecewise activation function suitable for direct digital hardware implementation is presented. Simulation results on the use of the second-order function and the bipolar sigmoid function for training multilayer feedforward networks using the backpropagation algorithm show that they have similar generalisation properties, while the second-order function has a slight advantage in convergence speed.
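One possible form of such a function, shown only as an illustration (the exact polynomial segments used in the paper may differ): a bipolar quadratic spline that rises with slope 2/L at the origin and saturates smoothly at +/-1, with L chosen as a power of two so that the division becomes a shift in fixed-point hardware.

import numpy as np

def second_order_sigmoid(x, L=4.0):
    """Sigmoid-like, second-order piecewise activation (illustrative form only).
    Inside [-L, L]: sign(x) * t * (2 - t) with t = |x|/L, a quadratic spline;
    outside [-L, L]: clamped to +/-1. Its derivative is piecewise linear."""
    x = np.asarray(x, dtype=float)
    t = np.minimum(np.abs(x) / L, 1.0)
    return np.sign(x) * t * (2.0 - t)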

87 citations


Proceedings Article
30 Nov 1992
TL;DR: This work compares activation functions in terms of the approximation power of their feedforward nets in the case of analog as well as boolean input.
Abstract: We compare activation functions in terms of the approximation power of their feedforward nets. We consider the case of analog as well as boolean input.

76 citations


Journal ArticleDOI
TL;DR: A modification of the generalized delta rule is described that is capable of training multilayer networks of value units, i.e. units defined by a particular non-monotonic activation function, the Gaussian, which suggests that value unit networks may be better suited for learning some pattern classification tasks and for answering general questions related to the organization of neurophysiological systems.
Abstract: A modification of the generalized delta rule is described that is capable of training multilayer networks of value units, i.e. units defined by a particular non-monotonic activation function, the Gaussian. For simple problems of pattern classification, this rule produces networks with several advantages over standard feedforward networks: they require fewer processing units and can be trained much more quickly. Though superficially similar, there are fundamental differences between the networks trained by this new learning rule and radial basis function networks. These differences suggest that value unit networks may be better suited for learning some pattern classification tasks and for answering general questions related to the organization of neurophysiological systems.
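For reference, a value unit applies a Gaussian to the net input rather than to a distance, which is what makes it non-monotonic. A minimal sketch with an assumed parameterization (the constants used in the paper may differ):

import numpy as np

def value_unit(x, w, mu):
    """Non-monotonic 'value unit': a Gaussian of the net input w.x (illustrative
    parameterization). The response peaks when the net input equals mu and
    falls off on both sides, unlike a monotonic sigmoid."""
    net = np.dot(w, x)
    return np.exp(-np.pi * (net - mu) ** 2)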

61 citations


Proceedings ArticleDOI
08 Mar 1992
TL;DR: The authors proposed an architecture of multilayer feedforward neural networks for classification problems of fuzzy vectors where the activation function is extended to a fuzzy input-output relation by the extension principle.
Abstract: The authors proposed an architecture of multilayer feedforward neural networks for classification problems of fuzzy vectors. A fuzzy input vector is mapped to a fuzzy number by the proposed neural network where the activation function is extended to a fuzzy input-output relation by the extension principle. A learning algorithm is derived from a cost function defined by a target output and the level set of a fuzzy output. The proposed classification method of fuzzy vectors is illustrated by a numerical example.

56 citations


Journal ArticleDOI
01 May 1992
TL;DR: A connectionist approach to the problem of PID autotuning is proposed, based on integral measures of the step response, which gives a major reduction in the number of iterations needed to achieve a local minimum.
Abstract: A connectionist method for autotuning PID controllers is proposed. This technique, which is applicable both in open and in closed loops, employs multilayer perceptrons to approximate the mappings between the identification measures of the plant and the optimal PID values. The neural network controller is designed to adapt to changing system structures and parameter values online. To achieve this objective, the network weighting coefficients are determined during an offline training phase. Simulation results are presented to illustrate the properties of the controller. One of the important aspects of neural networks is the convergence characteristic of this training phase. In the proposed approach, multilayer perceptrons are employed for nonlinear function approximation. As a consequence, the neurons have a linear activation function in their output layer. It is shown that a new learning criterion can be defined for this class of multilayer perceptrons, which is commonly found in control systems applications. Comparisons of the standard and the reformulated criteria, using different training algorithms, show that the new formulation achieves a significant reduction in the number of iterations needed to converge to a local minimum.

51 citations


Proceedings ArticleDOI
24 Jun 1992
TL;DR: In this article, it is shown that the methodology developed for adaptive control applications of radial basis function networks can also be used to produce stable, convergent, recursive identifiers in both continuous and discrete time; the latter is of particular interest as it can serve as a model of the general neural network functional learning process.
Abstract: The methodology developed for adaptive control applications of radial basis function networks can easily also be used to produce stable, convergent, recursive identifiers, in both continuous and discrete time. The latter is of particular interest as it can serve as a model of the general neural network functional learning process, and hence gives some direct insights into the factors influencing the success of these methods.

Journal ArticleDOI
01 May 1992
TL;DR: A number of different implementations for the first derivative of the sigmoid function are proposed based on overall speed performance (circuit speed and training time) and hardware requirements.
Abstract: This paper proposes a number of different implementations for the first derivative of the sigmoid function. The implementation of the sigmoid function employs a powers-of-two piecewise linear approximation. The best implementation scheme for the derivative is suggested based on overall speed performance (circuit speed and training time) and hardware requirements.
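As a rough illustration of the idea (segment boundaries and slopes assumed here, not taken from the paper): a piecewise-linear sigmoid whose slopes are powers of two can be computed with shifts and adds, and its first derivative is then a piecewise-constant function that also takes only power-of-two values.

import numpy as np

def pwl_sigmoid(x):
    """Piecewise-linear sigmoid approximation with power-of-two slopes
    (illustrative segment boundaries, not the paper's exact scheme)."""
    x = np.asarray(x, dtype=float)
    a = np.abs(x)
    y = np.where(a < 1.0, 0.5 + 0.25 * a,                           # slope 1/4 near the origin
                 np.where(a < 3.0, 0.75 + 0.125 * (a - 1.0), 1.0))  # slope 1/8, then flat
    return np.where(x < 0.0, 1.0 - y, y)

def pwl_sigmoid_derivative(x):
    """Matching first derivative: piecewise constant with values 1/4, 1/8, 0,
    so it can be generated by shifts alone in digital hardware."""
    a = np.abs(np.asarray(x, dtype=float))
    return np.where(a < 1.0, 0.25, np.where(a < 3.0, 0.125, 0.0))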

01 Apr 1992
TL;DR: For normalized inputs, multilayer perceptron networks are radial function networks (albeit with a non-standard radial function).
Abstract: Both multilayer perceptrons (MLP) and Generalized Radial Basis Functions (GRBF) have good approximation properties, theoretically and experimentally. Are they related? The main point of this paper is to show that for normalized inputs, multilayer perceptron networks are radial function networks (albeit with a non-standard radial function). This provides an interpretation of the weights as centers of the radial function network, and therefore as equivalent to templates. This insight may be useful for practical applications, including better initialization procedures for MLP. In the remainder of the paper, we discuss the relation between the radial functions that correspond to the sigmoid for normalized inputs and well-behaved radial basis functions, such as the Gaussian. In particular, we observe that the radial function associated with the sigmoid is an activation function that is a good approximation to Gaussian basis functions for a range of values of the bias parameter. The implication is that an MLP network can always simulate a Gaussian GRBF network (with the same number of units but fewer parameters); the converse is true only for certain values of the bias parameter. Numerical experiments indicate that this constraint is not always satisfied in practice by MLP networks trained with backpropagation. Multiscale RBF networks, on the other hand, can approximate MLP networks with a similar number of parameters.
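The key observation can be paraphrased in generic notation as follows (a sketch of the argument, not the paper's exact derivation). For a hidden unit σ(w·x + b) and inputs normalized to ||x|| = 1,

\[
w\cdot x=\tfrac{1}{2}\bigl(\lVert w\rVert^{2}+\lVert x\rVert^{2}-\lVert x-w\rVert^{2}\bigr)
=\tfrac{1}{2}\bigl(\lVert w\rVert^{2}+1\bigr)-\tfrac{1}{2}\lVert x-w\rVert^{2},
\]
so that
\[
\sigma(w\cdot x+b)=\sigma\!\Bigl(\tfrac{1}{2}\bigl(\lVert w\rVert^{2}+1\bigr)+b-\tfrac{1}{2}\lVert x-w\rVert^{2}\Bigr),
\]

which depends on x only through the distance ||x - w||: each hidden unit is a (non-standard) radial function centered at its weight vector, which is what allows the weights to be read as centers or templates.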

Journal ArticleDOI
TL;DR: It was shown that the partial differential coefficients of output strength with respect to input parameters were useful to analyze the relationship between inputs and outputs in the neural network and characteristics of each data in the set of data.
Abstract: The operation of the perceptron-type neural network can be regarded as a function which transforms an input vector to another (output) vector. We have presented the analytical formula for the partial derivative of this function with respect to the elements of the input vector. Using numerical data, we have examined the accuracy, the independence of the elements of the input vector, and the ability to recognize a function within mixed functions. It was shown that the partial differential coefficients of output strength with respect to input parameters were useful for analyzing the relationship between inputs and outputs in the neural network and the characteristics of each item in the data set.

Proceedings ArticleDOI
07 Jun 1992
TL;DR: The authors propose the multidendrite multiactivation product unit and the vectorial connection model for artificial neural networks and an optimal weight initialization algorithm is developed for a three-layer network with hidden units of 2D vectorial connections.
Abstract: The authors propose the multidendrite multiactivation product unit and the vectorial connection model for artificial neural networks. A generalized backpropagation learning rule is also developed for multilayer feedforward networks with a new neuron model and connections. Each hidden neuron is a multiactivation product unit which requires vectorial axon connections and a productive activation function. An optimal weight initialization algorithm is developed for a three-layer network with hidden units of 2D vectorial connections. The weights between the input layer and the hidden layer are derived from the feature selection methods used in pattern recognition. The activation function is the product of a 2D Hermite spline base function. The weights between the hidden layer and the third layer are scaled coefficients of the 2D Hermite spline interpolations. The performances of networks initialized by the new algorithm are compared with those obtained by selecting random initial weights.

Journal ArticleDOI
TL;DR: A design method for multilayer feedforward neural networks with simplified sigmoid activation functions and one-power-of-two weights is proposed; the network can retain a generalisation capability nearly identical to that of the corresponding network using continuous weights, while having increased computational speed in applications and reduced cost in digital hardware implementation.
Abstract: A design method for multilayer feedforward neural networks with simplified sigmoid activation functions and one-power-of-two weights is proposed. The designed multilayer feedforward neural network can retain a generalisation capability nearly identical to that of the corresponding network using continuous weights, while having increased computational speed in applications and reduced cost in digital hardware implementation.
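A simple way to see the hardware appeal (an illustrative quantizer, not the design method of the paper): once each weight is a signed power of two, every multiplication in the forward pass reduces to a bit shift.

import numpy as np

def quantize_to_power_of_two(w, min_exp=-4, max_exp=0):
    """Map each weight to the nearest signed power of two (on a log2 scale)
    within 2**min_exp .. 2**max_exp. Illustrative only; the paper's design
    procedure for choosing the weights is not reproduced here."""
    w = np.asarray(w, dtype=float)
    sign = np.where(w < 0.0, -1.0, 1.0)
    mag = np.clip(np.abs(w), 2.0 ** min_exp, 2.0 ** max_exp)
    exp = np.clip(np.round(np.log2(mag)), min_exp, max_exp)
    return np.where(w == 0.0, 0.0, sign * 2.0 ** exp)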

Journal ArticleDOI
TL;DR: This paper defines appropriate classes of feedforward neural networks with specified fan-in, accuracy of computation and depth, and, using techniques of communication complexity, shows that the classes fit into a well-studied hierarchy of Boolean circuits.

Journal ArticleDOI
TL;DR: A massively parallel implementation of a linear version of the neural technique on the Associative String Processor (ASP) machine and promising results are shown in terms of learning speed and quality of the reconstructed images.
Abstract: In this paper a neural autoassociative technique applied to image compression is presented. Particular attention is devoted to the preprocessing stage. The validity of some of the already established theoretical results is discussed and an experimental study of the mapping capabilities of the network based on a nonlinear parametrized activation function is presented. In order to test the image reconstruction capabilities of the neural technique, comparisons with more traditional image processing tools such as the Karhunen-Loeve Transform (KLT) are shown. A massively parallel implementation of a linear version of the neural technique on the Associative String Processor (ASP) machine is presented. Despite the linear structure of the ASP and the use of fixed arithmetic for the implementation, promising results are shown in terms of learning speed (of the order of 10^9 connections per second) and quality of the reconstructed images.

Proceedings ArticleDOI
07 Jun 1992
TL;DR: A second-order multilayer perceptron that uses a different activation function, the quadratic sigmoid function, is proposed, and a learning algorithm based on this new activation function is developed to approximate continuous-valued functions.
Abstract: A second-order multilayer perceptron that uses a different activation function, the quadratic sigmoid function, is proposed. Unlike the conventional sigmoid activation function, the quadratic sigmoid function exhibits second-order characteristics among the input components. Based on this new activation function, a learning algorithm is developed for the new multilayer perceptron. The proposed multilayer perceptron has been used to approximate continuous-valued functions. The approximation results show that the learning speed and the network size were significantly improved in comparison with conventional multilayer perceptrons which use the sigmoid activation function.
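One way to realize a unit with second-order characteristics among the input components, given here only as an illustrative reading of the term "quadratic sigmoid" (the paper's exact parameterization may differ), is to pass a quadratic form of the inputs through the usual sigmoid:

import numpy as np

def quadratic_sigmoid_unit(x, Q, w, b):
    """Hidden unit whose net input is a quadratic form of the inputs (captures
    pairwise cross-terms x_i * x_j) passed through a logistic sigmoid."""
    net = x @ Q @ x + w @ x + b
    return 1.0 / (1.0 + np.exp(-net))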

Patent
Kazuyuki Shiomi1, Sei Watanabe1
18 Feb 1992
TL;DR: In this patent, the characteristic data that determine the characteristics of the transfer functions (for example, sigmoid functions) of the neurons of the hidden layer and the output layer of a neural network are learned and corrected in a manner similar to the correction of weighting data and threshold values.
Abstract: The characteristic data that determine the characteristics of the transfer functions of the neurons of the hidden layer and the output layer of a neural network (for example, the gradients of sigmoid functions) are learned and corrected in a manner similar to the correction of weighting data and threshold values. Since at least one characteristic datum which determines the characteristics of the transfer function of each neuron is learned, the transfer function characteristics can differ between neurons in the network, independently of the problem and/or the number of neurons, and be optimal. Accordingly, learning with high precision can be performed in a short time.

Proceedings Article
30 Nov 1992
TL;DR: This paper gives a polynomial time algorithm that PAC learns these networks under the uniform distribution and suggests that, under reasonable distributions, µ-perceptron networks may be easier to learn than fully connected networks.
Abstract: Neural networks with binary weights are very important from both the theoretical and practical points of view. In this paper, we investigate the learnability of single binary perceptrons and unions of µ-binary-perceptron networks, i.e. an "OR" of binary perceptrons where each input unit is connected to one and only one perceptron. We give a polynomial time algorithm that PAC learns these networks under the uniform distribution. The algorithm is able to identify both the network connectivity and the weight values necessary to represent the target function. These results suggest that, under reasonable distributions, µ-perceptron networks may be easier to learn than fully connected networks.

PatentDOI
Bernhard E. Boser1
TL;DR: In this paper, the computationally complex nonlinear function in each neuron or computational element is replaced, after the network has been trained, by a similar but less complex nonlinear function; in one embodiment, a hyperbolic tangent function is replaced by a piecewise linear threshold logic function.
Abstract: Higher operational speed is obtained without sacrificing computational accuracy and reliability in a neural network by interchanging a computationally complex nonlinear function with a similar but less complex nonlinear function in each neuron or computational element after each neuron of the network has been trained by an appropriate training algorithm for the classifying problem addressed by the neural network. In one exemplary embodiment, a hyperbolic tangent function is replaced by a piecewise linear threshold logic function.
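The scheme can be mimicked in a few lines: train with the exact nonlinearity, then swap in a cheaper piecewise-linear stand-in for the forward pass. The segmentation below is an assumption chosen for illustration, not the patent's.

import numpy as np

def tanh_pwl(x, limit=1.0):
    """Piecewise-linear threshold-logic stand-in for tanh, used only after
    training (illustrative single-segment clamp; the patented segmentation
    may differ)."""
    return np.clip(np.asarray(x, dtype=float), -limit, limit)

def forward(x, W1, b1, W2, b2, act=np.tanh):
    """One-hidden-layer forward pass; 'act' is np.tanh during training and
    can be replaced by tanh_pwl at inference time."""
    return W2 @ act(W1 @ x + b1) + b2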

Proceedings ArticleDOI
24 Jun 1992
TL;DR: In this paper, Radial basis function networks are compared with sigmoidal activation function feedforward networks using data from a large industrial process and the contribution that RBF networks can make to the process modelling and control toolbox is examined.
Abstract: There are strong relationships between radial basis function (RBF) approaches and neural network representations. Indeed, the RBF representation can be implemented in the form of a two-layered network. This paper examines the contribution that RBF networks can make to the process modelling and control toolbox. Radial basis function networks are compared with sigmoidal activation function feedforward networks using data from a large industrial process.

Patent
Juergen Hollatz1, Volker Tresp1
11 Sep 1992
TL;DR: In this article, the neural network is pre-structured in a given network configuration using rule-based knowledge, and each initial value is defined by a normalised linear summation function in terms of weighted and unweighted base functions.
Abstract: The neural network is pre-structured in a given network configuration using rule-based knowledge. Each initial value is defined by a normalised linear summation function in terms of weighted and unweighted base functions. The latter are obtained from the mean values of localised positive functions of the neural network input values. Preferably, the base function is obtained via an exponential function of an inverse covariance matrix, multiplied by a normalisation factor for the base function.
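Read literally, the abstract describes a normalised sum of Gaussian basis functions; a minimal sketch under that reading (array shapes and names are assumptions, not the claimed method itself):

import numpy as np

def normalized_gaussian_network(x, centers, inv_covs, weights):
    """y(x) = sum_i w_i b_i(x) / sum_i b_i(x), where each base function
    b_i(x) = exp(-0.5 (x - c_i)^T S_i (x - c_i)) uses an inverse covariance S_i.
    Illustrative reading of the patent abstract."""
    diffs = x - centers                                    # (m, d): x broadcast against m centers
    quad = np.einsum('id,idk,ik->i', diffs, inv_covs, diffs)
    b = np.exp(-0.5 * quad)
    return np.dot(weights, b) / np.sum(b)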

Proceedings ArticleDOI
TL;DR: A variety of artificial neural networks are evaluated for their classification abilities under noisy inputs, including feedforward networks, localized basis function networks, and exemplar classifiers.
Abstract: A variety of artificial neural networks are evaluated for their classification abilities under noisy inputs. These networks include feedforward networks, localized basis function networks, and exemplar classifiers. The performance of radial basis function classifiers deteriorates rapidly in the presence of noise, but elliptical basis variants are able to adapt to extraneous input components quite robustly. For feedforward networks, selective pruning of weights based on an 'optimal brain damage' approach helps in noise-tolerant classification. Results from a radar classification problem are presented.

Proceedings ArticleDOI
01 Jan 1992
TL;DR: An analysis is presented which suggests how the use of this function can improve convergence and generalization and tests on simulated data provide evidence of improved generalization with the log likelihood cost function.
Abstract: The log likelihood cost function is discussed as an alternative to the least-squares criterion for training feedforward neural networks. An analysis is presented which suggests how the use of this function can improve convergence and generalization. Tests on simulated data using both training algorithms provide evidence of improved generalization with the log likelihood cost function.
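The usual form of the argument (a standard derivation consistent with, but not copied from, the paper): for a sigmoid output y = σ(net) and binary target t,

\[
E_{\mathrm{LS}}=\tfrac{1}{2}(t-y)^{2},\qquad
E_{\mathrm{LL}}=-\bigl[t\log y+(1-t)\log(1-y)\bigr],
\]
\[
\frac{\partial E_{\mathrm{LL}}}{\partial\,\mathrm{net}}=y-t,\qquad
\frac{\partial E_{\mathrm{LS}}}{\partial\,\mathrm{net}}=(y-t)\,y\,(1-y).
\]

The least-squares gradient carries the extra factor y(1 - y), which vanishes when the output unit saturates; the log likelihood cost removes that factor, which is the intuition behind the improved convergence.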

Proceedings Article
12 Jul 1992
TL;DR: The improved algorithm guarantees that a global minimum is found in linear time for tree-like subnetworks, is self-stabilizing for trees (cycle-free undirected graphs), and remains correct under various scheduling demons.
Abstract: Symmetric networks that are based on energy minimization, such as Boltzmann machines or Hopfield nets, are used extensively for optimization, constraint satisfaction, and approximation of NP-hard problems. Nevertheless, finding a global minimum for the energy function is not guaranteed, and even a local minimum may take an exponential number of steps. We propose an improvement to the standard activation function used for such networks. The improved algorithm guarantees that a global minimum is found in linear time for tree-like subnetworks. The algorithm is uniform and does not assume that the network is a tree. It performs no worse than the standard algorithms for any network topology. In the case where there are trees growing from a cyclic subnetwork, the new algorithm performs better than the standard algorithms by avoiding local minima along the trees and by optimizing the free energy of these trees in linear time. The algorithm is self-stabilizing for trees (cycle-free undirected graphs) and remains correct under various scheduling demons. However, no uniform protocol exists to optimize trees under a pure distributed demon and no such protocol exists for cyclic networks under central demon.

Proceedings ArticleDOI
07 Jun 1992
TL;DR: The loading problem for a four-node neural network with a node function set equal to AC₁⁰ plus the three-input equality function is shown to be NP-complete, and it is indicated how the required results can be derived in a similar fashion.
Abstract: It is shown that the loading problem for a six-node neural network with a node function set AC₁⁰ (i.e., the conjunction or disjunction of a subset of the inputs or their complements) is NP-complete. It can be deduced from this observation that the loading problem for a six-node analog neural network is NP-hard. The loading problem for a four-node neural network with a node function set equal to AC₁⁰ plus the three-input equality function is shown to be NP-complete, and it is indicated how the required results can be derived in a similar fashion. Three loading problems are studied.

Proceedings ArticleDOI
09 Aug 1992
TL;DR: Computer simulation shows that the authors' network structure and design approach are valid; a sufficient stability criterion, which can be realized by a redesign of the neuron function, is given.
Abstract: A general structure of the cellular neural network is introduced. An analytical method is presented to find the weight matrix for a given set of desired vectors. An energy function is then constructed. Using the energy function, a sufficient stability criterion which can be realized by a redesign of the neuron function is given. Computer simulation shows that the authors' network structure and design approach are valid.

01 Jan 1992
TL;DR: The mathematical framework for the development of Wave-Nets is presented, various aspects of their practical implementation are discussed, and the problem of predicting a chaotic time-series is solved as an illustrative example.
Abstract: A novel artificial neural network with one hidden layer of nodes, whose basis functions are drawn from a family of orthonormal wavelets, is developed in this paper. Wavelet Networks or Wave-Nets are based on firm theoretical foundations of functional analysis. The good localization characteristics of the basis functions, both in the input and frequency domains, allow hierarchical, multi-resolution learning of input-output maps from experimental data. Furthermore, Wave-Nets allow explicit estimation of global and local prediction error-bounds, and thus lend themselves to a rigorous and transparent design of the network. Computational complexity arguments prove that the training and adaptation efficiency of Wave-Nets is at least an order of magnitude better than other networks. This paper presents the mathematical framework for the development of Wave-Nets and discusses various aspects of their practical implementation. The problem of predicting a chaotic time-series is solved as an illustrative example.

Learning by artificial neural networks represents an expansion of the unknown nonlinear relationship between inputs, x, and outputs, F(x), into a space spanned by the activation functions of the network's nodes. Specifically, Poggio and Girosi (1989) have shown that learning by feedforward neural networks can be regarded as synthesizing an approximation of a multi-dimensional function over a space spanned by the activation functions \phi_l(x, k), l = 1, 2, ..., m, where k are adjustable parameters, i.e.

\[ F(x) = \sum_{l=1}^{m} c_l \,\phi_l(x, k) \qquad (1) \]

Using empirical data, the activation function parameters and the network parameters c_l, l = 1, 2, ..., m, are adjusted in such a way as to minimize the approximation error. The solution to this nonlinear problem is often ad hoc, requiring trial and error, giving artificial neural networks a "black box" character. Two types of activation functions are commonly used: global, as in Backpropagation Networks (BPN), and local, as in Radial Basis Function Networks (RBFN). Both networks are capable of approximating any continuous function with arbitrary accuracy, given enough nodes, but have different approximation properties. Adaptation and incremental learning with global approximators is slow due to the interaction of many nodes, and may not converge. They could also lead to large extrapolation errors without warning. These problems are overcome in neural networks with local activation functions.

Improved understanding of the relationship between neural networks, approximation theory and functional analysis has prompted several researchers to look for better ways to design neural networks. From the theory of functional analysis it is well known that functions can be represented as a weighted sum of orthogonal basis functions. Such expansions can be easily represented as neural nets which can be designed for the desired error rate using the properties of orthonormal expansions, thus decreasing the ad-hocness of neural net design. Unfortunately, most orthogonal functions are global approximators and suffer from the disadvantages mentioned above. In order to take full advantage of the orthonormality of basis functions and of localized learning, we need a set of basis functions which are local and orthogonal. It was believed until recently that it was not possible to build simple orthonormal bases with good localization properties. Such functions, belonging to the class of wavelets, have been developed and have found applications in several fields such as signal processing and quantum physics (Daubechies, 1988; Mallat, 1989). In this paper we propose the development of neural networks with activation functions derived from various classes of orthogonal wavelets. The resulting wavelet network, or Wave-Net, has all the advantages of true localized learning.

Furthermore, in most learning problems the training data are often non-uniformly distributed in the input space. An efficient way of solving such problems is by learning at multiple resolutions (Moody, 1989). A higher resolution of the input space may be used where data are dense, and a lower resolution where they are sparse. Wavelets, in addition to forming an orthogonal basis, are also capable of explicitly representing the behavior of a function at various resolutions of the input variables. Consequently, a Wave-Net is first trained to learn the mapping between inputs and outputs at the coarsest resolution of input values. Subsequently, it is trained to incorporate elements of the input-output mapping at higher resolutions of the input variables until the desired level of generalization has been reached. Such hierarchical, multi-resolution training has many attractive features for solving engineering problems, e.g. a meaningful interpretation of the resulting mapping and estimation of mapping errors both in local ...
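A minimal one-dimensional sketch of the idea (Haar wavelets, names, and a least-squares fit of the outer coefficients are all choices made here, not the authors' implementation): the hidden nodes are dilated and translated orthonormal wavelets, and resolution levels can be added one at a time.

import numpy as np

def haar(t):
    """Haar mother wavelet, supported on [0, 1)."""
    t = np.asarray(t, dtype=float)
    return np.where((0.0 <= t) & (t < 0.5), 1.0,
                    np.where((0.5 <= t) & (t < 1.0), -1.0, 0.0))

def wavenet_design_matrix(x, max_level):
    """Columns are the constant (scaling) term and the orthonormal Haar wavelets
    psi_{j,k}(x) = 2**(j/2) * haar(2**j * x - k), j = 0..max_level, k = 0..2**j - 1."""
    cols = [np.ones_like(x)]
    for j in range(max_level + 1):
        for k in range(2 ** j):
            cols.append(2.0 ** (j / 2) * haar(2.0 ** j * x - k))
    return np.column_stack(cols)

# Fit the outer-layer coefficients at a chosen resolution by least squares.
x = np.linspace(0.0, 1.0, 200, endpoint=False)
y = np.sin(2 * np.pi * x) + 0.5 * np.sin(6 * np.pi * x)
A = wavenet_design_matrix(x, max_level=4)
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coeffs        # multi-resolution reconstruction of the target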