
Showing papers on "Activation function published in 1990"


Journal ArticleDOI
TL;DR: A probabilistic neural network is formed that can compute nonlinear decision boundaries approaching the Bayes optimal, and a four-layer network of the type proposed can map any input pattern to any number of classifications.

3,772 citations


Journal ArticleDOI
TL;DR: A shoulder strap retainer having a base to be positioned on the exterior shoulder portion of a garment with securing means attached to the undersurface of the base for removably securing the base to the exterior shoulders portion of the garment.

1,709 citations


Journal ArticleDOI
TL;DR: The multilayer perceptron, when trained as a classifier using backpropagation, is shown to approximate the Bayes optimal discriminant function.
Abstract: The multilayer perceptron, when trained as a classifier using backpropagation, is shown to approximate the Bayes optimal discriminant function. The result is demonstrated for both the two-class problem and multiple classes. It is shown that the outputs of the multilayer perceptron approximate the a posteriori probability functions of the classes being trained. The proof applies to any number of layers and any type of unit activation function, linear or nonlinear.

866 citations
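
The result above, that squared-error training of a classifier drives the outputs toward the class posterior probabilities, can be checked numerically. Below is a minimal NumPy sketch on a synthetic two-class Gaussian problem; the 1-8-1 architecture, learning rate, and data are illustrative choices, not taken from the paper.

```python
# Minimal sketch (architecture and data are illustrative, not from the paper):
# a 1-8-1 perceptron trained with squared error on 0/1 class labels; its output
# should approach the Bayes posterior P(class = 1 | x) for two Gaussian classes.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, n)                       # two equally likely classes
x = rng.normal(loc=2.0 * y - 1.0, scale=1.0)    # class 0 ~ N(-1,1), class 1 ~ N(+1,1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(0, 1, (8, 1)); b1 = np.zeros(8)   # hidden layer
W2 = rng.normal(0, 1, (1, 8)); b2 = np.zeros(1)   # output layer
lr = 0.1
for epoch in range(50):
    for i in rng.permutation(n):
        h = sigmoid(W1[:, 0] * x[i] + b1)          # hidden activations
        o = sigmoid(W2 @ h + b2)                   # network output
        d_o = (o - y[i]) * o * (1 - o)             # output delta (squared error)
        d_h = (W2.T @ d_o).ravel() * h * (1 - h)   # hidden deltas
        W2 -= lr * np.outer(d_o, h);       b2 -= lr * d_o
        W1 -= lr * np.outer(d_h, [x[i]]);  b1 -= lr * d_h

# Compare with the true posterior P(y = 1 | x), which is sigmoid(2x) for
# these two unit-variance Gaussians with equal priors.
xs = np.linspace(-3, 3, 7)
posterior = 1.0 / (1.0 + np.exp(-2.0 * xs))
net_out = np.array([sigmoid(W2 @ sigmoid(W1[:, 0] * v + b1) + b2)[0] for v in xs])
print(np.c_[xs, posterior, net_out])               # last two columns roughly agree
```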


Proceedings ArticleDOI
17 Jun 1990
TL;DR: It is shown that feedforward networks having bounded weights are not undesirably restricted, but are in fact universal approximators, provided that the hidden-layer activation function belongs to one of several suitably broad classes of functions: polygonal functions, certain piecewise polynomial functions, or a class of functions analytic on some open interval.
Abstract: It is shown that feedforward networks having bounded weights are not undesirably restricted, but are in fact universal approximators, provided that the hidden-layer activation function belongs to one of several suitably broad classes of functions: polygonal functions, certain piecewise polynomial functions, or a class of functions analytic on some open interval. These results are obtained by trading bounds on network weights for possible increments to network complexity, as indexed by the number of hidden nodes. The hidden-layer activation functions used include functions not admitted by previous universal approximation results, so the present results also extend the already broad class of activation functions for which universal approximation results are available. A theorem is also given which establishes the ability of these approximators of arbitrary mappings to learn when examples are generated by a stationary ergodic process.

125 citations
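
The trade-off described above, bounded weights compensated by more hidden nodes, can be illustrated with a toy experiment. This is not the paper's construction: only the hidden-layer weights and biases are constrained here, and the linear output layer is simply fitted by least squares.

```python
# Illustration only (not the paper's construction): with the hidden-layer
# weights and biases confined to [-1, 1], enlarging the hidden layer still
# shrinks the error of a one-hidden-layer tanh approximation of sin(x).
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-np.pi, np.pi, 400)
target = np.sin(x)
BOUND = 1.0                                       # weight bound for the hidden layer

for n_hidden in (2, 8, 32, 128):
    w = rng.uniform(-BOUND, BOUND, n_hidden)      # bounded input weights
    b = rng.uniform(-BOUND, BOUND, n_hidden)      # bounded biases
    H = np.tanh(np.outer(x, w) + b)               # hidden-layer outputs
    coef, *_ = np.linalg.lstsq(H, target, rcond=None)   # fit linear output layer
    err = np.max(np.abs(H @ coef - target))
    print(f"{n_hidden:4d} hidden nodes -> max |error| = {err:.4f}")
```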


Journal ArticleDOI
01 Aug 1990
TL;DR: This paper demystifies the multi-layer perceptron network by showing that it simply divides the input space into regions bounded by hyperplanes, and uses this insight to construct minimal training sets.
Abstract: In this paper we investigate multi-layer perceptron networks in the task domain of Boolean functions. We demystify the multi-layer perceptron network by showing that it simply divides the input space into regions bounded by hyperplanes. We use this information to construct minimal training sets. Despite using minimal training sets, the learning time of multi-layer perceptron networks with backpropagation scales exponentially for complex Boolean functions. But modular neural networks which consist of independently trained subnetworks scale very well. We conjecture that the next generation of neural networks will be genetic neural networks which evolve their structure. We confirm Minsky and Papert: “The future of neural networks is tied not to the search for some single, universal scheme to solve all problems at once, but to the evolution of a many-faceted technology of network design.”

58 citations
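
The hyperplane picture is easy to make concrete. The sketch below hand-wires a 2-2-1 perceptron for XOR, the classic Boolean example; the weights are my own illustrative choice, with each hidden unit implementing one hyperplane and the output unit selecting the region between them.

```python
# Sketch (illustrative weights, not from the paper): a 2-2-1 perceptron whose
# two hidden units are the hyperplanes x1 + x2 = 0.5 and x1 + x2 = 1.5; the
# region between them is exactly where XOR is true.
import numpy as np

def step(z):
    return (z > 0).astype(float)

W1 = np.array([[1.0, 1.0],      # hidden unit 1: fires when x1 + x2 > 0.5
               [1.0, 1.0]])     # hidden unit 2: fires when x1 + x2 > 1.5
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -1.0])      # output: h1 AND NOT h2, i.e. the middle region
b2 = -0.5

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    h = step(W1 @ np.array(x, dtype=float) + b1)
    y = step(W2 @ h + b2)
    print(x, "->", int(y))      # prints the XOR truth table: 0, 1, 1, 0
```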


Patent
09 Oct 1990
TL;DR: In this paper, a plurality of neural networks are coupled to an output neural network, or judge network, to form a clustered neural network and the judge network combines the outputs of the plurality of individual neural networks to provide the output from the entire clustered network.
Abstract: A plurality of neural networks are coupled to an output neural network, or judge network, to form a clustered neural network. Each of the plurality of networks comprises a back-propagation neural network trained with a supervised learning rule. Each of the clustered neural networks is trained to perform substantially the same mapping function before they are clustered. Following training, the clustered neural network computes its output by taking an "average" of the outputs of the individual neural networks that make up the cluster: the judge network combines the outputs of the plurality of individual neural networks to provide the output of the entire clustered network. In addition, the output of the judge network may be fed back to each of the individual neural networks and used as a training input thereto, in order to provide for continuous training. The use of the clustered network increases the speed of learning and results in better generalization. In addition, clustering multiple back-propagation networks provides increased performance and fault tolerance when compared to a single unclustered network having substantially the same computational complexity. The present invention may be used in applications that are amenable to neural network solutions, including control and image-processing applications. Clustering also permits the use of smaller networks, and the synergy among the clustered back-propagation networks improves the properties of the clustered network over a comparably complex non-clustered network.

56 citations
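
A toy version of the clustering idea is sketched below: several back-propagation networks are trained independently on the same mapping and a "judge" combines their outputs, here by plain averaging rather than the patent's trainable judge network or its feedback scheme. All data, sizes, and hyperparameters are placeholders.

```python
# Minimal sketch of the clustering idea (my own toy code, not the patent's):
# several independently trained networks perform the same mapping, and a
# "judge" combines their outputs, here simply by averaging them.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 200)[:, None]
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.shape)   # noisy target

def train_member(seed, hidden=10, lr=0.05, epochs=2000):
    """One back-propagation network of the cluster (tanh hidden, linear output)."""
    r = np.random.default_rng(seed)
    W1, b1 = r.normal(0, 1, (1, hidden)), np.zeros(hidden)
    W2, b2 = r.normal(0, 1, (hidden, 1)), 0.0
    for _ in range(epochs):
        h = np.tanh(x @ W1 + b1)
        e = (h @ W2 + b2) - y
        W2 -= lr * h.T @ e / len(x);  b2 -= lr * e.mean()
        dh = (e @ W2.T) * (1 - h ** 2)
        W1 -= lr * x.T @ dh / len(x); b1 -= lr * dh.mean(axis=0)
    return lambda q: np.tanh(q @ W1 + b1) @ W2 + b2

members = [train_member(s) for s in (10, 11, 12)]            # independently trained
judge = lambda q: np.mean([m(q) for m in members], axis=0)   # averaging "judge"

clean = np.sin(2 * np.pi * x)
for i, m in enumerate(members):
    print("member", i, "MSE:", float(np.mean((m(x) - clean) ** 2)))
# The averaged output is never worse than the members' average error.
print("cluster MSE:", float(np.mean((judge(x) - clean) ** 2)))
```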


Proceedings ArticleDOI
17 Jun 1990
TL;DR: A complex-valued generalization of neural networks is presented and an activation function with more desirable characteristics in the complex plane is proposed, including the possibility of self oscillation.
Abstract: A complex-valued generalization of neural networks is presented. The dynamics of complex neural networks have parallels in discrete complex dynamics which give rise to the Mandelbrot set and other fractals. The continuation to the complex plane of common activation functions and the resulting neural dynamics are discussed. An activation function with more desirable characteristics in the complex plane is proposed. The dynamics of this activation function include the possibility of self-oscillation. Possible applications in signal processing and neurobiological modeling are discussed.

44 citations
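
For concreteness, the sketch below continues the ordinary logistic sigmoid to complex arguments and shows the poles on the imaginary axis that make it awkward as a complex activation; the better-behaved activation actually proposed in the paper is not reproduced here.

```python
# Sketch (not the paper's proposed activation): the logistic sigmoid continued
# to the complex plane, 1 / (1 + exp(-z)).  It has poles where exp(-z) = -1,
# i.e. at z = i*pi*(2k+1), which is one reason a better-behaved complex
# activation is desirable.
import numpy as np

def complex_sigmoid(z):
    """Analytic continuation of the logistic sigmoid to complex arguments."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([0.5 + 0.5j, 2.0 - 1.0j, 0.0 + 3.0j])
print(complex_sigmoid(z))

# Near a pole the magnitude blows up:
near_pole = 1e-3 + 1j * np.pi
print(abs(complex_sigmoid(near_pole)))   # very large (~1000)
```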


Proceedings ArticleDOI
17 Jun 1990
TL;DR: A representation theorem is developed for backpropagation neural networks that states that each term in the power series for F(x) is realizable using a building block, and each building block has one hidden layer.
Abstract: A representation theorem is developed for backpropagation neural networks. First, it is assumed that the function to be approximated, F(x) for the vector x, is continuous and has finite support, so that it can be approximated arbitrarily well by a multidimensional power series. The activation function, sigmoid or otherwise, is then approximated by a power-series function of the net. Basic building-block subnetworks, realizing the monomial or product of the inputs, are implemented with any desired degree of accuracy. Each term in the power series for F(x) is realizable using a building block, and each building block has one hidden layer. Hence, the overall network has one hidden layer.

30 citations
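
One way such a one-hidden-layer building block can work is sketched below; this is a reconstruction of the general idea, not the paper's exact construction. Assuming the activation has a power series about some bias point with a nonzero quadratic coefficient, a second difference isolates a square, and products follow by polarization.

```latex
% Reconstruction of a one-hidden-layer "squaring" building block (assumption:
% the activation \sigma is analytic about a bias point b with a_2 \neq 0).
\[
  \sigma(b + t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \cdots , \qquad a_2 \neq 0 .
\]
\[
  \frac{\sigma(b + \lambda u) - 2\,\sigma(b) + \sigma(b - \lambda u)}{2 a_2 \lambda^2}
  \;=\; u^2 + O(\lambda^2) \qquad (\lambda \to 0),
\]
% so three hidden units (one of them with zero input weights) and a linear
% output realize u^2 to any accuracy; products, and hence each monomial in the
% power series for F(x), then follow from the polarization identity
\[
  x y \;=\; \tfrac{1}{4}\bigl[(x + y)^2 - (x - y)^2\bigr].
\]
```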


Journal ArticleDOI
TL;DR: It is shown here that the transcriptional activation function of LEU3 resides within the C-terminal 32 amino acids, and that an alpha-isopropylmalate-induced conformational change in the central region releases and thus activates the activation domain.

28 citations


Proceedings ArticleDOI
17 Jun 1990
TL;DR: A learning procedure based on back-propagation is presented for obtaining a neural network with discrete weights, under the assumption that the neuron activation function is computed through a lookup table (LUT) and that a LUT can be shared among many neurons.
Abstract: The feasibility of restricting the weight values in multilayer perceptrons to powers of two or sums of powers of two is studied. Multipliers could thus be replaced by shifters and adders in digital hardware, saving both time and chip area, under the assumption that the neuron activation function is computed through a lookup table (LUT) and that a LUT can be shared among many neurons. A learning procedure based on back-propagation for obtaining a neural network with such discrete weights is presented. This learning procedure requires full real arithmetic and therefore must be performed offline. It starts from a multilayer perceptron with continuous weights learned using back-propagation. Then a weight normalization is made to ensure that the whole shifting dynamics is used and to maximize the match between the continuous and discrete weights of neurons sharing the same LUT. Finally, a discrete version of the BP algorithm with automatic learning-rate control is applied up to convergence. Test runs on a simple pattern recognition problem show the feasibility of the approach.

27 citations
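
Only the weight-discretization step is sketched here; the weight normalization and the discrete back-propagation phase described in the abstract are not reproduced, and the greedy rounding below is an assumption of mine rather than the paper's exact procedure.

```python
# Sketch of the discretization idea (details are my assumptions, not the
# paper's exact procedure): round each weight to a sum of at most two signed
# powers of two, so a multiply becomes one or two shifts plus an add.
import numpy as np

def nearest_power_of_two(w):
    """Round w to a signed power of two (nearest exponent in log2 scale); 0 stays 0."""
    if w == 0.0:
        return 0.0
    e = np.round(np.log2(abs(w)))
    return np.sign(w) * 2.0 ** e

def quantize_sum_of_two_powers(w):
    """Greedy rounding of w to p1 + p2 with p1, p2 signed powers of two."""
    p1 = nearest_power_of_two(w)
    p2 = nearest_power_of_two(w - p1)
    return p1 + p2

weights = np.array([0.8, -0.3, 0.55, -1.7, 0.02])
quantized = np.array([quantize_sum_of_two_powers(w) for w in weights])
print(np.c_[weights, quantized])
# e.g. 0.55 -> 0.5 + 0.0625 = 0.5625: a right-shift by 1 plus a right-shift by 4.
```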


Book ChapterDOI
01 Jan 1990
TL;DR: The parameter dependence of the resulting diffusion tensor suggests how, for perfectly trainable networks, parameters can be made to converge to globally optimal values corresponding to an error-free implementation of the desired input-output relations.
Abstract: Stochastic pattern presentation induces fluctuations in the weights of backpropagation networks, which enable the system to escape from local minima in parameter space. For small learning rates we find that learning is governed by a Fokker-Planck equation. The parameter dependence of the resulting diffusion tensor suggests how, for perfectly trainable networks, parameters can be made to converge to globally optimal values corresponding to an error-free implementation of the desired input-output relations. For cases where perfect learning is impossible we demonstrate the usefulness of a simulated-annealing-like procedure to reach the minimal-error state. We also propose a new activation function which can drastically improve learning, as is demonstrated for the parity problem.
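
The simulated-annealing-like idea can be illustrated generically: add noise of slowly decaying variance to the gradient updates so the parameters can hop out of a poor minimum early and settle later. The schedule, the toy one-dimensional error surface, and all constants below are my own; they are not the chapter's Fokker-Planck analysis or its proposed activation function.

```python
# Generic illustration (not the chapter's actual schedule or activation):
# gradient descent plus noise whose "temperature" decays over time, in the
# spirit of simulated annealing.
import numpy as np

rng = np.random.default_rng(3)

def anneal_step(weights, gradient, step, lr=0.02, t0=2.0, decay=5e-4):
    """One annealed update: a gradient step plus noise of decaying variance."""
    temperature = t0 / (1.0 + decay * step)
    noise = rng.normal(0.0, np.sqrt(lr * temperature), size=weights.shape)
    return weights - lr * gradient + noise

# Toy usage on a 1-D double-well "error surface" E(w) = (w**2 - 1)**2 + 0.3*w:
# the shallower minimum is near w = +1, the deeper one near w = -1.
w = np.array([1.0])                      # start in the worse minimum
for step in range(20000):
    grad = 4 * w * (w ** 2 - 1) + 0.3    # dE/dw
    w = anneal_step(w, grad, step)
print(w)                                  # usually settles near the deeper minimum, w ~ -1
```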


Journal Article
TL;DR: This model represents the first computer-based network simulation using actual experimental neural data obtained from a large number of spontaneously active cells in a small intact ganglion, and indicates that the networks may synthesize patterns of activity needed for biological function.
Abstract: Techniques are described that allow the use of multiple-neuron spike data in a computational neural network architecture. The network architecture was devised to match the number of actual neurons from which data were obtained. The network was successfully trained to accurately predict the multiple neuron spike trains. Simultaneous spike histories of 44 neurons were modeled by a network architecture consisting of 44 input units, 88 hidden units with recurrent connections, and 44 output units. The activation function of each unit was determined by data unique to a single neuron. These data were coupled with an analog gradient that preserved both the exact spiking times and the relative spiking tendency of each neuron. The input activation values were compared to network output target values calculated to occur 5 ms forward in the composite spiking records of all neurons. Following 2000 training cycles with the gradient data, the average error of each unit in the network was 0.0016. Discrete output values for each network unit were correlated with those of all other units. These correlations were comparable to those computed using the actual neuron data. Both correlations reveal a functional connectivity pattern among the units and neurons. These connectivity patterns indicate that the networks may synthesize patterns of activity needed for biological function; in this case, flight patterns carried out in the mesothoracic ganglion of the dragonfly. This model represents, to the best of our knowledge, the first computer-based network simulation using actual experimental neural data obtained from a large number of spontaneously active cells in a small intact ganglion.
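
A structural sketch of the 44-88-44 recurrent architecture described above follows, run on random placeholder spike data; the per-neuron activation functions fitted from the recordings, the analog gradient coding, and the training procedure are not reproduced.

```python
# Structural sketch only: the 44-88-44 recurrent architecture from the abstract,
# forward pass only, on fake spike data.  The per-neuron activation functions
# and the actual dragonfly recordings are not reproduced here.
import numpy as np

rng = np.random.default_rng(4)
N_IN, N_HID, N_OUT = 44, 88, 44
STEP_MS = 5                                                 # predict activity 5 ms ahead

spikes = (rng.random((1000, N_IN)) < 0.05).astype(float)    # placeholder spike raster (1 ms bins)

W_in  = rng.normal(0, 0.1, (N_IN, N_HID))
W_rec = rng.normal(0, 0.1, (N_HID, N_HID))                  # recurrent hidden connections
W_out = rng.normal(0, 0.1, (N_HID, N_OUT))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = np.zeros(N_HID)
predictions = []
for t in range(len(spikes) - STEP_MS):
    h = np.tanh(spikes[t] @ W_in + h @ W_rec)               # recurrent hidden state
    predictions.append(sigmoid(h @ W_out))                  # predicted activity at t + 5 ms
predictions = np.array(predictions)
targets = spikes[STEP_MS:]                                  # the activity 5 ms forward
print(predictions.shape, targets.shape)                     # (995, 44) (995, 44)
```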

Book ChapterDOI
01 Jan 1990
TL;DR: No general principle or guideline is available for such a synthesis task in multilayer perceptron classification; existing methods generally lead to structures that deal only with a particular classification problem.
Abstract: Designing a multilayer perceptron for general-purpose classification has important practical implications. Since the capacity of a multilayer perceptron to realize arbitrary dichotomies (or two-class classifications) is limited, the most important step in a design procedure is the determination of the number of layers and the number of nodes in each layer, apart from the determination of the weights and the threshold values. Unfortunately, there has been no general principle or guideline available for such a synthesis task; design normally proceeds on an ad hoc and empirical basis, and the methods generally lead to structures that deal only with a particular classification problem [1][2].

Proceedings ArticleDOI
17 Jun 1990
TL;DR: It is shown that k hidden units with an asymptotic activation function are able to transfer any given k+1 different inputs to linearly independent GHUVs (generated hidden unit vectors) by properly setting weights and thresholds, leading to a scheme for understanding associative memory in three-layer networks.
Abstract: It is shown that k hidden units with an asymptotic activation function are able to transfer any given k+1 different inputs to linearly independent GHUVs (generated hidden unit vectors) by properly setting weights and thresholds. The number of hidden units with the LIT (linearly independent transformation) capability for a polynomial activation function is limited by the order of the polynomial. For analytic asymptotic activation functions and given different inputs, the LIT is a generic capability and a probability-1 capability when weights and thresholds are set randomly. It is a generic and probability-1 property for any random input if the weight and threshold setting has the LIT capability for some k+1 inputs. A three-layer net with k hidden units, in which the activation function is asymptotic and the output layer has no activation function, is sufficient to record k+1 arbitrary real samples. The probability of recording k+2 random real samples is 0 if the activation is a unit step function; the same holds for the sigmoid function in the case of associative memory. These conclusions lead to a scheme for understanding associative memory in three-layer networks.
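
The linearly-independent-transformation property is easy to verify numerically. The sketch below uses sigmoid hidden units and random weights (my choices, consistent with but not taken from the paper): k+1 distinct inputs yield, with probability 1, a full-rank matrix of hidden-unit vectors augmented by a constant, so a linear output layer stores k+1 arbitrary real samples exactly.

```python
# Numerical illustration (my own, using sigmoid hidden units): with k hidden
# units and random weights, k+1 distinct inputs generate hidden-unit vectors
# that, augmented with a constant, are linearly independent, so a linear
# output layer can store k+1 arbitrary real samples exactly.
import numpy as np

rng = np.random.default_rng(5)
k = 6                                         # number of hidden units
inputs = rng.normal(size=(k + 1, 3))          # k+1 distinct 3-dimensional inputs
targets = rng.normal(size=k + 1)              # arbitrary real samples to store

W = rng.normal(size=(3, k))                   # random input-to-hidden weights
b = rng.normal(size=k)                        # random hidden thresholds
H = 1.0 / (1.0 + np.exp(-(inputs @ W + b)))   # generated hidden unit vectors
G = np.hstack([H, np.ones((k + 1, 1))])       # augment with the constant

print("rank of GHUV matrix:", np.linalg.matrix_rank(G))       # k + 1 (full rank)
v = np.linalg.solve(G, targets)               # output weights plus output threshold
print("max recall error:", np.max(np.abs(G @ v - targets)))   # ~ 0 (exact recall)
```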

Book ChapterDOI
Masahiko Arai
01 Jan 1990
TL;DR: Conditions on the activation function of hidden units are considered for the purpose of utilizing backpropagation for three-layer-net learning, and conditions under which the vectors made from the hidden states and a constant become linearly independent are discussed.
Abstract: This paper considers conditions on the activation function of hidden units for the purpose of utilizing backpropagation for three-layer-net learning. A necessary condition for the convergence of backpropagation procedures to a global minimum of a cost function is that the set of states of the hidden layer is linearly separable. A sufficient condition for this separability is that the vectors made from the states and a constant are linearly independent. This paper discusses the conditions under which the vectors become linearly independent.

01 Jan 1990
TL;DR: The main conclusion of the dissertation is that the back propagation algorithm can be made more robust by not only making the weights adaptive, but by making the slopes of the nonlinearities adaptive as well.
Abstract: This dissertation is an investigation of the effect of the slope of an activation function (the node nonlinearity) on the performance of the back propagation algorithm in training a multilayer perceptron. When the slope of the activation function is too steep, the input of the nonlinearity often falls in the saturation region. When this occurs, the derivative of the nonlinearity becomes very small, resulting in a very small update of the weights. The main conclusion of the dissertation is that the back propagation algorithm can be made more robust by making not only the weights adaptive but the slopes of the nonlinearities as well. A piecewise-linear activation function is also investigated as a computationally more efficient approximation to the commonly used sigmoid function.
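
A sketch of the adaptive-slope idea follows; the parameterization and the comparison are my own formulation, not the dissertation's exact update rules. The slope a of the sigmoid gets its own partial derivative, which can be back-propagated alongside the weight gradients.

```python
# Sketch of the adaptive-slope idea (my formulation, not the dissertation's
# exact rules): a sigmoid with trainable slope a,
#   f(net) = 1 / (1 + exp(-a * net)),
# whose derivative with respect to a is available to back-propagation in
# addition to the usual derivative with respect to net.
import numpy as np

def sigmoid_slope(net, a):
    return 1.0 / (1.0 + np.exp(-a * net))

def grads(net, a):
    """Partial derivatives of the output with respect to net and to the slope a."""
    y = sigmoid_slope(net, a)
    d_net = a * y * (1.0 - y)       # df/dnet
    d_a = net * y * (1.0 - y)       # df/da
    return d_net, d_a

# With a steep slope even a moderate |net| saturates the unit and its derivatives
# shrink, stalling the weight updates; adapting a toward smaller values keeps the
# unit out of the saturation region.
for a in (8.0, 1.0, 0.25):
    d_net, d_a = grads(net=3.0, a=a)
    print(f"a = {a:4.2f}:  df/dnet = {d_net:.5f},  df/da = {d_a:.5f}")
```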

Proceedings ArticleDOI
17 Jun 1990
TL;DR: A general unsupervised learning scheme based on a competitive cost function is presented; the network is able to cluster the data and performs well compared with the dynamic clusters technique, though it fails to find an optimal partition of the data for some problems.
Abstract: A general unsupervised learning scheme based on a competitive cost function is presented. A gradient technique is used to minimize the cost function. The algorithm is then applied to clustering problems by using a particular unit activation function. A quadratic potential function is used, which permits clustering the data with ellipsoids. Comparisons are made with the dynamic clusters method on artificial data and on R.A. Fisher's (1936) iris data set. Results show that the network is able to cluster the data and performs well compared with the dynamic clusters technique, though it fails to find an optimal partition of the data for some problems. Moreover, it automatically finds the number of clusters, unlike most clustering techniques.
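
A generic sketch of competitive, gradient-style clustering with a quadratic (ellipsoidal) potential follows; the cost function and updates are simplified stand-ins of mine (including a heuristic variance update and a fixed number of units), not the paper's scheme.

```python
# Generic sketch of competitive clustering with a quadratic (ellipsoidal)
# potential.  The paper's exact cost function, updates, and automatic choice of
# the number of clusters are not reproduced; the variance adaptation below is a
# simple exponential-average heuristic of mine.
import numpy as np

rng = np.random.default_rng(6)
# Two elongated 2-D Gaussian blobs as toy data.
data = np.vstack([
    rng.normal([0, 0], [2.0, 0.3], size=(200, 2)),
    rng.normal([6, 4], [0.3, 2.0], size=(200, 2)),
])

K, lr, decay = 2, 0.05, 0.05
centers = data[rng.choice(len(data), K, replace=False)].copy()
variances = np.ones((K, 2))                    # per-cluster, per-axis (ellipsoid shape)

for epoch in range(30):
    for x in data[rng.permutation(len(data))]:
        d = np.sum((x - centers) ** 2 / variances, axis=1)   # quadratic potential
        w = np.argmin(d)                                     # competitive winner
        centers[w] += lr * (x - centers[w])                  # move winner toward x
        if epoch >= 10:      # adapt the ellipsoid axes once the centers have settled
            variances[w] += decay * ((x - centers[w]) ** 2 - variances[w])
            variances[w] = np.maximum(variances[w], 1e-3)

print("centers:\n", centers)
print("axis variances:\n", variances)          # elongated along each blob's long axis
```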