Journal ArticleDOI

A learning rule of neural networks via simultaneous perturbation and its hardware implementation

01 Feb 1995-Neural Networks (Pergamon)-Vol. 8, Iss: 2, pp 251-259
TL;DR: This paper proposes a learning rule for neural networks via simultaneous perturbation, together with an analog feedforward neural network circuit using the rule; the rule requires only forward operations of the neural network and is therefore well suited to hardware implementation.
About: This article was published in Neural Networks on 1995-02-01 and is currently open access. It has received 106 citations to date. The article focuses on the topics: Learning rule & Competitive learning.

Summary (1 min read)

1. INTRODUCTION

  • The authors present computer simulations of the proposed learning rule and compare it with the back-propagation method and a learning rule using the simple (sequential) perturbation.
  • In addition, the authors fabricated an analog neural network circuit using the learning rule.
  • The authors describe the circuit in detail and present results obtained with it.

PERTURBATION

  • The authors use the following nomenclature in this paper.
  • A subscript represents the cell number in the layer.
  • On the basis of this estimated first-differential coefficient, the authors can modify the weight vector as with the back-propagation method.
  • On the other hand, the authors add the perturbation to all weights simultaneously.
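
The update described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the paper's exact circuit rule: it assumes a ±1 sign perturbation applied to every weight at once and a one-sided difference, so only two forward evaluations of the network (perturbed and nominal) are needed per modification; the function and parameter names are hypothetical.

```python
import numpy as np

def sp_gradient_estimate(loss, w, c=0.01, rng=None):
    """Estimate the first-differential coefficients for ALL weights at once:
    perturb every weight simultaneously by a random sign times c, then take
    the one-sided difference of two forward evaluations of the network."""
    rng = np.random.default_rng() if rng is None else rng
    s = rng.choice([-1.0, 1.0], size=w.shape)   # simultaneous sign perturbation
    diff = (loss(w + c * s) - loss(w)) / c      # only two forward operations
    return diff * s                             # same scalar, signed per weight

def sp_update(loss, w, alpha=0.1, c=0.01, rng=None):
    """One learning step: modify the weight vector against the estimated
    coefficients, as with gradient descent in back-propagation."""
    return w - alpha * sp_gradient_estimate(loss, w, c, rng)
```

For a one-dimensional linear error such as loss(w) = 2*w, this estimate recovers the true slope exactly regardless of the perturbation sign; in general it is a stochastic estimate whose systematic error shrinks with c.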

3. SIMULATION RESULTS

  • The authors obtained a neural network that learned the 26 characters by using their learning rule.
  • Even in this case, only two forward operations of the network are required to obtain the modifying quantities corresponding to all weights.

4.1. Neuron Unit

  • In a learning mode, all weight parts in the unit update the weight values in parallel by using the quantity delivered from the learning unit and the sign of the perturbation held in each D-FF.
  • Therefore, concurrent modifications of all weights are possible.

5.1. The Exclusive-OR Problem

  • Potentially, their learning rule has the ability to escape from local minima.
  • The modifying quantity defined in eqn (3) consists of the first-differential coefficient and an additional error term, as described in eqn (6).
  • Thus, the larger c is, the larger this error term is, and vice versa if c is smaller.
  • From this point of view, the learning rule has a property similar to simulated annealing.
  • Detailed discussion, analysis, and experiments on this point are still needed.

5.2. The TCLX Problem

  • Figure 15 shows the teaching signals and the observed outputs for this problem.
  • This figure shows that the neural network circuit learns the TCLX patterns.
  • Also in this problem, the modifications of all weights are performed for a period T/4.
  • The operation speed in this figure is approximately 6.0.


Citations
Journal ArticleDOI
TL;DR: This paper presents a simple step-by-step guide to implementation of SPSA in generic optimization problems and offers some practical suggestions for choosing certain algorithm coefficients.
Abstract: The need for solving multivariate optimization problems is pervasive in engineering and the physical and social sciences. The simultaneous perturbation stochastic approximation (SPSA) algorithm has recently attracted considerable attention for challenging optimization problems where it is difficult or impossible to directly obtain a gradient of the objective function with respect to the parameters being optimized. SPSA is based on an easily implemented and highly efficient gradient approximation that relies on measurements of the objective function, not on measurements of the gradient of the objective function. The gradient approximation is based on only two function measurements (regardless of the dimension of the gradient vector). This contrasts with standard finite-difference approaches, which require a number of function measurements proportional to the dimension of the gradient vector. This paper presents a simple step-by-step guide to implementation of SPSA in generic optimization problems and offers some practical suggestions for choosing certain algorithm coefficients.
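
The two-measurement gradient approximation and the decaying gain sequences that this guide discusses can be sketched as follows. The coefficient choices (a, c, A, alpha, gamma) are illustrative placeholders in the spirit of Spall's guidelines, not values taken from the paper, and the function name is hypothetical.

```python
import numpy as np

def spsa_minimize(f, theta0, n_iter=1000, a=0.5, c=0.1, A=100.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """SPSA: each iteration uses exactly two measurements of f,
    regardless of the dimension of theta."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(n_iter):
        ak = a / (k + 1 + A) ** alpha                       # step-size gain
        ck = c / (k + 1) ** gamma                           # perturbation size
        delta = rng.choice([-1.0, 1.0], size=theta.shape)   # Bernoulli +/-1
        # Two-sided gradient approximation from two function measurements.
        g_hat = (f(theta + ck * delta) - f(theta - ck * delta)) / (2.0 * ck * delta)
        theta -= ak * g_hat
    return theta
```

For example, minimizing the quadratic f(x) = sum(x**2) from a starting point such as [2.0, -1.5] drives theta toward the origin, with the same two function measurements per step whether theta has two components or two thousand.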

759 citations

Journal ArticleDOI
TL;DR: This article presents a comprehensive overview of the hardware realizations of artificial neural network models, known as hardware neural networks (HNN), appearing in academic studies as prototypes as well as in commercial use.

638 citations

Posted Content
TL;DR: An exhaustive review of the research conducted in neuromorphic computing since the inception of the term is provided to motivate further work by illuminating gaps in the field where new research is needed.
Abstract: Neuromorphic computing has come to refer to a variety of brain-inspired computers, devices, and models that contrast the pervasive von Neumann computer architecture. This biologically inspired approach has created highly connected synthetic neurons and synapses that can be used to model neuroscience theories as well as solve challenging machine learning problems. The promise of the technology is to create a brain-like ability to learn and adapt, but the technical challenges are significant, starting with an accurate neuroscience model of how the brain works, to finding materials and engineering breakthroughs to build devices to support these models, to creating a programming framework so the systems can learn, to creating applications with brain-like capabilities. In this work, we provide a comprehensive survey of the research and motivations for neuromorphic computing over its history. We begin with a 35-year review of the motivations and drivers of neuromorphic computing, then look at the major research areas of the field, which we define as neuro-inspired models, algorithms and learning approaches, hardware and devices, supporting systems, and finally applications. We conclude with a broad discussion on the major research topics that need to be addressed in the coming years to see the promise of neuromorphic computing fulfilled. The goals of this work are to provide an exhaustive review of the research conducted in neuromorphic computing since the inception of the term, and to motivate further work by illuminating gaps in the field where new research is needed.

570 citations



01 Jan 1999
TL;DR: Simultaneous perturbation stochastic approximation (SPSA) as mentioned in this paper is a widely used method for multivariate optimization problems that requires only two measurements of the objective function regardless of the dimension of the optimization problem.
Abstract: Multivariate stochastic optimization plays a major role in the analysis and control of many engineering systems. In almost all real-world optimization problems, it is necessary to use a mathematical algorithm that iteratively seeks out the solution because an analytical (closed-form) solution is rarely available. In this spirit, the “simultaneous perturbation stochastic approximation (SPSA)” method for difficult multivariate optimization problems has been developed. SPSA has recently attracted considerable international attention in areas such as statistical parameter estimation, feedback control, simulation-based optimization, signal and image processing, and experimental design. The essential feature of SPSA—which accounts for its power and relative ease of implementation—is the underlying gradient approximation that requires only two measurements of the objective function regardless of the dimension of the optimization problem. This feature allows for a significant decrease in the cost of optimization, especially in problems with a large number of variables to be optimized.

378 citations

Journal ArticleDOI
TL;DR: The use of memristor bridge synapse in the proposed architecture solves one of the major problems, regarding nonvolatile weight storage in analog neural network implementations, and a modified chip-in-the-loop learning scheme suitable for the proposed neural network architecture is proposed.
Abstract: Analog hardware architecture of a memristor bridge synapse-based multilayer neural network and its learning scheme is proposed. The use of memristor bridge synapse in the proposed architecture solves one of the major problems, regarding nonvolatile weight storage in analog neural network implementations. To compensate for the spatial nonuniformity and nonideal response of the memristor bridge synapse, a modified chip-in-the-loop learning scheme suitable for the proposed neural network architecture is also proposed. In the proposed method, the initial learning is conducted in software, and the behavior of the software-trained network is learned by the hardware network by learning each of the single-layered neurons of the network independently. The forward calculation of the single-layered neuron learning is implemented on circuit hardware, and followed by a weight updating phase assisted by a host computer. Unlike conventional chip-in-the-loop learning, the need for the readout of synaptic weights for calculating weight updates in each epoch is eliminated by virtue of the memristor bridge synapse and the proposed learning scheme. The hardware architecture along with the successful implementation of proposed learning on a three-bit parity network, and on a car detection network is also presented.

314 citations


Cites background from "A learning rule of neural networks ..."

  • ...This is a relatively complex operation to be implemented in electronic circuits, and the complications are amplified by imperfections and mismatch in the circuit components....


References
Journal ArticleDOI
TL;DR: The paper presents an SA algorithm that is based on a simultaneous perturbation gradient approximation instead of the standard finite-difference approximation of Keifer-Wolfowitz type procedures that can be significantly more efficient than the standard algorithms in large-dimensional problems.
Abstract: The problem of finding a root of the multivariate gradient equation that arises in function minimization is considered. When only noisy measurements of the function are available, a stochastic approximation (SA) algorithm of the general Kiefer-Wolfowitz type is appropriate for estimating the root. The paper presents an SA algorithm that is based on a simultaneous perturbation gradient approximation instead of the standard finite-difference approximation of Kiefer-Wolfowitz type procedures. Theory and numerical experience indicate that the algorithm can be significantly more efficient than the standard algorithms in large-dimensional problems.

2,149 citations


Additional excerpts

  • ...Kiefer-Wolfowitz stochastic approximation method (Spall, 1992)....


Journal ArticleDOI
TL;DR: The utility of back-propagation is well established, but in many applications the number of iterations required before convergence can be large; modifications to the algorithm described by Rumelhart et al. (1986) can greatly accelerate convergence, as discussed by the authors.
Abstract: The utility of the back-propagation method in establishing suitable weights in a distributed adaptive network has been demonstrated repeatedly. Unfortunately, in many applications, the number of iterations required before convergence can be large. Modifications to the back-propagation algorithm described by Rumelhart et al. (1986) can greatly accelerate convergence. The modifications consist of three changes: 1) instead of updating the network weights after each pattern is presented to the network, the network is updated only after the entire repertoire of patterns to be learned has been presented, at which time the algebraic sums of all the weight changes are applied; 2) instead of keeping the learning rate (i.e., the multiplier on the step size) constant, it is varied dynamically so that the algorithm utilizes a near-optimum value, as determined by the local optimization topography; and 3) the momentum factor is set to zero when, as signified by a failure of a step to reduce the total error, the information inherent in prior steps is more likely to be misleading than beneficial. Only after the network takes a useful step, i.e., one that reduces the total error, does the momentum factor again assume a non-zero value. Considering the selection of weights in neural nets as a problem in classical nonlinear optimization theory, the rationale for algorithms seeking only those weights that produce the globally minimum error is reviewed and rejected.

1,017 citations

Book
13 Oct 2011
TL;DR: A Neural Processor for Maze Solving and Issues in Analog VLSI and MOS Techniques for Neural Computing are discussed.
Abstract: 1. A Neural Processor for Maze Solving. 2. Resistive Fuses: Analog Hardware for Detecting Discontinuities in Early Vision. 3. CMOS Integration of Herault-Jutten Cells for Separation of Sources. 4. Circuit Models of Sensory Transduction in the Cochlea. 5. Issues in Analog VLSI and MOS Techniques for Neural Computing. 6. Design and Fabrication of VLSI Components for a General Purpose Analog Neural Computer. 7. A Chip that Focuses an Image on Itself. 8. A Foveated Retina-Like Sensor Using CCD Technology. 9. Cooperative Stereo Matching Using Static and Dynamic Image Features. 10. Adaptive Retina.

359 citations


"A learning rule of neural networks ..." refers methods in this paper

  • ...Nowadays, we can implement artificial neural networks using several media (De Gloria, 1989; Mead & Ismail, 1989)....


Journal ArticleDOI
TL;DR: It is shown that using gradient descent with direct approximation of the gradient instead of back-propagation is more economical for parallel analog implementations and is suitable for multilayer recurrent networks as well.
Abstract: Previous work on analog VLSI implementation of multilayer perceptrons with on-chip learning has mainly targeted the implementation of algorithms such as back-propagation. Although back-propagation is efficient, its implementation in analog VLSI requires excessive computational hardware. It is shown that using gradient descent with direct approximation of the gradient instead of back-propagation is more economical for parallel analog implementations. It is shown that this technique (which is called 'weight perturbation') is suitable for multilayer recurrent networks as well. A discrete level analog implementation showing the training of an XOR network as an example is presented.
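
For contrast, the sequential "weight perturbation" scheme described above perturbs one weight at a time, so one modification of n weights costs n extra forward passes (versus two forward operations for simultaneous perturbation). A minimal hypothetical sketch, with illustrative function and parameter names:

```python
import numpy as np

def weight_perturbation_gradient(loss, w, c=0.01):
    """Sequential (simple) weight perturbation: perturb one weight
    at a time, requiring one extra forward pass per weight
    (n + 1 evaluations in total for n weights)."""
    base = loss(w)                        # one nominal forward pass
    grad = np.empty_like(w)
    for i in range(w.size):
        wp = w.copy()
        wp[i] += c                        # perturb a single weight
        grad[i] = (loss(wp) - base) / c   # one-sided difference
    return grad
```

The per-weight loop is exactly the cost the excerpts below point out: for large networks, the n forward operations per update dominate, which motivates the simultaneous scheme.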

264 citations


"A learning rule of neural networks ..." refers background or methods in this paper

  • ...Independently, the authors also proposed and fabricated an analog neural network circuit using the same learning rule (Maeda, Yamashita, & Kanata, 1991) and investigated the usefulness of this type of learning rule in an inverse problem (Maeda, 1992). However, as pointed out in Maeda, Yamashita, and Kanata (1991), the learning rule using the simple perturbation requires n-times forward operations of the neural network for one modification of all weights....


  • ...A learning rule of neural networks via a simple sequential parameter perturbation was previously proposed and a hardware implementation was reported (Jabri & Flower, 1992)....


Proceedings Article
30 Nov 1992
TL;DR: A parallel stochastic algorithm is investigated for error-descent learning and optimization in deterministic networks of arbitrary topology based on the model-free distributed learning mechanism of Dembo and Kailath and supported by a modified parameter update rule.
Abstract: A parallel stochastic algorithm is investigated for error-descent learning and optimization in deterministic networks of arbitrary topology. No explicit information about internal network structure is needed. The method is based on the model-free distributed learning mechanism of Dembo and Kailath. A modified parameter update rule is proposed by which each individual parameter vector perturbation contributes a decrease in error. A substantially faster learning speed is hence allowed. Furthermore, the modified algorithm supports learning time-varying features in dynamical networks. We analyze the convergence and scaling properties of the algorithm, and present simulation results for dynamic trajectory learning in recurrent networks.

163 citations