Search or ask a question

Showing papers on "Reinforcement learning published in 1987"

PDF

Open Access

Proceedings Article•

Stochastic Learning Networks and their Electronic Implementation

[...]

Joshua Alspector¹, Robert B. Allen¹, Victor Hu², Srinagesh Satyanarayana³•Institutions (3)

Telcordia Technologies¹, University of California, Berkeley², Columbia University³

01 Jan 1987

TL;DR: A family of learning algorithms that operate on a recurrent, symmetrically connected, neuromorphic network that, like the Boltzmann machine, settles in the presence of noise and a version of the supervised learning algorithm for a network with analog activation functions.

...read moreread less

Abstract: We describe a family of learning algorithms that operate on a recurrent, symmetrically connected, neuromorphic network that, like the Boltzmann machine, settles in the presence of noise. These networks learn by modifying synaptic connection strengths on the basis of correlations seen locally by each synapse. We describe a version of the supervised learning algorithm for a network with analog activation functions. We also demonstrate unsupervised competitive learning with this approach, where weight saturation and decay play an important role, and describe preliminary experiments in reinforcement learning, where noise is used in the search procedure. We identify the above described phenomena as elements that can unify learning techniques at a physical microscopic level. These algorithms were chosen for ease of implementation in vlsi. We have designed a CMOS test chip in 2 micron rules that can speed up the learning about a millionfold over an equivalent simulation on a VAX 11/780. The speedup is due to parallel analog computation for summing and multiplying weights and activations, and the use of physical processes for generating random noise. The components of the test chip are a noise amplifier, a neuron amplifier, and a 300 transistor adaptive synapse, each of which is separately testable. These components are also integrated into a 6 neuron and 15 synapse network. Finally, we point out techniques for reducing the area of the electronic correlational synapse both in technology and design and show how the algorithms we study can be implemented naturally in electronic systems.

...read moreread less

55 citations

Proceedings Article•

Learning by state recurrence detection

[...]

Bruce E. Rosen¹, James M. Goodwin¹, Jacques J. Vidal¹•Institutions (1)

University of California, Los Angeles¹

01 Jan 1987

TL;DR: This research investigates a new technique for unsupervised learning of nonlinear control problems, applied both to Michie and Chambers BOXES algorithm and to Barto, Sutton and Anderson's extension, the ASE/ACE system, and has significantly improved the convergence rate of stochastically based learning automata.

...read moreread less

Abstract: This research investigates a new technique for unsupervised learning of nonlinear control problems. The approach is applied both to Michie and Chambers BOXES algorithm and to Barto, Sutton and Anderson's extension, the ASE/ACE system, and has significantly improved the convergence rate of stochastically based learning automata. Recurrence learning is a new nonlinear reward-penalty algorithm. It exploits information found during learning trials to reinforce decisions resulting in the recurrence of nonfailing states. Recurrence learning applies positive reinforcement during the exploration of the search space, whereas in the BOXES or ASE algorithms, only negative weight reinforcement is applied, and then only on failure. Simulation results show that the added information from recurrence learning increases the learning rate. Our empirical results show that recurrence learning is faster than both basic failure driven learning and failure prediction methods. Although recurrence learning has only been tested in failure driven experiments, there are goal directed learning applications where detection of recurring oscillations may provide useful information that reduces the learning time by applying negative, instead of positive reinforcement. Detection of cycles provides a heuristic to improve the balance between evidence gathering and goal directed search.

...read moreread less

9 citations

Game-theoretic cooperativity in networks of self-interested units

[...]

Andrew G. Barto

01 Mar 1987

TL;DR: In this paper, the authors present an approach to network learning that is related to game and team problems in which competition and cooperation have more technical meanings, and the adaptive element is a synthesis of aspects of stochastic learning automata and typical neuron-like adaptive elements.

...read moreread less

Abstract: The behavior of theoretical neural networks is often described in terms of competition and cooperation. I present an approach to network learning that is related to game and team problems in which competition and cooperation have more technical meanings. I briefly describe the application of stochastic learning automata to game and team problems and then present an adaptive element that is a synthesis of aspects of stochastic learning automata and typical neuron‐like adaptive elements. These elements act as self‐interested agents that work toward improving their performance with respect to their individual preference orderings. Networks of these elements can solve a variety of team decision problems, some of which take the form of layered networks in which the ‘‘hidden units’’ become appropriate functional components as they attempt to improve their own payoffs.

...read moreread less

9 citations