
Showing papers on "Reinforcement learning" published in 1989


Proceedings Article
01 Jan 1989
TL;DR: This paper discusses reinforcement learning in terms of the sequential decision framework and shows how a learning algorithm similar to the one implemented by the Adaptive Critic Element used in the pole-balancer of Barto, Sutton, and Anderson (1983), and further developed by Sutton (1984), fits into this framework.
Abstract: Decision making tasks that involve delayed consequences are very common yet difficult to address with supervised learning methods. If there is an accurate model of the underlying dynamical system, then these tasks can be formulated as sequential decision problems and solved by Dynamic Programming. This paper discusses reinforcement learning in terms of the sequential decision framework and shows how a learning algorithm similar to the one implemented by the Adaptive Critic Element used in the pole-balancer of Barto, Sutton, and Anderson (1983), and further developed by Sutton (1984), fits into this framework. Adaptive neural networks can play significant roles as modules for approximating the functions required for solving sequential decision problems.

63 citations
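
The abstract above frames delayed-consequence tasks as sequential decision problems handled by an adaptive-critic style learner. As a rough illustration only, here is a minimal tabular temporal-difference value update of the kind such a critic performs; the chain environment, step size, and discount factor are assumptions for the example, not details from the paper.

```python
# Minimal tabular TD(0) critic: learn state values V(s) from sampled
# transitions under a fixed policy. The chain environment, step size, and
# discount factor are assumptions for illustration, not the paper's setup.
import random

def td_critic(episode_fn, n_states, alpha=0.1, gamma=0.95, episodes=500):
    """Estimate V(s) from episodes of (state, reward, next_state, done) tuples."""
    V = [0.0] * n_states
    for _ in range(episodes):
        for s, r, s_next, done in episode_fn():
            target = r if done else r + gamma * V[s_next]
            V[s] += alpha * (target - V[s])      # temporal-difference error drives learning
    return V

def chain_episode(n_states=5):
    """Toy chain: start at 0, drift right with probability 0.8, reward on reaching the end."""
    s, steps = 0, []
    while s < n_states - 1:
        s_next = s + 1 if random.random() < 0.8 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        steps.append((s, r, s_next, s_next == n_states - 1))
        s = s_next
    return steps

print([round(v, 2) for v in td_critic(chain_episode, n_states=5)])
```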


Proceedings Article
01 Jan 1989
TL;DR: This paper describes a neural network algorithm called complementary reinforcement back-propagation (CRBP), and reports simulation results on problems designed to offer differing opportunities for generalization.
Abstract: In associative reinforcement learning, an environment generates input vectors, a learning system generates possible output vectors, and a reinforcement function computes feedback signals from the input-output pairs. The task is to discover and remember input-output pairs that generate rewards. Especially difficult cases occur when rewards are rare, since the expected time for any algorithm can grow exponentially with the size of the problem. Nonetheless, if a reinforcement function possesses regularities, and a learning algorithm exploits them, learning time can be reduced below that of non-generalizing algorithms. This paper describes a neural network algorithm called complementary reinforcement back-propagation (CRBP), and reports simulation results on problems designed to offer differing opportunities for generalization.

61 citations
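
The paper itself defines CRBP; as a hedged sketch of the complementary-target idea it builds on, the fragment below samples a binary output from the network, then moves the weights toward that output when rewarded and toward its bitwise complement otherwise. The single-layer sigmoid network, learning rate, and toy "echo" reinforcement function are illustrative assumptions, not the published algorithm.

```python
# Sketch of the complementary-target idea behind CRBP: sample a binary output,
# then adjust weights toward that output if it was rewarded, or toward its
# bitwise complement if it was not. Network size, learning rate, and the toy
# reinforcement function are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 4, 4
W = rng.normal(scale=0.1, size=(n_out, n_in))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def crbp_step(x, reinforcement_fn, lr=0.5):
    p = sigmoid(W @ x)                            # output probabilities
    y = (rng.random(n_out) < p).astype(float)     # stochastic binary output
    r = reinforcement_fn(x, y)                    # 1 = reward, 0 = no reward
    target = y if r > 0 else 1.0 - y              # complement the output on failure
    grad = (target - p) * p * (1.0 - p)           # delta rule through the sigmoid
    W[:] += lr * np.outer(grad, x)
    return r

def echo_reward(x, y):
    """Toy task: reward when the output echoes the input pattern."""
    return float(np.array_equal(x, y))

for _ in range(2000):
    x = (rng.random(n_in) < 0.5).astype(float)
    crbp_step(x, echo_reward)
```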


Proceedings ArticleDOI
25 Sep 1989
TL;DR: A novel method for designing AI (artificial intelligence)-based controllers is introduced; it combines approximate reasoning, using linguistic control rules obtained from human expert controllers, with a form of reinforcement learning related to the temporal difference method.
Abstract: The authors introduce a novel method for designing AI (artificial intelligence)-based controllers using approximate reasoning and reinforcement learning. The approach uses linguistic control rules obtained from human expert controllers and a form of reinforcement learning related to the temporal difference method. A major characteristic of the proposed system is its ability to use past experience with an incompletely known system to predict its future behavior. The proposed method is applied in the context of a cart-pole balancing problem. The present approach learns to balance a pole within 15 trials (within 10 trials in most cases) and outperforms the previously developed schemes for this problem such as A.G. Barto et al.'s (1983) method or D. Michie and R.A. Chambers' (1968) work in the BOXES system.

60 citations
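
To make the "approximate reasoning" ingredient concrete, the sketch below evaluates a tiny linguistic rule base for a cart-pole style controller. The membership functions, rule table, and the TD-style update indicated in the closing comment are assumptions for illustration and do not reproduce the authors' controller.

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b on the interval [a, c]."""
    return max(0.0, min((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)))

# Linguistic labels for pole angle (rad) and angular velocity (rad/s): assumed values.
ANGLE = {"NEG": (-0.3, -0.15, 0.0), "ZERO": (-0.15, 0.0, 0.15), "POS": (0.0, 0.15, 0.3)}
VEL = {"NEG": (-2.0, -1.0, 0.0), "ZERO": (-1.0, 0.0, 1.0), "POS": (0.0, 1.0, 2.0)}

# Rule base: (angle label, velocity label) -> consequent force in newtons (adaptable).
rules = {("NEG", "NEG"): -10.0, ("NEG", "ZERO"): -5.0, ("ZERO", "ZERO"): 0.0,
         ("POS", "ZERO"): 5.0, ("POS", "POS"): 10.0}

def control_force(theta, theta_dot):
    """Weighted-average defuzzification over the firing strengths of the rules."""
    num = den = 0.0
    for (a_lbl, v_lbl), force in rules.items():
        w = tri(theta, *ANGLE[a_lbl]) * tri(theta_dot, *VEL[v_lbl])  # rule firing strength
        num += w * force
        den += w
    return num / den if den > 0 else 0.0

# A TD-style reinforcement update would nudge each consequent rules[(a, v)] by
# lr * firing_strength * (r + gamma * V(next state) - V(state)),
# where V is a learned predictor of failure.
print(control_force(0.1, 0.5))
```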


Proceedings ArticleDOI
Williams, Peng
01 Jan 1989
TL;DR: The results of simulations in which the optima of several deterministic functions studied by D.H. Ackley were sought using variants of REINFORCE algorithms compare favorably to the best results found by Ackley.
Abstract: Any nonassociative reinforcement learning algorithm can be viewed as a method for performing function optimization through (possibly noise-corrupted) sampling of function values. A description is given of the results of simulations in which the optima of several deterministic functions studied by D.H. Ackley (Ph.D. Diss., Carnegie-Mellon Univ., 1987) were sought using variants of REINFORCE algorithms. Results obtained for certain of these algorithms compare favorably to the best results found by Ackley.

19 citations
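
As a rough sketch of nonassociative REINFORCE used as a function optimizer, the code below samples bit strings from a vector of Bernoulli parameters and applies the update alpha * (r - baseline) * (y - p). The one-max objective and the running-average baseline are illustrative choices; they are not Ackley's test functions or the exact variants studied in the paper.

```python
# Nonassociative REINFORCE as a function optimizer: Bernoulli parameters
# generate candidate bit strings, and each logit is nudged in proportion to
# (reward - baseline) * (action - probability). Objective and baseline are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

def reinforce_optimize(f, n_bits, alpha=0.1, steps=2000):
    theta = np.zeros(n_bits)                         # logits of the Bernoulli parameters
    baseline, best, best_y = 0.0, -np.inf, None
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-theta))
        y = (rng.random(n_bits) < p).astype(float)   # sample a candidate point
        r = f(y)
        theta += alpha * (r - baseline) * (y - p)    # REINFORCE update
        baseline += 0.1 * (r - baseline)             # running-average reinforcement baseline
        if r > best:
            best, best_y = r, y
    return best, best_y

# Toy objective: count of ones (maximum at the all-ones string).
best, best_y = reinforce_optimize(lambda y: y.sum(), n_bits=20)
print(best, best_y)
```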


Journal ArticleDOI
TL;DR: The basic operation of biological and electronic (artificial) neural networks (NNs) is examined, and applications of neural-style learning chips to pattern recognition, data compression, optimization, and expert systems are discussed.
Abstract: The basic operation of biological and electronic (artificial) neural networks (NNs) is examined. Learning by NNs is discussed, covering supervised learning, particularly back-propagation, and unsupervised and reinforcement learning. The use of VLSI implementation to speed learning is considered briefly. Applications of neural-style learning chips to pattern recognition, data compression, optimization, and expert systems are discussed. Problem areas and issues for further research are addressed.

19 citations


Journal ArticleDOI
TL;DR: Three commonly used principles in neural-network design (associative learning, competition, and opponent processing) are outlined here, and two examples of their use in behavior-modeling architectures are discussed.
Abstract: Neural networks are an increasingly important tool for the mechanistic understanding of psychological phenomena. Three commonly used principles in neural-network design (associative learning, competition, and opponent processing) are outlined here, and two examples of their use in behavior-modeling architectures are discussed. One example relates to an instance of reinforcement learning; that is, of an organism controlling its environment to maximize positive reinforcement or to minimize negative reinforcement. The other example relates to some characteristic deviations from reinforcement learning that occur in people or monkeys with frontal-lobe damage.

16 citations


Journal ArticleDOI
TL;DR: This paper reviews the state of the art in machine learning and provides a glimpse of the pioneers of present machine-learning systems and strategies.
Abstract: Machine learning is the essence of machine intelligence. When we have systems that learn, we will have true artificial intelligence. Many machine-learning strategies exist; this paper reviews the state of the art in machine learning and provides a glimpse of the pioneers of present machine-learning systems and strategies. Learning in noisy domains, evolutionary learning, learning by analogy, and explanation-based learning are just some of the methods covered. Emphasis is placed on the algorithms employed by many of the systems, and on the merits and disadvantages of various approaches. Finally, an examination of VanLehn's theory of impasse-driven learning is made.

13 citations


Proceedings ArticleDOI
25 Sep 1989
TL;DR: An approach to eliminating the quantization of the input space is described; the new input space representation consists of functions that act as receptive fields, have the shape of multivariate Gaussian probability density functions, and form the first layer in the learning network.
Abstract: A learning control approach called refinement, in which a fixed controller is first designed using analytic design tools, is explored. This controller's performance is refined by a secondary learning controller, which is a reinforcement learning-based connectionist network. The issue is the representation of the input space of the refinement learning controller. In previous work, the input space was quantized into fixed boxes and each box became a control situation for the learning controller. The drawback was that the learning control designer had to know how to quantize the space. An approach to eliminating the quantization of the input space is described. The new input space representation consists of functions that act as receptive fields and have the shape of multivariate Gaussian probability density functions; they are the first layer in the learning network. Experiments used a tracking control problem with an additive nonlinearity. The learning controller adds an appropriate control signal on the basis of a given evaluation function, in order to improve the fixed controller's ability to track a reference signal.

12 citations
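
A minimal sketch of the receptive-field representation described above: state variables are encoded by the activations of Gaussian receptive fields instead of fixed boxes. The grid of centers, the widths, and the linear read-out mentioned in the comment are assumptions, not the paper's configuration.

```python
# Gaussian receptive-field encoding of a continuous input, as a stand-in for
# box quantization. Centers, widths, and the grid layout are assumed values.
import numpy as np

def gaussian_features(x, centers, widths):
    """Activations of multivariate Gaussian receptive fields for input x."""
    diffs = (x - centers) / widths
    return np.exp(-0.5 * np.sum(diffs ** 2, axis=1))

# Receptive fields on a grid over a 2-D input (e.g. tracking error and its rate).
g = np.linspace(-1.0, 1.0, 5)
centers = np.array([[a, b] for a in g for b in g])
widths = np.full_like(centers, 0.5)

x = np.array([0.2, -0.4])
phi = gaussian_features(x, centers, widths)
# A refinement controller could then output a corrective signal w @ phi,
# with the weights w adjusted by a reinforcement-based rule.
print(phi.round(3).reshape(5, 5))
```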


01 Jan 1989
TL;DR: This thesis proposes several techniques for handling dimensionality and nonlinearity in supervised learning and in reinforcement learning, and proposes a class of supervised learning networks called context-sensitive networks and a general framework for associative reinforcement learning.
Abstract: Most connectionist approaches, while promising in solving small-sized problems, do not scale up well with the problem size. However, many real-world learning problems are of high dimensionality and the mappings to be learned are also highly nonlinear. Learning may prove to be intractable if the methods are scaled up naively. This thesis proposes several techniques for handling dimensionality and nonlinearity in supervised learning and in reinforcement learning. The underlying philosophy is based on two principles: divide-and-conquer and locality. We propose a class of supervised learning networks called context-sensitive networks. The basic idea is to decompose a function into a parameterized family of functions, each of which is lower in dimensionality and hence easier to learn than the original one. A context-sensitive network, composed of a context network and a function network, has two semantically different levels of abstraction. With the use of complex hidden units in a context network, sparsely distributed internal representations will emerge and the problem of high nonlinearity can be handled better. Each hidden unit is only sensitive to a localized basis region in the input space. This helps reduce the interference between different patterns represented in a distributed fashion. The a priori knowledge of forming convex basis regions is utilized in grouping simple hidden units into complex ones. Thus the network does not start from scratch but with simple internal representations already existing. We then propose a general framework for associative reinforcement learning, which includes a supervised learning network as one of its building components. With this framework, the techniques for handling dimensionality and nonlinearity in supervised learning can be transferred to reinforcement learning problems. We also propose a game-theoretic network architecture for associative reinforcement learning. Each subproblem is taken care of by one subnetwork called an associative learning automaton. The game-theoretic interactions among the associative learning automata lead to the emergence of the solution for the entire problem. Extensive simulations have been run for several control-type problems to test and illustrate the ideas. Among them, the robot arm control problem and the adaptive load balancing problem are the most extensively studied ones. (Copies available exclusively from Micrographics Department, Doheny Library, USC, Los Angeles, CA 90089-0182.)

6 citations
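
A very rough sketch of the decomposition idea in the thesis abstract, under assumed linear forms: a context network produces the parameters of a lower-dimensional function network, so the full mapping is learned as a parameterized family of simpler functions. Dimensions, the toy target, and the gradient rule are illustrative only.

```python
# Context-sensitive decomposition sketch: a "context network" maps context
# variables to the parameters of a lower-dimensional "function network".
# Linear forms, dimensions, and the toy target are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n_ctx, n_x = 3, 2
C = rng.normal(scale=0.1, size=(n_x + 1, n_ctx))   # context network: context -> function weights

def predict(context, x):
    w = C @ context                        # weights of the function network, chosen by context
    return w[:n_x] @ x + w[n_x]            # low-dimensional function applied to x

def train_step(context, x, target, lr=0.05):
    err = target - predict(context, x)
    grad_w = np.concatenate([x, [1.0]])    # d(prediction)/d(function weights)
    C[:] += lr * err * np.outer(grad_w, context)   # chain rule back into the context network
    return err

for _ in range(3000):
    ctx, x = rng.random(n_ctx), rng.random(n_x)
    # Toy target whose dependence on x changes with the context.
    train_step(ctx, x, target=ctx[0] * x[0] - ctx[1] * x[1] + ctx[2])

ctx, x = np.array([1.0, 0.5, 0.2]), np.array([0.3, 0.7])
print(predict(ctx, x), ctx[0] * x[0] - ctx[1] * x[1] + ctx[2])
```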


Proceedings ArticleDOI
25 Sep 1989
TL;DR: The adaptive layer in the control hierarchy is developed based on two fundamental properties concerning the grouping of outcomes which must be satisfied if a control policy exists for the process in terms of the defined neighborhoods.
Abstract: Several mathematical structures are proposed for learning control systems using a wide array of techniques from a variety of disciplines. Reinforcement learning offers the greatest degree of flexibility in utilizing process information in conjunction with concepts from optimal control while maintaining the basic constructs used in mathematical learning theory at the direct control layer. The adaptive layer in the control hierarchy is developed based on two fundamental properties concerning the grouping of outcomes which must be satisfied if a control policy exists for the process in terms of the defined neighborhoods. The original control objective can be interpreted in light of these two properties, and a control policy will be synthesized once these conditions are satisfied by all neighborhoods constructed during the process of learning.

5 citations


Proceedings ArticleDOI
M.D. Peek, Panos J. Antsaklis
25 Sep 1989
TL;DR: A parameter learning method is introduced which is used to broaden the region of operability of the adaptive control system of a space structure and determines the best parameter values to use when given different disturbances.
Abstract: A parameter learning method is introduced which is used to broaden the region of operability of the adaptive control system of a space structure. The learning system guides the selection of control parameters in a process, leading to optimal system performance; the method is a form of learning by observation and discovery. It is applicable to any system where performance depends on a number of adjustable parameters. A mathematical model is not necessary as the learning system can be used whenever the performance can be measured via simulation or experiment. The results of a transient regulation experiment are presented. In this experiment, the learning system determines the best parameter values to use when given different disturbances.
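
The abstract describes tuning adjustable control parameters purely from a measured performance score, with no mathematical model. The fragment below is a generic stand-in for that setting, not the paper's actual learning procedure: a keep-the-best random perturbation search over a toy performance measure.

```python
# Derivative-free parameter tuning from measured performance only. The
# keep-the-best perturbation search and the toy score are illustrative
# assumptions, not the paper's method.
import random

def tune_parameters(performance, params, step=0.2, iterations=200):
    """Improve params by trial: perturb, evaluate, keep changes that help."""
    best_score = performance(params)
    for _ in range(iterations):
        candidate = [p + random.uniform(-step, step) for p in params]
        score = performance(candidate)
        if score > best_score:               # observed improvement -> adopt it
            params, best_score = candidate, score
    return params, best_score

def score(gains):
    """Toy regulator score peaking at gains (2.0, 0.5)."""
    kp, kd = gains
    return -((kp - 2.0) ** 2 + (kd - 0.5) ** 2)

print(tune_parameters(score, [0.0, 0.0]))
```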

Proceedings Article
16 Oct 1989
TL;DR: This paper considers reinforcement learning neural networks using associative reward/penalty elements and the use of stochastic computing techniques for the hardware synthesis of neural networks.
Abstract: This paper considers reinforcement learning neural networks using associative reward/penalty elements. Fundamental theory and applications of relevant stochastic learning automata are reviewed, followed by a discussion of associative reward/penalty structures and recent work on reinforcement neural networks. The paper concludes with a consideration of the use of stochastic computing techniques for the hardware synthesis of neural networks.
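
As a hedged sketch of a single associative reward/penalty element of the kind surveyed here: a stochastic binary unit whose weights move toward the emitted action on reward and, with a smaller step, toward the opposite action on penalty. The constants and the toy associative task are assumptions, not taken from the paper.

```python
# Single associative reward/penalty element: stochastic binary unit with
# reward-directed and (smaller) penalty-directed weight updates. Constants and
# the toy task are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)

class ARPElement:
    def __init__(self, n_in, rho=0.5, lam=0.05):
        self.w = np.zeros(n_in)
        self.rho, self.lam = rho, lam

    def act(self, x):
        self.p = 1.0 / (1.0 + np.exp(-self.w @ x))   # probability of firing
        self.y = float(rng.random() < self.p)
        return self.y

    def learn(self, x, reward):
        if reward:                                    # reward: reinforce the action taken
            self.w += self.rho * (self.y - self.p) * x
        else:                                         # penalty: lean toward the other action
            self.w += self.lam * self.rho * ((1.0 - self.y) - self.p) * x

# Toy associative task: the rewarded action equals the first input bit.
unit = ARPElement(n_in=3)
for _ in range(5000):
    x = np.array([rng.integers(0, 2), rng.integers(0, 2), 1.0], dtype=float)
    y = unit.act(x)
    unit.learn(x, reward=(y == x[0]))

correct = sum(unit.act(np.array([b, rng.integers(0, 2), 1.0], dtype=float)) == b
              for b in [0.0, 1.0] * 50)
print(correct, "/ 100")
```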

30 Nov 1989
TL;DR: Progress was made in the development of reinforcement learning methods for control of dynamical systems, and a generalized theory of supervised learning was developed, in which training information comes in the form of constraints instead of specifications of desired network outputs.
Abstract: This report describes progress made in the development of connectionist learning methods permitting networks to learn when they cannot be provided with training information of the high quality required by supervised-learning methods. These methods can permit the application of adaptive connectionist networks to tasks involving complex dynamical behavior and high degrees of uncertainty. A method for training layered networks to perform nonlinear pattern recognition and associative memory tasks was refined. The neuron-like units making up these networks learn on the basis of feedback that evaluates behavior but does not specify desired output or directly provide error information. We report how this method is related to gradient-following methods and how its learning rate can be improved, and we argue that this method is biologically plausible. A generalized theory of supervised learning was developed, in which training information comes in the form of constraints instead of specifications of desired network outputs. This approach was illustrated by using it to train a simulated multi-jointed manipulator to perform sequences of reaching tasks. Progress was made in the development of reinforcement learning methods for control of dynamical systems. Keywords: Adaptive networks, Neural computing, Stochastic learning automata, Cooperative computing, Artificial intelligence.

Journal ArticleDOI
TL;DR: Several programmed instructional modules for self‐paced and/or reinforced learning in pest management were utilized at the Pan American School of Agriculture and the University of Guayaquil within existing courses of plant protection, and significant differences did occur among modules and among modes of presentation.
Abstract: Several programmed instructional modules (for self-paced and/or reinforced learning) in pest management were utilized at the Pan American School of Agriculture (Honduras) and the University of Guayaquil (Ecuador) within existing courses of plant protection. Three modes of delivery were employed: autotutorial, conventional lecture, and conventional lecture + slide illustration. Student performance, as measured by the percentage difference between pre- and post-module examinations, was determined, and significant differences did occur among modules and among modes of presentation. Up to a 41% drop in the standard error of post-instruction (v. pre-instruction) test scores occurred, indicating our modules had a 'standardizing' effect on student users. The utility of programmed instructional learning in the developing world and the need to tailor 'mode of delivery' to the subject being taught are discussed.

Proceedings ArticleDOI
Jannarone, Lii, Ma, Wei
01 Jan 1989
TL;DR: Recent conjunctoid results are outlined, including a new formulation that treats supervised and unsupervised learning-as well as reinforced learning, associative memory and optimization-as special cases and advances in conditional maximum-likelihood estimation toward consistent, noniterative, and highly parallel learning trial updating.
Abstract: Summary form only given, as follows. The main early results from conjunctoid learning theory are reviewed. Recent conjunctoid results are outlined, including: (1) a new formulation that treats supervised and unsupervised learning-as well as reinforced learning, associative memory and optimization-as special cases; (2) advances in conditional maximum-likelihood estimation toward consistent, noniterative, and highly parallel learning trial updating; (3) advances in VLSI implementation and Monte-Carlo simulation; and (4) the resulting applications prospects.

Proceedings ArticleDOI
14 Nov 1989
TL;DR: The mechanism presented can be defined as an action probability updating rule and thus a variable-structure stochastic automaton and forms an excellent model for an epsilon-optimal stubbornly learning system.
Abstract: The authors consider the problem of a learning mechanism learning the optimal action offered by a random environment. The mechanism presented can be defined as an action probability updating rule and thus a variable-structure stochastic automaton. The machine is essentially a stubborn machine; in other words, once the machine has chosen a particular action it increases the probability of choosing the action irrespective of whether the response from the environment was favorable or unfavorable. However, this increase in the action probability is done in a systematic and methodical way so that the machine learns, in an epsilon-optimal fashion, the best action which the environment offers. The proposed mechanism forms an excellent model for an epsilon-optimal stubbornly learning system. Apart from the fact that the machine is shown to be epsilon-optimal, a major contribution of the present work is that the mathematical tools used in this proof (namely the theory of distributions, kernels, and topological spaces) are quite distinct from those which are currently used in the field of learning. Also presented are simulation results which demonstrate the properties of the mechanism and which compare it to the traditional L_RI scheme.
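
For reference, a minimal sketch of the traditional linear reward-inaction (L_RI) scheme that the paper uses as its comparison baseline: the chosen action's probability grows on a favorable response and nothing changes on an unfavorable one. The three-action environment and its reward probabilities are assumed toy values; the paper's "stubborn" update rule itself is not reproduced here.

```python
# Traditional L_RI (linear reward-inaction) automaton: on reward, shift
# probability mass toward the chosen action; on penalty, leave the vector
# unchanged. The toy environment is an illustrative assumption.
import random

def l_ri(reward_probs, a=0.05, steps=5000):
    n = len(reward_probs)
    p = [1.0 / n] * n                        # action probability vector
    for _ in range(steps):
        action = random.choices(range(n), weights=p)[0]
        favorable = random.random() < reward_probs[action]
        if favorable:                        # reward: p_i += a*(1 - p_i), others scaled by (1 - a)
            p = [pj * (1 - a) for pj in p]
            p[action] += a
        # penalty: inaction -- probabilities are unchanged
    return p

# Environment where action 1 is optimal (highest reward probability).
print([round(x, 3) for x in l_ri([0.4, 0.8, 0.5])])
```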