
Showing papers on "Reinforcement learning published in 1988"


Book
01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.
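As a minimal illustration of the temporal-difference methods covered in Part II, the sketch below implements a tabular TD(0) value update; the environment interface, step size, and discount factor are assumptions for illustration rather than details taken from the book.

```python
import random

def td0_value_estimation(states, transition, alpha=0.1, gamma=0.95,
                         episodes=500, max_steps=200):
    """Tabular TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).

    `transition(s)` is a hypothetical environment interface returning
    (next_state, reward, done); it is not part of the book's text.
    """
    V = {s: 0.0 for s in states}
    for _ in range(episodes):
        s = random.choice(states)                  # start each episode in a random state
        for _ in range(max_steps):
            s_next, r, done = transition(s)
            target = r if done else r + gamma * V[s_next]
            V[s] += alpha * (target - V[s])        # move V(s) toward the bootstrapped target
            if done:
                break
            s = s_next
    return V
```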

37,989 citations


Proceedings ArticleDOI
Williams
24 Jul 1988
TL;DR: A description is given of several ways that backpropagation can be useful in training networks to perform associative reinforcement learning tasks, and it is observed that such an approach even permits a seamless blend of associative reinforcement learning and supervised learning within the same network.
Abstract: A description is given of several ways that backpropagation can be useful in training networks to perform associative reinforcement learning tasks. One way is to train a second network to model the environmental reinforcement signal and to backpropagate through this network into the first network. This technique has been proposed and explored previously in various forms. Another way is based on the use of the REINFORCE algorithm and amounts to backpropagating through deterministic parts of the network while performing a correlation-style computation where the behavior is stochastic. A third way, which is an extension of the second, allows backpropagation through the stochastic parts of the network as well. The mathematical validity of this third technique rests on the use of continuous-valued stochastic units. Some implications of this result for using supervised learning to train networks of stochastic units are noted, and it is also observed that such an approach even permits a seamless blend of associative reinforcement learning and supervised learning within the same network.
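As a hedged sketch of the REINFORCE-based approach for a single continuous-valued stochastic unit, the fragment below samples the unit's output from a Gaussian whose mean is a linear function of the input and nudges the weights along the reward-weighted score function. The linear parameterization, fixed standard deviation, reward baseline, and the hypothetical env_reward callable are illustrative assumptions, not details from the paper.

```python
import numpy as np

def reinforce_gaussian_unit(env_reward, dim=3, sigma=0.2, lr=0.01, steps=2000):
    """One continuous-valued stochastic unit trained with a REINFORCE-style rule.

    `env_reward(x, y)` is a hypothetical callable returning scalar reinforcement
    for input x and sampled output y.
    """
    rng = np.random.default_rng(0)
    w = np.zeros(dim)                      # weights of the unit's mean
    baseline = 0.0                         # running reward baseline (variance reduction)
    for _ in range(steps):
        x = rng.normal(size=dim)           # illustrative input pattern
        mu = w @ x                         # deterministic part: mean of the unit
        y = rng.normal(mu, sigma)          # stochastic part: sampled real-valued output
        r = env_reward(x, y)
        grad_logp = (y - mu) / sigma**2 * x   # score function of a Gaussian w.r.t. its mean
        w += lr * (r - baseline) * grad_logp
        baseline += 0.05 * (r - baseline)  # slowly track the average reward
    return w
```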

72 citations


Book ChapterDOI
01 Jan 1988
TL;DR: A promising new method that combines induction with reinforcement learning is described; it produces control rules that are fast, reliable, and, most importantly, more readable than the parameters and weights that constitute the knowledge of a pure reinforcement system.
Abstract: This paper reports on experiments performed with a variety of algorithms that have been used for the task of learning to control dynamic systems. It compares their speed, reliability and the assumptions about the problem domain which must be made in order for them to work. We describe a promising new method which combines induction with reinforcement learning. The output of this method is a set of control rules which are fast, reliable and, most importantly, more readable than the parameters and weights which constitute the knowledge of a pure reinforcement system. Finally, some open questions are presented.
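As a hedged illustration of the induction step, the sketch below induces readable if-then rules from state-action decisions logged from an already trained reinforcement controller; the decision-tree learner, the logging format, and the feature names are assumptions for illustration and are not taken from the paper.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

def induce_control_rules(logged_states, logged_actions, feature_names):
    """Turn a trained controller's decisions into readable if-then rules.

    `logged_states` is an (N, d) array of states visited by the reinforcement
    controller and `logged_actions` the actions it chose there; both are
    assumed to have been collected beforehand.
    """
    tree = DecisionTreeClassifier(max_depth=3)   # a shallow tree keeps the rules readable
    tree.fit(logged_states, logged_actions)
    return export_text(tree, feature_names=feature_names)

# Hypothetical usage with cart-pole-style features:
# print(induce_control_rules(X, a, ["x", "x_dot", "theta", "theta_dot"]))
```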

36 citations


Proceedings ArticleDOI
07 Dec 1988
TL;DR: An extension of earlier work in the refinement of robotic motor control using reinforcement learning is described; no longer assuming that the magnitude of the state-dependent nonlinear torque is known, the learning controller learns not only the presence of the torque but also its magnitude.
Abstract: An extension of earlier work in the refinement of robotic motor control using reinforcement learning is described. It is no longer assumed that the magnitude of the state-dependent nonlinear torque is known. The learning controller learns not only the presence of the torque but also its magnitude. The ability of the learning system to learn this real-valued mapping from output feedback and reference input to control signal is facilitated by a stochastic algorithm that uses reinforcement feedback. A learning controller that can learn nonlinear mappings holds many possibilities for extending existing adaptive control research.
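A minimal sketch of this kind of stochastic, reinforcement-driven learning of a real-valued control mapping is given below: a fixed PD feedback law is augmented with a learned compensation term sampled from a Gaussian, and the compensation weights are pushed toward actions that earn better-than-average reinforcement. The plant interface, features, reward, and constants are hypothetical and not taken from the paper.

```python
import numpy as np

def learn_torque_compensation(plant_step, kp=8.0, kd=2.0, lr=0.05,
                              sigma=0.1, episodes=200, horizon=100):
    """Learn an additive compensation for an unknown state-dependent torque.

    `plant_step(state, u)` is a hypothetical simulator returning the next
    (position, velocity) pair; it stands in for the robot arm dynamics.
    """
    rng = np.random.default_rng(1)
    w = np.zeros(3)                                   # linear compensation weights
    for _ in range(episodes):
        state, baseline = np.zeros(2), 0.0
        for _ in range(horizon):
            pos, vel = state
            features = np.array([1.0, pos, vel])
            u_fb = -kp * pos - kd * vel               # nominal PD feedback
            u_comp = rng.normal(w @ features, sigma)  # stochastic learned compensation
            state = plant_step(state, u_fb + u_comp)
            r = -(state[0] ** 2)                      # reinforcement: small tracking error is good
            # Push the compensation mean toward actions that beat the running baseline
            w += lr * (r - baseline) * (u_comp - w @ features) / sigma**2 * features
            baseline += 0.1 * (r - baseline)
    return w
```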

35 citations


Journal ArticleDOI
TL;DR: The significance of a general theory of learning is explained; the relation between learning and reinforcement and the transference of learning are studied; and organized learning, with memory accumulation and closed control, is defined.
Abstract: We explain the significance of a general theory of learning. We work at an informational level of resolution defined on a population. The population's magnitude changes through the creation or destruction of individual systems; we express this creation or destruction by means of reproducibility, which produces a selective discrimination on the population. We define reinforcement with respect to a goal variable and distinguish reinforcement by selective discrimination from reinforcement by a change of the conditional probability. We also define memory accumulation and reinforcement on individual systems and study the effectiveness and the regularity of reinforcement in relation to the goal. We define the information on a variable and its learning; we study the relation between learning and reinforcement and the transference of learning. We define the controlled system and its information and study its learning. We define organized learning, with memory accumulation, and closed control. ...

12 citations


Proceedings ArticleDOI
15 Jun 1988
TL;DR: This paper provides a framework for a class of methods to solve the adaptive load balancing problem in flexible manufacturing systems, implementing both the associative reinforcement learning and the constraint satisfaction modules as connectionist networks.
Abstract: This paper provides a framework for a class of methods to solve the adaptive load balancing problem in flexible manufacturing systems. The control system is composed of a group of associative learning automata which interact with each other in a game-theoretic sense. Each automaton makes use of a global reinforcement signal to learn a control strategy under different state inputs. The control actions suggested by the automata interact through a constraint satisfaction network to give a globally legal set of control actions. Using existing techniques in neural network research, we propose one particular method of this class by implementing both the associative reinforcement learning and the constraint satisfaction modules as connectionist networks. Comparisons of this method with other related studies will be discussed. We expect our current simulation work to provide empirical support for future analytical study.
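A minimal sketch of a team of learning automata updated from a shared global reinforcement signal is shown below, using the standard linear reward-inaction (L_R-I) rule as a stand-in for the paper's associative scheme; the number of automata, the action sets, and the constants are illustrative assumptions.

```python
import numpy as np

def lr_i_team_step(policies, reinforcement, chosen, lr=0.05):
    """One update of a team of learning automata sharing a global reinforcement signal.

    `policies[i]` is automaton i's action-probability vector, `chosen[i]` the action
    it just took, and `reinforcement` the common scalar signal in [0, 1].
    """
    for p, a in zip(policies, chosen):
        # Move probability mass toward the chosen action in proportion to the reward
        p *= (1.0 - lr * reinforcement)
        p[a] += lr * reinforcement
    return policies

# Hypothetical usage: three machines, each choosing among 4 routing actions
rng = np.random.default_rng(0)
policies = [np.full(4, 0.25) for _ in range(3)]
chosen = [rng.choice(4, p=p) for p in policies]
policies = lr_i_team_step(policies, reinforcement=0.7, chosen=chosen)
```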

7 citations


30 Sep 1988
TL;DR: A stochastic reinforcement learning algorithm for learning functions with continuous outputs, designed to be implemented as a unit in a connectionist network, is presented; a network incorporating these real-valued units learns the inverse kinematic transform of a simulated 3-degree-of-freedom system.
Abstract: Reinforcement learning is the process by which the probability of the response of a system to a stimulus increases with reward and decreases with punishment [19]. Most of the research in reinforcement learning (with the exception of the work in function optimization) has been on problems with discrete action spaces, in which the learning system chooses one of a finite number of possible actions. However, many control problems require the application of continuous control signals. In this paper, we present a stochastic reinforcement learning algorithm for learning functions with continuous outputs. Our algorithm is designed to be implemented as a unit in a connectionist network. We assume that the learning system computes its real-valued output as some function of a random activation generated using the normal distribution. The activation at any time depends on the two parameters, the mean and the standard deviation, used in the normal distribution, which, in turn, depend on the current inputs to the unit. Learning takes place by using our algorithm to adjust these two parameters so as to increase the probability of producing the optimal real value for each input pattern. The performance of the algorithm is studied by using it to learn tasks of varying levels of difficulty. Further, as an example of a potential application, we present a network incorporating these real-valued units that learns the inverse kinematic transform of a simulated 3 degree-...
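The sketch below gives one possible reading of such a real-valued stochastic unit: the output is drawn from a normal distribution whose mean and standard deviation both depend on the input, and reinforcement that exceeds a running baseline makes the sampled output more probable. The particular parameterization, baseline, and learning rates are assumptions for illustration, not the report's exact update rules.

```python
import numpy as np

class RealValuedStochasticUnit:
    """Connectionist unit with continuous output: the output is drawn from
    N(mu(x), sigma(x)^2), and both parameters are adapted so that outputs
    earning high reinforcement become more probable."""

    def __init__(self, dim, lr_mu=0.05, lr_sigma=0.01):
        self.w_mu = np.zeros(dim)        # weights for the mean
        self.w_sigma = np.zeros(dim)     # weights for the log standard deviation
        self.lr_mu, self.lr_sigma = lr_mu, lr_sigma
        self.baseline = 0.0              # running estimate of expected reinforcement
        self.rng = np.random.default_rng(0)

    def act(self, x):
        mu = self.w_mu @ x
        sigma = np.exp(self.w_sigma @ x)           # exploration width, always positive
        y = self.rng.normal(mu, sigma)
        return y, mu, sigma

    def update(self, x, y, mu, sigma, r):
        delta = r - self.baseline                  # did this output beat expectations?
        self.w_mu += self.lr_mu * delta * (y - mu) / sigma**2 * x
        self.w_sigma += self.lr_sigma * delta * (((y - mu) ** 2 - sigma**2) / sigma**2) * x
        self.baseline += 0.05 * delta
```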

7 citations


Book ChapterDOI
01 Jan 1988
TL;DR: An algorithm and architecture are developed for a connectionist system that mimics unsupervised competitive learning in function but resembles a reinforcement learning scheme in form; this type of algorithm may be important in building large-scale networks for learning complex tasks.
Abstract: A crucial problem for connectionist learning schemes is how to partition the space of input vectors into useful categories for subsequent processing. Currently this partitioning is usually accomplished by unsupervised competitive learning algorithms. Although these schemes are simple and fast, they are unable to deal with categorizations that depend on factors other than the vectors' superficial similarity. Specifically they do not take into account feedback (or reinforcement) from outside the system as to the appropriateness of the categorizations that are being learned. We have developed an algorithm and architecture for a connectionist system which mimics unsupervised competitive learning in function; however in form it resembles a reinforcement learning scheme. We call this algorithm competitive reinforcement. This algorithm is inherently more stable than traditional competitive learning paradigms and can be easily and naturally adapted to function in reinforcement learning networks, allowing feature detection to be guided by externally generated reinforcement. A demonstration of the algorithm and its features using the classic dipole stimulus is presented. We suggest that this type of algorithm may be important in the building of large-scale networks for learning complex tasks.
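A minimal sketch of reinforcement-guided competitive learning in the spirit described above is given below: the winning unit's weights move toward (or, under negative reinforcement, away from) the input, so feature detection is shaped by the external signal rather than by superficial similarity alone. The winner-take-all rule, network size, and dipole encoding are illustrative assumptions; the paper's exact update may differ.

```python
import numpy as np

def competitive_reinforcement_step(weights, x, reinforcement, lr=0.1):
    """One reinforcement-guided competitive learning update.

    `weights` has shape (num_units, dim); `reinforcement` is a scalar in [-1, 1].
    """
    winner = np.argmax(weights @ x)                       # unit most responsive to the input
    weights[winner] += lr * reinforcement * (x - weights[winner])
    return winner

# Hypothetical usage on a dipole-style stimulus (a pair of adjacent active pixels)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 16))                   # 4 feature detectors, 16-pixel input
x = np.zeros(16)
x[3], x[4] = 1.0, 1.0                                     # one dipole stimulus
competitive_reinforcement_step(W, x, reinforcement=1.0)
```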

5 citations



01 Aug 1988
TL;DR: The goal of this panel was to analyze the interactions between Machine Learning and Intelligent Control.
Abstract: Machine Learning was established as a research discipline in the 1970's and expanded rapidly in the 1980's. One of the roots of machine learning research was in Cybernetic Systems and Adaptive Control. Machine Learning has been significantly influenced by Artificial Intelligence, Cognitive Science, Computer Science, and other disciplines; Machine Learning has developed its own research paradigms, methodologies, and a set of research objectives different from those of control systems. In the meantime, a new field of Intelligent Control has emerged. Even though Intelligent Control adheres more closely to the traditional control systems theory paradigms - mainly quantitative descriptions, differential equations models, goal-oriented system design, rigid mathematical formulation of goals and models - it has also deviated from the traditional systems theory approach. The two fields have moved forward without much interaction between them - different conferences, different journals, different researchers. Machine Learning has been concerned primarily with general learning mechanisms and methodologies for software implementation of learning systems. Intelligent Control has concentrated more on the dynamics of real physical systems and practical control problem solving. Because the two disciplines have at least one goal in common - automatic acquisition of knowledge about the world - they should have more interaction. The lack of interdisciplinary communication may lead to some undesirable results: establishing different terminologies for the same phenomena, repetitive work (discovering the same things independently), and lower quality research (ignoring the results established by the other discipline). The goal of this panel was to analyze the interactions between Machine Learning and Intelligent Control. The panel consisted of several researchers both from the area of Intelligent Control and from Machine Learning. The panelists were asked to concentrate on such general issues

2 citations


Proceedings ArticleDOI
24 Aug 1988
TL;DR: The use of knowledge-based techniques in feedback control, including fuzzy linguistic control, qualitative causal control, and procedural control, as well as reinforcement learning and induction, are discussed.
Abstract: The use of knowledge-based techniques in feedback control is reviewed. Fuzzy linguistic control, qualitative causal control, and procedural control, as well as reinforcement learning and induction, are discussed. A brief comparison of these techniques in terms of some key characteristics is presented.

Proceedings ArticleDOI
07 Dec 1988
TL;DR: The goal is to develop a theoretical approach that provides a methodology for the automatic construction of controllers able to keep an evolving system within a qualitative region called safe.
Abstract: The authors deal with a control problem when the control objective is to keep the system within some qualitative region called safe. It is assumed that the system is evolving over time, i.e. it is changing its qualitative and quantitative characteristics. This is reflected in the qualitative change of the model of the system which, as a result, must be permanently updated. The goal is to develop a theoretical approach which would provide a methodology for automatic construction of such controllers which are able to solve the above control problem. An approach to the solution of this problem is presented that is based on a combination of control theory and artificial intelligence.

01 Jan 1988
TL;DR: The approach is compared to both Michie and Chambers' BOXES algorithm and to the extension by Barto et al., the ASE/ACE system, and shows an improved learning rate for stochastically based learning automata.
Abstract: This research investigates a new technique for unsupervised learning of nonlinear control problems. The approach is applied both to Michie and Chambers' BOXES algorithm and to Barto, Sutton and Anderson's extension, the ASE/ACE system, and has significantly improved the convergence rate of stochastically based learning automata. Recurrence learning is a new nonlinear reward-penalty algorithm. It exploits information found during learning trials to reinforce decisions resulting in the recurrence of nonfailing states. Recurrence learning applies positive reinforcement during the exploration of the search space, whereas in the BOXES or ASE algorithms, only negative weight reinforcement is applied, and then only on failure. Simulation results show that the added information from recurrence learning increases the learning rate. Our empirical results show that recurrence learning is faster than both basic failure-driven learning and failure-prediction methods. Although recurrence learning has only been tested in failure-driven experiments, there are goal-directed learning applications where detection of recurring oscillations may provide useful information that reduces the learning time by applying negative, instead of positive, reinforcement. Detection of cycles provides a heuristic to improve the balance between evidence gathering and goal-directed search.

INTRODUCTION: This research investigates a new technique for unsupervised learning of nonlinear control problems with delayed feedback. Our approach is compared to both Michie and Chambers' BOXES algorithm [1] and to the extension by Barto et al., the ASE (Adaptive Search Element) and their ASE/ACE (Adaptive Critic Element) system [2], and shows an improved learning time for stochastically based learning automata in failure-driven tasks. We consider adaptively controlling the behavior of a system which passes through a sequence of states due to its internal dynamics (which are not assumed to be known a priori) and due to choices of actions made in visited states. Such an adaptive controller is often referred to as a learning automaton. The decisions can be deterministic or can be made according to a stochastic rule. A learning automaton has to discover which action is best in each circumstance by producing actions and observing the resulting information. This paper was motivated by the previous work of Barto et al. to investigate neuronlike adaptive elements that affect and learn from their environment. We were inspired by their current work and the recent attention to neural networks and connectionist systems, and have chosen to use the cart-pole control problem [2] to enable a comparison of our results with theirs.
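As a hedged sketch of the recurrence-learning idea, the fragment below augments a BOXES-style cart-pole controller with positive reinforcement whenever a discretized state (box) recurs without an intervening failure, while failure still triggers the usual negative reinforcement of the episode's decisions; the discretization, the weight representation, and the constants are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def recurrence_learning_episode(env_step, discretize, weights,
                                pos_lr=0.05, neg_lr=0.5, max_steps=500):
    """One cart-pole episode with a BOXES-style controller plus recurrence reward.

    `env_step(state, action)` -> (next_state, failed) and `discretize(state)` -> box index
    are hypothetical helpers; `weights[box]` holds a scalar preference whose sign biases
    the push direction, and is supplied by the caller.
    """
    rng = np.random.default_rng()
    state = np.zeros(4)                            # x, x_dot, theta, theta_dot
    last_visit = {}                                # step at which each box was last seen
    history = []                                   # (box, action) decisions this episode
    for t in range(max_steps):
        box = discretize(state)
        # Stochastic action choice biased by the box's current weight
        action = 1 if rng.random() < 1.0 / (1.0 + np.exp(-weights[box])) else 0
        history.append((box, action))
        if box in last_visit:
            # Recurrence: the box was revisited without failing -> positive reinforcement
            for b, a in history[last_visit[box]:]:
                weights[b] += pos_lr * (1 if a == 1 else -1)
        last_visit[box] = t
        state, failed = env_step(state, action)
        if failed:
            # Classic BOXES-style negative reinforcement on failure
            for b, a in history:
                weights[b] -= neg_lr * (1 if a == 1 else -1)
            break
    return weights
```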