
Showing papers on "Reinforcement learning" published in 1979


Journal ArticleDOI
TL;DR: The central idea of the system is to connect well-known heuristic rules for reactor control with reinforcement learning algorithms; the control signal is proposed as a vector that depends on complex physical properties of the plant.

7 citations


Proceedings ArticleDOI
01 Dec 1979
TL;DR: This paper gives an overview of adaptive control methods developed from the concept of active learning for control purposes, with some comments on their practicality.
Abstract: An important element of adaptive control is learning of the drifting parameters. As the process unfolds, additional information becomes available, which will provide learning for the purpose of control. This information may come about accidentally through past control actions or as a result of active probing, which itself is a possible control policy. Thus learning is present, whether it is accidental or deliberate. Since more learning may improve overall control performance, the probing signal may indirectly help in controlling the stochastic system. On the other hand, excessive probing should not be allowed even though it may promote learning because it is expensive in the sense that it will, in general, increase the expected cost performance of the system. A good control law must then regulate its adaptation (learning) in an optimal manner. An adaptive control method is called passively adaptive if learning is not planned in the manner described above; it is called actively adaptive if learning is planned and regulated for the purpose of final control. This paper gives an overview of adaptive control methods which were developed based on the concept of active learning for control purposes. Some comments on their practicality are also given.

5 citations


Journal ArticleDOI
TL;DR: The author suggests that the contemporary microeconomic model of the consumer could be adapted to incorporate a more realistic description of ordinary human behavior, and observes that behavior which does not correspond to the simplest traditional models is frequently explained not with modifications such as information or transactions costs, but with propositions of a psychological or quasipsychological character.
Abstract: The purpose of this paper is to suggest that the contemporary microeconomic model of the consumer could be adapted to incorporate a more realistic description of ordinary human behavior. For many years economists have tolerated the amalgam of good sense and bad psychology that characterizes microeconomic theory because of the empirical plausibility of its predictions. There is growing pressure to expand the limits of this theory, however, and it is becoming more common for empirical regularities to be explained in terms of variables which never have had a secure place in the theory, or which even may be in open conflict with it. This is exemplified by Houthakker's and Taylor's (1970) use of a consumer's "stock of habits" as a variable in their demand equations, and by Katona's (1960) appeal to low levels of "consumer confidence" in his explanation of the observation that consumer saving rates tend to increase in times of unusual inflation. It is possible that these are merely cases in which economists have taken unconventional lines on situations in which orthodox models still might work were one to introduce such addenda as information costs and lags, expectations formation, transactions costs and the like. Nevertheless, it is intriguing to find that behavior which does not correspond to the simplest traditional models is frequently explained, not with these modifications, but instead with propositions of a psychological or quasipsychological character. There are a great many psychological theories that appear to hold promise for application to economics. Indeed, some cognitive theories of behavior have, at least in appearance, points in common with traditional maximization theory, although they are rarely meant to be taken as literally as economists are prone to take them.
In many cases, it is difficult to apply these models to economic problems (particularly in the case of cognitive models) because there does not seem to be any means for making their exogenous variables endogenous to economic environments. There are other psychological theories for which this difficulty is not evident, however, and the purpose of this paper is to investigate one of the simplest: stochastic learning theory as it would apply to consumer behavior. The work of James Duesenberry (1952) has already brought the relevance of learning processes to the attention of economists concerned with the nature of the consumption function, although his work antedates the development of most of the quantitative models of learning that now exist. An additional impetus for this research comes from compelling demonstrations of the practical relevance of learning theory to experimental economic settings in a series of studies performed by psychologists working in the area of simple reinforcement learning (i.e., "reward"-induced learning), an aspect of behavior which psychologists have subjected to extensive experimental study. One of the remarkable features of these studies is the complete absence of any characterization of the consumer as a rational, maximizing decision-maker, although the experimental results often correspond closely to what would have been expected on the basis of traditional economic theory. For example, in carefully controlled experiments in artificial ("token") economies, Ayllon and Azrin (1966) were able to demonstrate downward-sloping demand curves, upward-sloping supply curves (including the suggestion of a backward-bending supply curve), supply-demand equilibrium prices, and even the purchase of franchises. Often, such results are interpreted as empirical support for the positivistic proposition that individuals behave "as if" they were utility maximizers.
It is often the case, however, that learning experiments turn up phenomena that are quite difficult to reconcile with the maximization model. One such example occurs in Ayllon's and Azrin's work on the impact of "advertising" on consumption decisions. Their experimental procedure was to provide "free samples": small quantities of those goods or services that were to be "advertised." For a wide variety of items, the provision of these "free samples" was found to

Received for publication August 31, 1977. Revision accepted for publication May 4, 1978. * University of Michigan. The author would like to thank John Fitts, Robert S. Holbrook and Sidney G. Winter for their comments upon earlier drafts of this paper.
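The abstract above names stochastic learning theory but does not state a model. As an illustration only, the following sketch uses a linear-operator rule of the Bush-Mosteller type, which is an assumption here and not necessarily the model the paper investigates. The function names and the parameters `alpha` and `beta` are hypothetical.

```python
def linear_operator_update(p, rewarded, alpha=0.2, beta=0.1):
    """Update the probability p of repeating a choice (e.g. buying a good).

    Assumed linear-operator rule, not taken from the paper:
      rewarded:     p moves a fraction alpha of the way toward 1
      not rewarded: p shrinks by a fraction beta toward 0
    """
    if rewarded:
        return p + alpha * (1.0 - p)
    return (1.0 - beta) * p


def simulate(p0, outcomes, alpha=0.2, beta=0.1):
    """Return the path of choice probabilities over a sequence of outcomes."""
    path = [p0]
    for rewarded in outcomes:
        path.append(linear_operator_update(path[-1], rewarded, alpha, beta))
    return path
```

Calling `simulate(0.5, [True, False])` traces how one rewarded and one unrewarded trial shift the choice probability. No utility function appears anywhere in the rule, which matches the abstract's remark that these studies do not characterize the consumer as a maximizing decision-maker.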

5 citations


01 Sep 1979
TL;DR: In this article, a hierarchical structure automaton was developed using a two-state stochastic learning automaton (SLA) in a time-shared model and applied to systems with multidimensional, multimodal performance criteria.
Abstract: A hierarchical structure automaton was developed using a two-state stochastic learning automaton (SLA) in a time-shared model. Application of the hierarchical SLA to systems with multidimensional, multimodal performance criteria is described. Results of experiments performed with the hierarchical SLA using a performance index with a superimposed noise component of + or - delta distributed uniformly over the surface are discussed.
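The abstract does not give the update scheme of its two-state SLA. For orientation, here is a minimal sketch of a single two-action stochastic learning automaton using a linear reward-inaction update. The scheme is an assumption, and the sketch does not model the paper's hierarchy, time sharing, or noise. The function name, `reward_prob`, and `rate` are hypothetical.

```python
import random


def run_two_state_sla(reward_prob, steps=5000, rate=0.05, seed=0):
    """Run a two-action automaton and return its final P(action 0).

    reward_prob: pair of reward probabilities, one per action (assumed
    stationary environment). Update scheme: linear reward-inaction,
    i.e. reinforce the chosen action on reward, do nothing on penalty.
    """
    rng = random.Random(seed)
    p0 = 0.5  # probability of choosing action 0; action 1 gets 1 - p0
    for _ in range(steps):
        action = 0 if rng.random() < p0 else 1
        rewarded = rng.random() < reward_prob[action]
        if rewarded:
            if action == 0:
                p0 += rate * (1.0 - p0)  # shift probability toward action 0
            else:
                p0 -= rate * p0          # shift probability toward action 1
    return p0
```

For example, `run_two_state_sla((1.0, 0.0))` rewards only action 0, so the returned probability can only rise from its starting value of 0.5. A smaller `rate` makes learning slower but less sensitive to chance rewards.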