Journal ArticleDOI

Adaptive properties of differential learning rates for positive and negative outcomes

01 Dec 2013-Biological Cybernetics (Springer Berlin Heidelberg)-Vol. 107, Iss: 6, pp 711-719
TL;DR: It is shown analytically how the optimal learning rate asymmetry depends on the reward distribution, and a biologically plausible algorithm is implemented that adapts the balance of positive and negative learning rates from experience.
Abstract: The concept of the reward prediction error--the difference between reward obtained and reward predicted--continues to be a focal point for much theoretical and experimental work in psychology, cognitive science, and neuroscience. Models that rely on reward prediction errors typically assume a single learning rate for positive and negative prediction errors. However, behavioral data indicate that better-than-expected and worse-than-expected outcomes often do not have symmetric impacts on learning and decision-making. Furthermore, distinct circuits within cortico-striatal loops appear to support learning from positive and negative prediction errors, respectively. Such differential learning rates would be expected to lead to biased reward predictions and therefore suboptimal choice performance. Contrary to this intuition, we show that on static "bandit" choice tasks, differential learning rates can be adaptive. This occurs because asymmetric learning enables a better separation of learned reward probabilities. We show analytically how the optimal learning rate asymmetry depends on the reward distribution and implement a biologically plausible algorithm that adapts the balance of positive and negative learning rates from experience. These results suggest specific adaptive advantages for separate, differential learning rates in simple reinforcement learning settings and provide a novel, normative perspective on the interpretation of associated neural data.
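To make the model concrete, here is a compact sketch assuming binary (0/1) rewards delivered with probability p. The update rule is the one quoted verbatim from the paper in the References section below; the fixed point is obtained by setting the expected update to zero, a derivation step filled in here for illustration.

    \Delta Q_t = r_t - Q_t,
    \qquad
    Q_{t+1} = Q_t +
    \begin{cases}
      \alpha^{+}\,\Delta Q_t & \text{if } \Delta Q_t \ge 0,\\
      \alpha^{-}\,\Delta Q_t & \text{if } \Delta Q_t < 0.
    \end{cases}

    % Setting the expected update to zero for 0/1 rewards with rate p:
    p\,\alpha^{+}(1 - Q^{*}) = (1 - p)\,\alpha^{-}\,Q^{*}
    \;\Rightarrow\;
    Q^{*} = \frac{p\,\alpha^{+}}{p\,\alpha^{+} + (1 - p)\,\alpha^{-}}.

With equal rates this recovers the unbiased estimate Q* = p; unequal rates bias Q*, and the paper's point is that this bias can nonetheless widen the separation between the learned values of competing options.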
Citations
Journal ArticleDOI
TL;DR: In this article, in a simple instrumental learning task, participants incorporated better-than-expected outcomes at a higher rate than worse-than-expected ones, and functional imaging indicated that inter-individual difference in the expression of optimistic update corresponds to enhanced prediction error signalling in the reward circuitry.
Abstract: When forming and updating beliefs about future life outcomes, people tend to consider good news and to disregard bad news. This tendency is assumed to support the optimism bias. Whether this learning bias is specific to ‘high-level’ abstract belief update or a particular expression of a more general ‘low-level’ reinforcement learning process is unknown. Here we report evidence in favour of the second hypothesis. In a simple instrumental learning task, participants incorporated better-than-expected outcomes at a higher rate than worse-than-expected ones. In addition, functional imaging indicated that inter-individual difference in the expression of optimistic update corresponds to enhanced prediction error signalling in the reward circuitry. Our results constitute a step towards the understanding of the genesis of optimism bias at the neurocomputational level. Lefebvre et al. present behavioural and neural evidence showing that the ‘optimism bias’ is a manifestation of a general cognitive tendency for preferential learning from positive, compared with negative, outcomes.

197 citations

Journal ArticleDOI
TL;DR: When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice.
Abstract: Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased, such that participants preferentially take into account positive, as compared to negative, prediction errors. However, whether or not the prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model that was adapted to test if prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback information (i.e., the outcomes of both the chosen and unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning, we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. When considering valence-induced bias in the context of both factual and counterfactual learning, it appears that people tend to preferentially take into account information that confirms their current choice.
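The model structure described in this abstract can be sketched in a few lines. This is a reconstruction from the abstract alone, not the authors' code: the function name, the four learning-rate parameters (factual/counterfactual crossed with positive/negative prediction errors), and their values are illustrative assumptions.

    # Sketch of valence-dependent factual and counterfactual updates.
    # All names and parameter values are hypothetical, for illustration.
    def update(q_chosen, q_unchosen, r_chosen, r_unchosen,
               lr_fact_pos=0.3, lr_fact_neg=0.1,
               lr_cf_pos=0.1, lr_cf_neg=0.3):
        # Factual update: prediction error on the obtained outcome.
        pe = r_chosen - q_chosen
        q_chosen += (lr_fact_pos if pe >= 0 else lr_fact_neg) * pe
        # Counterfactual update: prediction error on the forgone outcome,
        # available only in the complete-feedback condition.
        if r_unchosen is not None:
            pe = r_unchosen - q_unchosen
            q_unchosen += (lr_cf_pos if pe >= 0 else lr_cf_neg) * pe
        return q_chosen, q_unchosen

The reported pattern corresponds to lr_fact_pos > lr_fact_neg but lr_cf_neg > lr_cf_pos, i.e., preferential learning from whichever signal favours the chosen option.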

117 citations

Journal ArticleDOI
TL;DR: Findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.
Abstract: Humans are sophisticated social beings. Social cues from others are exceptionally salient, particularly during adolescence. Understanding how adolescents interpret and learn from variable social signals can provide insight into the observed shift in social sensitivity during this period. The present study tested 120 participants between the ages of 8 and 25 years on a social reinforcement learning task where the probability of receiving positive social feedback was parametrically manipulated. Seventy-eight of these participants completed the task during fMRI scanning. Modeling of trial-by-trial learning showed that children and adults had higher positive learning rates than did adolescents, suggesting that adolescents differentiated less in their reaction times for peers who provided more positive feedback. Forming expectations about receiving positive social reinforcement correlated with neural activity within the medial prefrontal cortex and ventral striatum across age. Adolescents, unlike children and adults, showed greater insular activity during positive prediction error learning and increased activity in the supplementary motor cortex and the putamen when receiving positive social feedback regardless of the expected outcome, suggesting that peer approval may motivate adolescents toward action. While different amounts of positive social reinforcement enhanced learning in children and adults, all positive social reinforcement equally motivated adolescents. Together, these findings indicate that sensitivity to peer approval during adolescence goes beyond simple reinforcement theory accounts and suggest possible explanations for how peers may motivate adolescent behavior.

107 citations

Journal ArticleDOI
TL;DR: The authors found that the learning rate asymmetry was largely insensitive to the average reward rate; instead, the dominant pattern was a higher learning rate for negative than for positive prediction errors, possibly reflecting risk aversion.
Abstract: Studies of reinforcement learning have shown that humans learn differently in response to positive and negative reward prediction errors, a phenomenon that can be captured computationally by positing asymmetric learning rates. This asymmetry, motivated by neurobiological and cognitive considerations, has been invoked to explain learning differences across the lifespan as well as a range of psychiatric disorders. Recent theoretical work, motivated by normative considerations, has hypothesized that the learning rate asymmetry should be modulated by the distribution of rewards across the available options. In particular, the learning rate for negative prediction errors should be higher than the learning rate for positive prediction errors when the average reward rate is high, and this relationship should reverse when the reward rate is low. We tested this hypothesis in a series of experiments. Contrary to the theoretical predictions, we found that the asymmetry was largely insensitive to the average reward rate; instead, the dominant pattern was a higher learning rate for negative than for positive prediction errors, possibly reflecting risk aversion.
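A small worked example shows where the tested prediction comes from, using the fixed point Q* = p·α+ / (p·α+ + (1−p)·α−) sketched under the main abstract above. The probabilities are illustrative; with symmetric rates the gap between the two learned values is 0.10 in both regimes:

    High reward rates (p = 0.9 vs 0.8):
        α− = 2α+ : Q* ≈ 0.818 vs 0.667  (gap ≈ 0.152, wider)
        α+ = 2α− : Q* ≈ 0.947 vs 0.889  (gap ≈ 0.058, narrower)
    Low reward rates (p = 0.2 vs 0.1):
        α+ = 2α− : Q* ≈ 0.333 vs 0.182  (gap ≈ 0.152, wider)
        α− = 2α+ : Q* ≈ 0.111 vs 0.053  (gap ≈ 0.058, narrower)

Hence the normative account favours a higher negative learning rate when reward rates are high and a higher positive learning rate when they are low, which is precisely the relationship this study failed to observe behaviourally.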

92 citations

Journal ArticleDOI
TL;DR: Evidence is found that from childhood to adulthood, individuals become better at optimally weighting recent outcomes during learning across diverse contexts and less exploratory in their value-based decision-making.

89 citations

References
Book
01 Jan 1998
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.

37,989 citations


"Adaptive properties of differential..." refers background in this paper


  • ...Many reinforcement learning models implicitly assume that positive and negative RPEs impact estimation of action and state values through a common gain factor or learning rate (Sutton and Barto 1998; O’Doherty et al. 2007)....


  • ...A central element in the major theories of reinforcement learning is the reward prediction error (RPE) which, in its simplest form, is the difference between the amount of received and expected reward (Sutton and Barto 1998)....


  • ...To understand this pattern of results, recall that reinforcement learners face a trade-off between exploitation and exploration (Sutton and Barto 1998): choosing to exploit the option with the highest estimated Q-value does not guarantee that there is not an (insufficiently explored) better option available....


  • ...The three agents are tested on two versions of a difficult two-armed bandit task, with large variance in the two outcomes but small differences in the means (Sutton and Barto 1998)....

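The exploration-exploitation trade-off invoked in these excerpts is typically resolved with a stochastic choice rule such as softmax. A minimal sketch in Python (the inverse-temperature value is an illustrative assumption, not taken from the paper):

    import numpy as np

    def softmax_choice(q_values, beta=3.0, rng=None):
        # beta (inverse temperature) sets the balance: high beta mostly
        # exploits the highest Q-value estimate, low beta explores.
        rng = rng or np.random.default_rng()
        prefs = beta * np.asarray(q_values, dtype=float)
        probs = np.exp(prefs - prefs.max())  # subtract max for stability
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

Because every option keeps a nonzero choice probability, an insufficiently explored but better option can still be sampled and its estimate corrected.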

Book ChapterDOI
TL;DR: In this paper, the authors present a critique of expected utility theory as a descriptive model of decision making under risk, and develop an alternative model, called prospect theory, in which value is assigned to gains and losses rather than to final assets and in which probabilities are replaced by decision weights.
Abstract: This paper presents a critique of expected utility theory as a descriptive model of decision making under risk, and develops an alternative model, called prospect theory. Choices among risky prospects exhibit several pervasive effects that are inconsistent with the basic tenets of utility theory. In particular, people underweight outcomes that are merely probable in comparison with outcomes that are obtained with certainty. This tendency, called the certainty effect, contributes to risk aversion in choices involving sure gains and to risk seeking in choices involving sure losses. In addition, people generally discard components that are shared by all prospects under consideration. This tendency, called the isolation effect, leads to inconsistent preferences when the same choice is presented in different forms. An alternative theory of choice is developed, in which value is assigned to gains and losses rather than to final assets and in which probabilities are replaced by decision weights. The value function is normally concave for gains, commonly convex for losses, and is generally steeper for losses than for gains. Decision weights are generally lower than the corresponding probabilities, except in the range of low probabilities. Overweighting of low probabilities may contribute to the attractiveness of both insurance and gambling. Expected utility theory has dominated the analysis of decision making under risk. It has been generally accepted as a normative model of rational choice (24), and widely applied as a descriptive model of economic behavior, e.g. (15, 4). Thus, it is assumed that all reasonable people would wish to obey the axioms of the theory (47, 36), and that most people actually do, most of the time. The present paper describes several classes of choice problems in which preferences systematically violate the axioms of expected utility theory. In the light of these observations we argue that utility theory, as it is commonly interpreted and applied, is not an adequate descriptive model and we propose an alternative account of choice under risk.
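For concreteness, the value and weighting functions described in this abstract are often illustrated with the parametric forms later estimated by Tversky and Kahneman (1992); the exponents below are their median estimates and are not part of the 1979 paper itself.

    # Illustrative prospect-theory functions (Tversky & Kahneman 1992 medians).
    def value(x, alpha=0.88, beta=0.88, lam=2.25):
        # Concave for gains, convex for losses, steeper for losses (lam > 1).
        return x ** alpha if x >= 0 else -lam * (-x) ** beta

    def weight(p, gamma=0.61):
        # Inverse-S shape: overweights low probabilities, underweights
        # moderate and high ones.
        return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

    # Prospect value of a simple gamble given as (outcome, probability) pairs.
    def prospect_value(outcomes):
        return sum(weight(p) * value(x) for x, p in outcomes)

Under these parameters the certainty equivalent of a 50% chance of 100 comes out near 37 rather than the expected value of 50, capturing risk aversion for gains, while the steep loss branch (lam = 2.25) reproduces the strong aversion to mixed gambles.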

35,067 citations

Book
01 Mar 1998
TL;DR: In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning.
Abstract: From the Publisher: In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability.

7,016 citations


"Adaptive properties of differential..." refers background in this paper

  • ...The three agents are tested on two versions of a difficult two-armed bandit task, with large variance in the two outcomes but small differences in the means (Sutton and Barto 1998)....


  • ...A central element in the major theories of reinforcement learning is the reward prediction error (RPE) which, in its simplest form, is the difference between the amount of received and expected reward (Sutton and Barto 1998)....


  • ...To understand this pattern of results, recall that reinforcement learners face a trade-off between exploitation and exploration (Sutton and Barto 1998): choosing to exploit the option with the highest estimated Q-value does not guarantee that there is not an (insufficiently explored) better option available....


  • ...Many reinforcement learning models implicitly assume that positive and negative RPEs impact estimation of action and state values through a common gain factor or learning rate (Sutton and Barto 1998; O’Doherty et al. 2007)....


01 Jan 1989-Learning from Delayed Rewards (PhD thesis)

4,916 citations


"Adaptive properties of differential..." refers background or methods in this paper

  • ...Unlike the standard form of typical reinforcement learning algorithms such as Q-learning (Watkins 1989), however, our agents employ different learning rates for positive and negative prediction errors $\Delta Q_t = (r_t - Q_t)$:

      $Q_{t+1} = Q_t + \begin{cases} \alpha^{+}\,\Delta Q_t & \text{if } \Delta Q_t \ge 0 \\ \alpha^{-}\,\Delta Q_t & \text{if } \Delta Q_t < 0 \end{cases}$

    $Q_t$, by convention, corresponds to the expected reward of taking a certain action at time step $t$ (“Q-value”); $r_t$ is the actual reward received at time step $t$....


  • ...Q-learning and related models are in widespread use as models for fitting behavior and neurophysiological data in neuroscience, providing much evidence that biological learning mechanisms share at least some properties with these models (O’Doherty et al. 2007; Maia and Frank 2011, but see Gershman and Niv 2010)....



  • ...The analytical results predicted with high accuracy the behavior of a modified Q-learning algorithm (5,000 iterations of 800 trials each; error bars in Fig....


  • ...In particular, we simulate the performance of three Q-learning agents on two different “two-armed bandit” problems....

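A runnable sketch of the update rule quoted above, applied to a two-armed 0/1 bandit. The task probabilities, the epsilon-greedy choice rule, and all parameter values here are illustrative assumptions, not the paper's exact simulation setup (only the 800-trial length is taken from the excerpt above):

    import random

    def run_bandit(p_reward=(0.8, 0.9), alpha_pos=0.1, alpha_neg=0.2,
                   n_trials=800, epsilon=0.1):
        # Q-learning with separate rates for positive/negative prediction
        # errors, per the update rule quoted above.
        q = [0.5, 0.5]                      # initial Q-values
        for _ in range(n_trials):
            if random.random() < epsilon:   # epsilon-greedy exploration
                a = random.randrange(len(q))
            else:                           # otherwise exploit best estimate
                a = max(range(len(q)), key=q.__getitem__)
            r = 1.0 if random.random() < p_reward[a] else 0.0
            delta = r - q[a]                # reward prediction error
            q[a] += (alpha_pos if delta >= 0 else alpha_neg) * delta
        return q

    print(run_bandit())

With alpha_neg > alpha_pos both Q-values settle below the true reward probabilities, but, per the fixed-point analysis sketched under the main abstract, their separation is larger than under symmetric rates when reward probabilities are high.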

Journal ArticleDOI
07 Dec 1990-Science
TL;DR: The differential effects of dopamine on striatonigral and striatopallidal neurons are mediated by their specific expression of D1 and D2 dopamine receptor subtypes, respectively.
Abstract: The striatum, which is the major component of the basal ganglia in the brain, is regulated in part by dopaminergic input from the substantia nigra. Severe movement disorders result from the loss of striatal dopamine in patients with Parkinson's disease. Rats with lesions of the nigrostriatal dopamine pathway caused by 6-hydroxydopamine (6-OHDA) serve as a model for Parkinson's disease and show alterations in gene expression in the two major output systems of the striatum to the globus pallidus and substantia nigra. Striatopallidal neurons show a 6-OHDA-induced elevation in their specific expression of messenger RNAs (mRNAs) encoding the D2 dopamine receptor and enkephalin, which is reversed by subsequent continuous treatment with the D2 agonist quinpirole. Conversely, striatonigral neurons show a 6-OHDA-induced reduction in their specific expression of mRNAs encoding the D1 dopamine receptor and substance P, which is reversed by subsequent daily injections of the D1 agonist SKF-38393. This treatment also increases dynorphin mRNA in striatonigral neurons. Thus, the differential effects of dopamine on striatonigral and striatopallidal neurons are mediated by their specific expression of D1 and D2 dopamine receptor subtypes, respectively.

2,946 citations


"Adaptive properties of differential..." refers background in this paper


  • ...An influential proposal arising from this idea is that RPEs may have different effects on direct (D1) and indirect (D2) pathways in the striatum (Gerfen et al. 1990; Kravitz et al. 2012)....


  • ...Depending on the specific setting, these distinct mechanisms are referred to as approach/avoid, go/noGo, or direct/indirect pathways in the basal ganglia, associated with predominantly D1- and D2-expressing projection neurons in the striatum, respectively (Gerfen et al. 1990; Kravitz et al. 2012)....
