Journal ArticleDOI

Learning to represent reward structure: a key to adapting to complex environments.

01 Dec 2012-Neuroscience Research (Elsevier)-Vol. 74, Iss: 3, pp 177-183
TL;DR: This work proposes a new hypothesis - the dopamine reward structural learning hypothesis - in which dopamine activity encodes multiplex signals for learning in order to represent reward structure in the internal state, leading to better reward prediction.
About: This article is published in Neuroscience Research. The article was published on 2012-12-01 and is currently open access. It has received 30 citations to date. The article focuses on the topic: Reinforcement learning.
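Read schematically, the hypothesis describes a learner that, besides the usual TD update on values, uses its error signals to reshape the internal state on which those values are computed. The sketch below is a loose illustration of that idea only; the linear features, both update rules, and all parameters are assumptions of this sketch, not the authors' model.

```python
# Schematic sketch (not the authors' model): a TD learner whose reward
# prediction error updates not only the value weights but also the
# internal-state representation those values are computed from.
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_internal = 8, 3
W = rng.normal(scale=0.1, size=(n_internal, n_obs))  # observation -> internal state
v = np.zeros(n_internal)                             # internal state -> value
alpha_v, alpha_w, gamma = 0.1, 0.01, 0.95            # illustrative parameters

def td_step(obs, obs_next, r):
    """One TD update that also reshapes the internal representation."""
    x, x_next = W @ obs, W @ obs_next
    delta = r + gamma * v @ x_next - v @ x      # reward prediction (TD) error
    v[:] += alpha_v * delta * x                 # standard value learning
    W[:] += alpha_w * delta * np.outer(v, obs)  # representation learning
    return delta

obs = np.eye(n_obs)                  # toy one-hot observations
print(td_step(obs[0], obs[1], 1.0))  # positive error for an unpredicted reward
```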
Citations
Journal ArticleDOI
TL;DR: A new theory is presented showing how learning to learn may arise from interactions between prefrontal cortex and the dopamine system, providing a fresh foundation for future research.
Abstract: Over the past 20 years, neuroscience research on reward-based learning has converged on a canonical model, under which the neurotransmitter dopamine ‘stamps in’ associations between situations, actions and rewards by modulating the strength of synaptic connections between neurons. However, a growing number of recent findings have placed this standard model under strain. We now draw on recent advances in artificial intelligence to introduce a new theory of reward-based learning. Here, the dopamine system trains another part of the brain, the prefrontal cortex, to operate as its own free-standing learning system. This new perspective accommodates the findings that motivated the standard model, but also deals gracefully with a wider range of observations, providing a fresh foundation for future research.

441 citations

Journal ArticleDOI
TL;DR: Using computational modeling, it is proposed that internally generated sequences may be productively considered a component of goal-directed decision systems, implementing a sampling-based inference engine that optimizes goal acquisition at multiple timescales of on-line choice, action control, and learning.

196 citations

Journal ArticleDOI
TL;DR: At points of unpredictability, midbrain and striatal regions associated with the phasic release of the neurotransmitter dopamine transiently increased in activity, which could provide a global updating signal, cuing other brain systems that a significant new event has begun.
Abstract: Predicting the near future is important for survival and plays a central role in theories of perception, language processing, and learning. Prediction failures may be particularly important for initiating the updating of perceptual and memory systems and, thus, for the subjective experience of events. Here, we asked observers to make predictions about what would happen 5 sec later in a movie of an everyday activity. Those points where prediction was more difficult corresponded with subjective boundaries in the stream of experience. At points of unpredictability, midbrain and striatal regions associated with the phasic release of the neurotransmitter dopamine transiently increased in activity. This activity could provide a global updating signal, cuing other brain systems that a significant new event has begun.

155 citations

Journal ArticleDOI
TL;DR: The use of reward feedback is a promising approach to either supplement or substitute for sensory feedback in the development of improved neurorehabilitation techniques, and these findings point to an important role for reward in the motor learning process.
Abstract: Recent findings have demonstrated that reward feedback alone can drive motor learning. However, it is not yet clear whether reward feedback alone can lead to learning when a perturbation is introdu...

132 citations


Cites background from "Learning to represent reward structure: a key to adapting to complex environments"

  • ...However, the community has recently begun to appreciate the importance of the underlying reward structure (Nakahara and Hikosaka 2012)....


Journal ArticleDOI
TL;DR: A new theory of dopamine function is developed that embraces a broader conceptualization of prediction errors and indicates that by signalling errors in both sensory and reward predictions, dopamine supports a form of RL that lies between model-based and model-free algorithms.
Abstract: Midbrain dopamine neurons are commonly thought to report a reward prediction error (RPE), as hypothesized by reinforcement learning (RL) theory. While this theory has been highly successful, severa...

113 citations

References
Book
01 Jan 1998
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.
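As a concrete anchor for the temporal-difference methods covered in Part II, here is a minimal TD(0) value-prediction sketch; the random-walk environment and the parameter values are illustrative assumptions, not examples from the book.

```python
import random

# TD(0) value prediction on a small random walk: states 0..4,
# reward 1.0 only on entering the terminal state 4.
N_STATES = 5
alpha, gamma = 0.1, 0.95   # step size and discount (illustrative values)
V = [0.0] * N_STATES       # value estimates; terminal value stays 0

for _ in range(5000):
    s = 0
    while s != N_STATES - 1:
        s_next = s + random.choice([-1, 1]) if s > 0 else 1
        r = 1.0 if s_next == N_STATES - 1 else 0.0
        delta = r + gamma * V[s_next] - V[s]  # temporal-difference error
        V[s] += alpha * delta                 # move estimate toward TD target
        s = s_next

print([round(v, 2) for v in V])  # values increase toward the terminal state
```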

37,989 citations


"Learning to represent reward struct..." refers background in this paper

  • ...research (Sutton and Barto, 1990) and remains an active research area in computer science and machine learning (Sutton and Barto, 1998)....

Journal ArticleDOI
14 Mar 1997-Science
TL;DR: Findings in this work indicate that the fluctuating output of primate dopaminergic neurons, which apparently signals changes or errors in the predictions of future salient and rewarding events, can be understood through quantitative theories of adaptive optimizing control.
Abstract: The capacity to predict future events permits a creature to detect, model, and manipulate the causal structure of its interactions with its environment. Behavioral experiments suggest that learning is driven by changes in the expectations about future salient events such as rewards and punishments. Physiological work has recently complemented these studies by identifying dopaminergic neurons in the primate whose fluctuating output apparently signals changes or errors in the predictions of future salient and rewarding events. Taken together, these findings can be understood through quantitative theories of adaptive optimizing control.
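The quantitative theory alluded to is temporal-difference learning, under which the dopaminergic signal is read as the TD error; in standard RL notation (not reproduced from the paper):

$$\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$$

An unpredicted reward yields $\delta_t > 0$ (a phasic burst), a fully predicted reward yields $\delta_t \approx 0$, and omission of a predicted reward yields $\delta_t < 0$ (a pause in firing).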

8,163 citations


"Learning to represent reward struct..." refers background in this paper

  • ...the value-based decision making process and the underlying neural mechanisms (Montague et al., 1996; Schultz et al., 1997)....

  • ...A marked example is an ingenious hypothesis about dopamine phasic activity as a learning signal for TD learning (called TD error), which is the strongest example of mapping to date, and is thus a critical driving force behind the progress in this field (Barto, 1994; Houk et al., 1994; Montague et al., 1996; Schultz et al., 1997)....

  • ...This transparent mapping has helped to drive the field’s progress since the proposal of this hypothesis, and it has been observed as the correspondence between “canonical” DA responses and the TD error of the hypothesis (Schultz et al., 1997)....

Journal ArticleDOI
TL;DR: Dopamine systems may have two functions, the phasic transmission of reward information and the tonic enabling of postsynaptic neurons.
Abstract: Schultz, Wolfram. Predictive reward signal of dopamine neurons. J. Neurophysiol. 80: 1–27, 1998. The effects of lesions, receptor blocking, electrical self-stimulation, and drugs of abuse suggest t...

3,962 citations

Journal ArticleDOI
TL;DR: This work considers dual-action choice systems from a normative perspective, and suggests a Bayesian principle of arbitration between them according to uncertainty, so each controller is deployed when it should be most accurate.
Abstract: A broad range of neural and behavioral data suggests that the brain contains multiple systems for behavioral choice, including one associated with prefrontal cortex and another with dorsolateral striatum. However, such a surfeit of control raises an additional choice problem: how to arbitrate between the systems when they disagree. Here, we consider dual-action choice systems from a normative perspective, using the computational theory of reinforcement learning. We identify a key trade-off pitting computational simplicity against the flexible and statistically efficient use of experience. The trade-off is realized in a competition between the dorsolateral striatal and prefrontal systems. We suggest a Bayesian principle of arbitration between them according to uncertainty, so each controller is deployed when it should be most accurate. This provides a unifying account of a wealth of experimental evidence about the factors favoring dominance by either system.
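The arbitration principle can be sketched as precision weighting: each controller reports a value estimate with an uncertainty, and the more certain controller dominates. The Gaussian precision-weighted form, names, and numbers below are assumptions of this sketch, not the paper's exact model.

```python
# Precision-weighted arbitration between two controllers (sketch only).
# Each controller reports a value estimate and a variance; the blend
# weight favors whichever controller is currently less uncertain.
def arbitrate(v_mb: float, var_mb: float, v_mf: float, var_mf: float) -> float:
    """Blend model-based (mb) and model-free (mf) value estimates."""
    w_mb = (1.0 / var_mb) / (1.0 / var_mb + 1.0 / var_mf)
    return w_mb * v_mb + (1.0 - w_mb) * v_mf

# Early in training the model-free system is still noisy, so the
# model-based (prefrontal) estimate dominates the blended value:
print(arbitrate(v_mb=1.0, var_mb=0.1, v_mf=0.2, var_mf=1.0))  # ~0.93
```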

2,171 citations

Journal ArticleDOI
21 Mar 2003-Science
TL;DR: Using distinct stimuli to indicate the probability of reward, it was found that the phasic activation of dopamine neurons varied monotonically across the full range of probabilities, supporting past claims that this response codes the discrepancy between predicted and actual reward.
Abstract: Uncertainty is critical in the measure of information and in assessing the accuracy of predictions. It is determined by probability P, being maximal at P = 0.5 and decreasing at higher and lower probabilities. Using distinct stimuli to indicate the probability of reward, we found that the phasic activation of dopamine neurons varied monotonically across the full range of probabilities, supporting past claims that this response codes the discrepancy between predicted and actual reward. In contrast, a previously unobserved response covaried with uncertainty and consisted of a gradual increase in activity until the potential time of reward. The coding of uncertainty suggests a possible role for dopamine signals in attention-based learning and risk-taking behavior.
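The stated probability-dependence is just that of Bernoulli uncertainty: the variance p(1 - p) is maximal at p = 0.5 and falls to zero at p = 0 and p = 1. A quick check:

```python
# Bernoulli reward uncertainty as a function of reward probability p:
# the variance p*(1 - p) peaks at p = 0.5, matching the probability-
# dependence of the ramping signal described in the abstract.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p={p:.2f}  variance={p * (1 - p):.4f}")
```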

1,950 citations


"Learning to represent reward struct..." refers background in this paper

  • ...Indeed, DA activity is also shown to encode “uncertainty” signals (Fiorillo et al., 2003) or “information-seeking” signals (Bromberg-Martin and Hikosaka, 2009)....
