Learning to represent reward structure: a key to adapting to complex environments.

doi:10.1016/J.NEURES.2012.09.007

Journal Article•DOI•

Hiroyuki Nakahara¹, Okihide Hikosaka²

RIKEN Brain Science Institute¹, National Institutes of Health²

01 Dec 2012-Neuroscience Research (Elsevier)-Vol. 74, Iss: 3, pp 177-183

TL;DR: This work proposes a new hypothesis - the dopamine reward structural learning hypothesis - in which dopamine activity encodes multiplex signals for learning in order to represent reward structure in the internal state, leading to better reward prediction.

About: This article was published in Neuroscience Research on 2012-12-01 and is currently open access. It has received 30 citations to date. The article focuses on the topic of reinforcement learning.

Citations

Journal Article•DOI•

Prefrontal cortex as a meta-reinforcement learning system

Jane X. Wang, Zeb Kurth-Nelson¹, Dharshan Kumaran¹, Dhruva Tirumala, Hubert Soyer, Joel Z. Leibo, Demis Hassabis¹, Matthew Botvinick¹

University College London¹

14 May 2018-Nature Neuroscience

TL;DR: A new theory is presented showing how learning to learn may arise from interactions between prefrontal cortex and the dopamine system, providing a fresh foundation for future research.

Abstract: Over the past 20 years, neuroscience research on reward-based learning has converged on a canonical model, under which the neurotransmitter dopamine ‘stamps in’ associations between situations, actions and rewards by modulating the strength of synaptic connections between neurons. However, a growing number of recent findings have placed this standard model under strain. We now draw on recent advances in artificial intelligence to introduce a new theory of reward-based learning. Here, the dopamine system trains another part of the brain, the prefrontal cortex, to operate as its own free-standing learning system. This new perspective accommodates the findings that motivated the standard model, but also deals gracefully with a wider range of observations, providing a fresh foundation for future research.

441 citations

Journal Article•DOI•

Internally generated sequences in learning and executing goal-directed behavior

Giovanni Pezzulo¹, Matthijs A. A. van der Meer², Carien S. Lansink³, Cyriel M. A. Pennartz³

National Research Council¹, University of Waterloo², University of Amsterdam³

01 Dec 2014-Trends in Cognitive Sciences

TL;DR: Using computational modeling, it is proposed that internally generated sequences may be productively considered a component of goal-directed decision systems, implementing a sampling-based inference engine that optimizes goal acquisition at multiple timescales of on-line choice, action control, and learning.

196 citations

Journal Article•DOI•

Prediction error associated with the perceptual segmentation of naturalistic events

Jeffrey M. Zacks¹, Christopher A. Kurby¹,², Michelle L. Eisenberg¹, Nayiri Haroutunian¹

Washington University in St. Louis¹, Grand Valley State University²

01 Dec 2011-Journal of Cognitive Neuroscience

TL;DR: At points of unpredictability, midbrain and striatal regions associated with the phasic release of the neurotransmitter dopamine transiently increased in activity, which could provide a global updating signal, cuing other brain systems that a significant new event has begun.

Abstract: Predicting the near future is important for survival and plays a central role in theories of perception, language processing, and learning. Prediction failures may be particularly important for initiating the updating of perceptual and memory systems and, thus, for the subjective experience of events. Here, we asked observers to make predictions about what would happen 5 sec later in a movie of an everyday activity. Those points where prediction was more difficult corresponded with subjective boundaries in the stream of experience. At points of unpredictability, midbrain and striatal regions associated with the phasic release of the neurotransmitter dopamine transiently increased in activity. This activity could provide a global updating signal, cuing other brain systems that a significant new event has begun.

155 citations

Journal Article•DOI•

Reward feedback accelerates motor learning

A.A. Nikooyan¹, Alaa A. Ahmed¹

University of Colorado Boulder¹

15 Jan 2015-Journal of Neurophysiology

TL;DR: The use of reward feedback is a promising approach to either supplement or substitute sensory feedback in the development of improved neurorehabilitation techniques and points to an important role played by reward in the motor learning process.

Abstract: Recent findings have demonstrated that reward feedback alone can drive motor learning. However, it is not yet clear whether reward feedback alone can lead to learning when a perturbation is introdu...

132 citations

Cites background from "Learning to represent reward struct..."

...However, the community has recently begun to appreciate the importance of the underlying reward structure (Nakahara and Hikosaka 2012)....

Journal Article•DOI•

Rethinking dopamine as generalized prediction error.

Matthew P.H. Gardner¹, Geoffrey Schoenbaum¹,²,³, Samuel J. Gershman⁴

National Institute on Drug Abuse¹, Johns Hopkins University², University of Maryland, Baltimore³, Harvard University⁴

21 Nov 2018-Proceedings of The Royal Society B: Biological Sciences

TL;DR: A new theory of dopamine function is developed that embraces a broader conceptualization of prediction errors and indicates that by signalling errors in both sensory and reward predictions, dopamine supports a form of RL that lies between model-based and model-free algorithms.

Abstract: Midbrain dopamine neurons are commonly thought to report a reward prediction error (RPE), as hypothesized by reinforcement learning (RL) theory. While this theory has been highly successful, severa...

113 citations

References

Book•

Reinforcement Learning: An Introduction

Richard S. Sutton¹, Andrew G. Barto

Massachusetts Institute of Technology¹

01 Jan 1998

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.

37,989 citations

"Learning to represent reward struct..." refers background in this paper

...research (Sutton and Barto, 1990) and remains an active research area in computer science and machine learning (Sutton and Barto, 1998)....

Journal Article•DOI•

A Neural Substrate of Prediction and Reward

Wolfram Schultz¹, Peter Dayan², P. R. Montague³

University of Fribourg¹, Massachusetts Institute of Technology², Baylor College of Medicine³

14 Mar 1997-Science

TL;DR: Findings in this work indicate that dopaminergic neurons in the primate whose fluctuating output apparently signals changes or errors in the predictions of future salient and rewarding events can be understood through quantitative theories of adaptive optimizing control.

Abstract: The capacity to predict future events permits a creature to detect, model, and manipulate the causal structure of its interactions with its environment. Behavioral experiments suggest that learning is driven by changes in the expectations about future salient events such as rewards and punishments. Physiological work has recently complemented these studies by identifying dopaminergic neurons in the primate whose fluctuating output apparently signals changes or errors in the predictions of future salient and rewarding events. Taken together, these findings can be understood through quantitative theories of adaptive optimizing control.

8,163 citations

"Learning to represent reward struct..." refers background in this paper

...the value-based decision making process and the underlying neural mechanisms (Montague et al., 1996; Schultz et al., 1997)....

...A marked example is an ingenious hypothesis about dopamine phasic activity as a learning signal for TD learning (called TD error), which is the strongest example of mapping to date, and is thus a critical driving force behind the progress in this field (Barto, 1994; Houk et al., 1994; Montague et al., 1996; Schultz et al., 1997)....

...This transparent mapping has helped to drive the field’s progress since the proposal of this hypothesis, and it has been observed as the correspondence between “canonical” DA responses and the TD error of the hypothesis (Schultz et al., 1997)....
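
The excerpts above refer to the TD error without spelling it out. As a rough illustration only, the following Python sketch computes the standard temporal-difference error, delta = r + gamma * V(s') - V(s), on a toy two-step episode (a cue followed by a reward). The state names, learning rate, discount factor, and trial count are illustrative assumptions, not values taken from the cited papers.

```python
def td_error(reward, value_next, value_current, gamma=0.9):
    """Temporal-difference error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_current


def train_cue_reward(n_trials=200, alpha=0.1, gamma=0.9):
    """Run TD(0) on a toy episode: cue state -> reward state -> end.

    Returns the learned state values plus the TD error at reward
    delivery on the first and on the last trial.
    """
    # Hypothetical state names; both values start at zero (nothing predicted).
    values = {"cue": 0.0, "reward_state": 0.0}
    first_delta = None
    last_delta = None
    for trial in range(n_trials):
        # Step 1: cue -> reward_state, no reward delivered yet.
        delta_cue = td_error(0.0, values["reward_state"], values["cue"], gamma)
        values["cue"] += alpha * delta_cue
        # Step 2: a reward of 1.0 arrives and the episode ends (terminal value 0).
        delta_reward = td_error(1.0, 0.0, values["reward_state"], gamma)
        values["reward_state"] += alpha * delta_reward
        if trial == 0:
            first_delta = delta_reward
        last_delta = delta_reward
    return values, first_delta, last_delta


if __name__ == "__main__":
    values, first_delta, last_delta = train_cue_reward()
    print("learned values:", values)
    print("TD error at reward, first trial:", first_delta)
    print("TD error at reward, last trial:", last_delta)
```

In this sketch the error at reward delivery is large on the first trial and shrinks toward zero once the reward is predicted, while the cue acquires a discounted value. This is only a schematic of the kind of correspondence the excerpts describe, not a model of recorded dopamine activity.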

Journal Article•DOI•

Predictive Reward Signal of Dopamine Neurons

Wolfram Schultz¹

University of Fribourg¹

01 Jul 1998-Journal of Neurophysiology

TL;DR: Dopamine systems may have two functions, the phasic transmission of reward information and the tonic enabling of postsynaptic neurons.

Abstract: Schultz, Wolfram. Predictive reward signal of dopamine neurons. J. Neurophysiol. 80: 1–27, 1998. The effects of lesions, receptor blocking, electrical self-stimulation, and drugs of abuse suggest t...

3,962 citations

Journal Article•DOI•

Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control

Nathaniel D. Daw¹, Yael Niv¹,², Peter Dayan¹

University College London¹, Interdisciplinary Center for Neural Computation²

06 Nov 2005-Nature Neuroscience

TL;DR: This work considers dual-action choice systems from a normative perspective, and suggests a Bayesian principle of arbitration between them according to uncertainty, so each controller is deployed when it should be most accurate.

Abstract: A broad range of neural and behavioral data suggests that the brain contains multiple systems for behavioral choice, including one associated with prefrontal cortex and another with dorsolateral striatum. However, such a surfeit of control raises an additional choice problem: how to arbitrate between the systems when they disagree. Here, we consider dual-action choice systems from a normative perspective, using the computational theory of reinforcement learning. We identify a key trade-off pitting computational simplicity against the flexible and statistically efficient use of experience. The trade-off is realized in a competition between the dorsolateral striatal and prefrontal systems. We suggest a Bayesian principle of arbitration between them according to uncertainty, so each controller is deployed when it should be most accurate. This provides a unifying account of a wealth of experimental evidence about the factors favoring dominance by either system.

2,171 citations

Journal Article•DOI•

Discrete coding of reward probability and uncertainty by dopamine neurons.

Christopher D. Fiorillo¹, Philippe N. Tobler², Wolfram Schultz²

University of Fribourg¹, University of Cambridge²

21 Mar 2003-Science

TL;DR: Using distinct stimuli to indicate the probability of reward, it was found that the phasic activation of dopamine neurons varied monotonically across the full range of probabilities, supporting past claims that this response codes the discrepancy between predicted and actual reward.

Abstract: Uncertainty is critical in the measure of information and in assessing the accuracy of predictions. It is determined by probability P, being maximal at P = 0.5 and decreasing at higher and lower probabilities. Using distinct stimuli to indicate the probability of reward, we found that the phasic activation of dopamine neurons varied monotonically across the full range of probabilities, supporting past claims that this response codes the discrepancy between predicted and actual reward. In contrast, a previously unobserved response covaried with uncertainty and consisted of a gradual increase in activity until the potential time of reward. The coding of uncertainty suggests a possible role for dopamine signals in attention-based learning and risk-taking behavior.
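
The claim that uncertainty peaks at P = 0.5 can be checked with a small worked example. The abstract does not say which measure of uncertainty is meant, so the Python sketch below assumes two common candidates for a reward delivered with probability P: the Bernoulli variance P(1 - P) and the Shannon entropy in bits. Both function names are hypothetical and chosen for illustration.

```python
import math


def bernoulli_variance(p):
    """Variance of a reward that occurs with probability p (assumed measure)."""
    return p * (1.0 - p)


def bernoulli_entropy(p):
    """Shannon entropy in bits of a reward that occurs with probability p."""
    if p in (0.0, 1.0):
        # A certain outcome carries no uncertainty.
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


if __name__ == "__main__":
    for p in [0.0, 0.25, 0.5, 0.75, 1.0]:
        print(p, bernoulli_variance(p), bernoulli_entropy(p))
```

Under either assumed measure, uncertainty is zero at P = 0 and P = 1, symmetric around 0.5, and largest at P = 0.5, matching the abstract's description.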

1,950 citations

"Learning to represent reward struct..." refers background in this paper

...Indeed, DA activity is also shown to encode “uncertainty” signals (Fiorillo et al., 2003) or “information-seeking” signals (Bromberg-Martin and Hikosaka, 2009)....