Posted ContentDOI

Rethinking dopamine prediction errors

25 Dec 2017-bioRxiv (Cold Spring Harbor Laboratory)-pp 239731
TL;DR: A new theory of dopamine function is developed that embraces a broader conceptualization of prediction errors and can account for the role of dopamine in phenomena such as sensory preconditioning and identity unblocking, which ostensibly draw upon knowledge beyond reward predictions.
Abstract: Midbrain dopamine neurons are commonly thought to report a reward prediction error, as hypothesized by reinforcement learning theory. While this theory has been highly successful, several lines of evidence suggest that dopamine activity also encodes sensory prediction errors unrelated to reward. Here we develop a new theory of dopamine function that embraces a broader conceptualization of prediction errors. By signaling errors in both sensory and reward predictions, dopamine supports a form of reinforcement learning that lies between model-based and model-free algorithms. We show that the theory can account for the role of dopamine in phenomena such as sensory preconditioning and identity unblocking, which ostensibly draw upon knowledge beyond reward predictions.
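The central computational move can be sketched in a few lines. The snippet below is an illustrative assumption, not the paper's exact model: it pairs the standard scalar reward prediction error with a vector-valued sensory prediction error over state features, so the same temporal-difference logic updates both reward expectations and predictive knowledge about upcoming stimuli.

```python
import numpy as np

# Illustrative sketch (an assumption, not the paper's exact model): a scalar
# reward prediction error alongside a vector-valued sensory prediction error
# over state features, both driven by the same temporal-difference logic.

gamma, lr, n_features = 0.9, 0.1, 4        # assumed constants

w = np.zeros(n_features)                   # weights mapping features to value
M = np.zeros((n_features, n_features))     # predictive map over future features

def td_step(phi_t, phi_next, reward):
    """One update driven by a reward PE and a sensory (feature) PE."""
    global w, M
    delta_r = reward + gamma * w @ phi_next - w @ phi_t   # scalar reward PE
    delta_s = phi_t + gamma * M @ phi_next - M @ phi_t    # vector sensory PE
    w += lr * delta_r * phi_t                             # reward-driven learning
    M += lr * np.outer(delta_s, phi_t)                    # sensory-driven learning
    return delta_r, delta_s

# Example transition between two one-hot "states"
phi_a, phi_b = np.eye(n_features)[0], np.eye(n_features)[1]
print(td_step(phi_a, phi_b, reward=1.0))
```

Under these assumptions, a change in what a cue predicts generates a nonzero sensory error even when the reward error is zero, which is the kind of reward-blind learning needed for phenomena like sensory preconditioning and identity unblocking.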
Citations
Journal ArticleDOI
TL;DR: Reviews progress on the successor representation, which encodes states of the environment in terms of their predictive relationships with other states, and situates it within a broader framework for understanding how the brain negotiates tradeoffs between efficiency and flexibility in reinforcement learning.
Abstract: Reinforcement learning is the process by which an agent learns to predict long-term future reward. We now understand a great deal about the brain's reinforcement learning algorithms, but we know considerably less about the representations of states and actions over which these algorithms operate. A useful starting point is asking what kinds of representations we would want the brain to have, given the constraints on its computational architecture. Following this logic leads to the idea of the successor representation, which encodes states of the environment in terms of their predictive relationships with other states. Recent behavioral and neural studies have provided evidence for the successor representation, and computational studies have explored ways to extend the original idea. This paper reviews progress on these fronts, organizing them within a broader framework for understanding how the brain negotiates tradeoffs between efficiency and flexibility for reinforcement learning.
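As a concrete illustration of the definition above (the toy chain and constants are assumptions), the SR of a small Markov chain can be computed in closed form and turned into values with a single matrix-vector product:

```python
import numpy as np

# Minimal sketch of the successor representation (SR) for a toy Markov chain.
# M[s, s'] is the expected discounted number of future visits to s' from s;
# the closed form M = (I - gamma * T)^(-1) follows from summing the geometric
# series over powers of the transition matrix T. Values are then V = M @ r.

gamma = 0.95
T = np.array([[0.0, 1.0, 0.0],    # toy 3-state chain (assumed for illustration)
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])   # state 2 is absorbing
r = np.array([0.0, 0.0, 1.0])     # reward delivered only in state 2

M = np.linalg.inv(np.eye(3) - gamma * T)  # successor representation
V = M @ r                                  # values fall off with distance to reward
print(np.round(M, 2))
print(np.round(V, 2))
```

Because the reward vector enters only at the last step, changing r re-prices every state immediately without relearning M, which is the efficiency-flexibility middle ground the review describes.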

124 citations

Posted ContentDOI
22 Oct 2018-bioRxiv
TL;DR: Proposes an ensemble of SRs with multiple scales and shows that the derivative of the multi-scale SR, which can be computed linearly, reconstructs both the sequence of expected future states and the distance to the goal.
Abstract: The successor representation (SR) is a candidate principle for generalization in reinforcement learning, computational accounts of memory, and the structure of neural representations in the hippocampus. Given a sequence of states, the SR learns a predictive representation for every given state that encodes how often, on average, each upcoming state is expected to be visited, even if it is multiple steps ahead. A discount or scale parameter determines how many steps into the future SR’s generalizations reach, enabling rapid value computation, subgoal discovery, and flexible decision-making in large trees. However, SR with a single scale could discard information for predicting both the sequential order of and the distance between states, which are common problems in navigation for animals and artificial agents. Here we propose a solution: an ensemble of SRs with multiple scales. We show that the derivative of multi-scale SR can reconstruct both the sequence of expected future states and estimate distance to goal. This derivative can be computed linearly: we show that a multi-scale SR ensemble is the Laplace transform of future states, and the inverse of this Laplace transform is a biologically plausible linear estimation of the derivative. Multi-scale SR and its derivative could lead to a common principle for how the medial temporal lobe supports both map-based and vector-based navigation.
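A small numerical sketch (using an assumed deterministic chain, not the paper's construction) shows the derivative idea: each SR in the ensemble stores gamma**d for a state d steps from the goal, so a finite difference across neighbouring scales recovers d.

```python
import numpy as np

# Hedged illustration of the multi-scale SR idea. For a deterministic chain,
# the SR entry from a state to a terminal goal equals gamma**d, where d is the
# number of steps to the goal. Since d(gamma**d)/d(gamma) = d * gamma**(d-1),
# distance can be read out as d = gamma * M'/M using two nearby scales.

T = np.eye(5, k=1)     # 5-state chain 0 -> 1 -> 2 -> 3 -> 4; row 4 is terminal
goal = 4

def sr(gamma):
    return np.linalg.inv(np.eye(5) - gamma * T)

gamma, eps = 0.9, 1e-4
M = sr(gamma)[:, goal]
dM = (sr(gamma + eps)[:, goal] - sr(gamma - eps)[:, goal]) / (2 * eps)

distance = gamma * dM / M      # estimated steps to the goal from each state
print(np.round(distance))      # approximately [4, 3, 2, 1, 0]
```

With many scales rather than two, the same derivative can be obtained by a linear combination across the ensemble, which is the Laplace-transform reading described in the abstract.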

42 citations

Journal ArticleDOI
TL;DR: Argues that modelling some of the brain's fundamental cognitive computations, and relating them to brain development, can bridge the gap between brain and cognitive development and lead to a richer understanding of the ontogeny of psychiatric disorders.
Abstract: Most psychiatric disorders emerge during childhood and adolescence. This is also a period that coincides with the brain undergoing substantial growth and reorganisation. However, it remains unclear how a heightened vulnerability to psychiatric disorder relates to this brain maturation. Here, we propose 'developmental computational psychiatry' as a framework for linking brain maturation to cognitive development. We argue that through modelling some of the brain's fundamental cognitive computations, and relating them to brain development, we can bridge the gap between brain and cognitive development. This in turn can lead to a richer understanding of the ontogeny of psychiatric disorders. We illustrate this perspective with examples from reinforcement learning and dopamine function. Specifically, we show how computational modelling deepens an understanding of how cognitive processes, such as reward learning, effort learning, and social learning might go awry in psychiatric disorders. Finally, we sketch the promises and limitations of a developmental computational psychiatry.

31 citations

Journal ArticleDOI
TL;DR: It is formally shown that brief pauses in the firing of midbrain dopamine neurons are sufficient to produce a cue that meets the classic criteria defining a conditioned inhibitor, or a cue that predicts the omission of a reward.
Abstract: Prediction errors are critical for associative learning. In the brain, these errors are thought to be signaled, in part, by midbrain dopamine neurons. However, although there is substantial direct evidence that brief increases in the firing of these neurons can mimic positive prediction errors, there is less evidence that brief pauses mimic negative errors. Whereas pauses in the firing of midbrain dopamine neurons can substitute for missing negative prediction errors to drive extinction, it has been suggested that this effect might be attributable to changes in salience rather than the operation of this signal as a negative prediction error. Here we address this concern by showing that the same pattern of inhibition will create a cue able to meet the classic definition of a conditioned inhibitor by showing suppression of responding in a summation test and slower learning in a retardation test. Importantly, these classic criteria were designed to rule out explanations founded on attention or salience; thus the results cannot be explained in this manner. We also show that this pattern of behavior is not produced by a single, prolonged, ramped period of inhibition, suggesting that it is the precisely timed, sudden change, and not the duration, that conveys the teaching signal. SIGNIFICANCE STATEMENT: Here we show that brief pauses in the firing of midbrain dopamine neurons are sufficient to produce a cue that meets the classic criteria defining a conditioned inhibitor, or a cue that predicts the omission of a reward. These criteria were developed to distinguish actual learning from salience or attentional effects; thus these results formally show that brief pauses in the firing of dopamine neurons can serve as key teaching signals in the brain. Interestingly, this was not true for gradual prolonged pauses, suggesting that it is the dynamic change in firing that serves as the teaching signal.
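The summation logic the authors rely on can be made concrete with a minimal Rescorla-Wagner simulation; the trial structure and learning rate below are assumptions chosen for illustration, not the study's design.

```python
# Minimal Rescorla-Wagner sketch of conditioned inhibition (assumed trial
# structure, not the study's design). A is rewarded alone (A+) but not in
# compound with X (AX-), so X acquires a negative weight; in a summation
# test, adding X to a separately trained excitor B suppresses the prediction.

lr = 0.1
w = {'A': 0.0, 'B': 0.0, 'X': 0.0}

def rw_trial(cues, reward):
    """One Rescorla-Wagner update: cues in compound share the prediction error."""
    error = reward - sum(w[c] for c in cues)
    for c in cues:
        w[c] += lr * error

for _ in range(500):
    rw_trial(['A'], 1.0)        # A+ trials
    rw_trial(['A', 'X'], 0.0)   # AX- trials drive w['X'] negative
    rw_trial(['B'], 1.0)        # independent excitor for the summation test

print(round(w['X'], 2))                              # approaches -1: an inhibitor
print(round(w['B'], 2), round(w['B'] + w['X'], 2))   # summation test: BX << B
```

In this scheme the inhibitor also passes a retardation test: starting from a negative weight, X needs extra pairings with reward before it can become excitatory.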

29 citations


Cites background from "Rethinking dopamine prediction erro..."

  • ...…it is the brief, appropriately timed, dynamic change in firing of these neurons that signals the error (Chang et al., 2016, 2017; Hamid et al., 2016) or whether the duration of the pause carries the information (Daw et al., 2002; Bayer and Glimcher, 2005; Bayer et al., 2007; Glimcher, 2011)....

References
Book
01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.
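For readers who want the core algorithm in runnable form, here is a minimal tabular TD(0) sketch on the book's classic random-walk example; the constants are assumptions chosen for brevity.

```python
import numpy as np

# Minimal tabular TD(0) sketch in the spirit of Sutton and Barto: values are
# nudged toward the one-step bootstrapped target using the TD error. The
# environment is the classic 5-state random walk (terminal states at both
# ends, reward 1 only on the right); constants are assumed for brevity.

n, gamma, alpha = 5, 1.0, 0.1
V = np.zeros(n + 2)                  # states 0 and n+1 are terminal
rng = np.random.default_rng(0)

for _ in range(1000):                # episodes
    s = (n + 1) // 2                 # start in the middle state
    while s not in (0, n + 1):
        s_next = s + rng.choice([-1, 1])
        r = 1.0 if s_next == n + 1 else 0.0
        delta = r + gamma * V[s_next] - V[s]   # TD error
        V[s] += alpha * delta                  # TD(0) update
        s = s_next

print(np.round(V[1:-1], 2))   # true values are 1/6, 2/6, ..., 5/6
```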

37,989 citations


"Rethinking dopamine prediction erro..." refers background in this paper

  • ...The TD error is the basis of the classic TD learning algorithm (Sutton and Barto, 1998), which in its simplest form updates the value estimate according to $\Delta \hat{V}(s_t) \propto \delta_t$....

  • ...Because the environment is assumed to obey the Markov property (transitions and rewards depend only on the current state), the value function can be written in a recursive form known as the Bellman equation (Sutton and Barto, 1998): The Bellman equation allows us to define efficient RL algorithms for estimating values, as we explain next....

Journal ArticleDOI
14 Mar 1997-Science
TL;DR: Identifies dopaminergic neurons in the primate whose fluctuating output apparently signals changes or errors in the predictions of future salient and rewarding events, and shows that these findings can be understood through quantitative theories of adaptive optimizing control.
Abstract: The capacity to predict future events permits a creature to detect, model, and manipulate the causal structure of its interactions with its environment. Behavioral experiments suggest that learning is driven by changes in the expectations about future salient events such as rewards and punishments. Physiological work has recently complemented these studies by identifying dopaminergic neurons in the primate whose fluctuating output apparently signals changes or errors in the predictions of future salient and rewarding events. Taken together, these findings can be understood through quantitative theories of adaptive optimizing control.

8,163 citations


"Rethinking dopamine prediction erro..." refers background or methods in this paper

  • ...The TD interpretation is important for explaining phenomena like the shift in signaling to earlier reward-predicting cues (Schultz et al., 1997), the temporal specificity of dopamine responses (Hollerman and Schultz, 1998; Takahashi et al., 2016), and the sensitivity to long-term values (Enomoto et…...

  • ...The success of the RPE hypothesis is exciting because the RPE is precisely the signal a reinforcement learning (RL) system would need to update reward expectations (Montague et al., 1996; Schultz et al., 1997)....

  • ...As a result, even if dopamine is constrained by the model proposed here, it would support significantly more flexible behavior than supposed by classical model-free accounts (Montague et al., 1996; Schultz et al., 1997), even without moving completely to an account of model-based computation in the dopamine system (Langdon et al....

  • ...The RPE hypothesis states that dopamine reports the TD error (Montague et al., 1996; Schultz et al., 1997)....

01 Jan 1972 - Rescorla and Wagner, "A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement"

6,206 citations


"Rethinking dopamine prediction erro..." refers background in this paper

  • ...The classic approach to modeling this phenomenon is to assume that each stimulus acquires an independent association and that these associations summate when the stimuli are presented in compound (Rescorla and Wagner, 1972)....

Journal ArticleDOI
TL;DR: This work considers dual-action choice systems from a normative perspective, and suggests a Bayesian principle of arbitration between them according to uncertainty, so each controller is deployed when it should be most accurate.
Abstract: A broad range of neural and behavioral data suggests that the brain contains multiple systems for behavioral choice, including one associated with prefrontal cortex and another with dorsolateral striatum. However, such a surfeit of control raises an additional choice problem: how to arbitrate between the systems when they disagree. Here, we consider dual-action choice systems from a normative perspective, using the computational theory of reinforcement learning. We identify a key trade-off pitting computational simplicity against the flexible and statistically efficient use of experience. The trade-off is realized in a competition between the dorsolateral striatal and prefrontal systems. We suggest a Bayesian principle of arbitration between them according to uncertainty, so each controller is deployed when it should be most accurate. This provides a unifying account of a wealth of experimental evidence about the factors favoring dominance by either system.
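One simple way to read the arbitration principle is as precision weighting; the sketch below is an assumed illustration of that reading, not the paper's full Bayesian treatment.

```python
def arbitrate(value_mb, var_mb, value_mf, var_mf):
    """Precision-weighted mix of model-based (MB) and model-free (MF) values.

    An assumed, simplified reading of uncertainty-based arbitration, not the
    paper's exact scheme: the less uncertain controller gets the larger share
    of control.
    """
    precision_mb, precision_mf = 1.0 / var_mb, 1.0 / var_mf
    w_mb = precision_mb / (precision_mb + precision_mf)   # weight on MB system
    return w_mb * value_mb + (1.0 - w_mb) * value_mf, w_mb

# Early in training the model-free estimate is noisy, so MB dominates:
print(arbitrate(value_mb=0.8, var_mb=0.05, value_mf=0.2, var_mf=0.5))
# With extensive experience MF becomes precise (and cheap), and it takes over:
print(arbitrate(value_mb=0.8, var_mb=0.05, value_mf=0.75, var_mf=0.01))
```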

2,171 citations


"Rethinking dopamine prediction erro..." refers background or methods in this paper

  • ...For this reason, it has been proposed that the brain also makes use of model-based algorithms (Daw and Dayan, 2014; Daw et al., 2005), which occupy the opposite end of the efficiency-flexibility spectrum....

  • ...Nevertheless, the theory proposed here—particularly if it incorporates off-line rehearsal in order to fully explain the results of Sharpe et al. (2017)—strains the dichotomy between model-based and model-free algorithms that has been at the heart of modern RL theories (Daw et al., 2005)....

  • ...Because of this, devaluation-sensitivity has frequently been viewed as an assay of model-based RL (Daw et al., 2005)....

Journal ArticleDOI
TL;DR: A theoretical framework is developed that shows how mesencephalic dopamine systems could distribute to their targets a signal that represents information about future expectations and shows that, through a simple influence on synaptic plasticity, fluctuations in dopamine release can act to change the predictions in an appropriate manner.
Abstract: We develop a theoretical framework that shows how mesencephalic dopamine systems could distribute to their targets a signal that represents information about future expectations. In particular, we show how activity in the cerebral cortex can make predictions about future receipt of reward and how fluctuations in the activity levels of neurons in diffuse dopamine systems above and below baseline levels would represent errors in these predictions that are delivered to cortical and subcortical targets. We present a model for how such errors could be constructed in a real brain that is consistent with physiological results for a subset of dopaminergic neurons located in the ventral tegmental area and surrounding dopaminergic neurons. The theory also makes testable predictions about human choice behavior on a simple decision-making task. Furthermore, we show that, through a simple influence on synaptic plasticity, fluctuations in dopamine release can act to change the predictions in an appropriate manner.

1,920 citations


"Rethinking dopamine prediction erro..." refers background or methods in this paper

  • ...The RPE hypothesis states that dopamine reports the TD error (Montague et al., 1996; Schultz et al., 1997)....

  • ...The success of the RPE hypothesis is exciting because the RPE is precisely the signal a reinforcement learning (RL) system would need to update reward expectations (Montague et al., 1996; Schultz et al., 1997)....

  • ...As a result, even if dopamine is constrained by the model proposed here, it would support significantly more flexible behavior than supposed by classical model-free accounts (Montague et al., 1996; Schultz et al., 1997), even without moving completely to an account of model-based computation in the dopamine system (Langdon et al....
