Rethinking dopamine prediction errors
Citations
124 citations
54 citations
42 citations
31 citations
29 citations
Cites background from "Rethinking dopamine prediction erro..."
...…it is the brief, appropriately timed, dynamic change in firing of these neurons that signals the error (Chang et al., 2016, 2017; Hamid et al., 2016) or whether the duration of the pause carries the information (Daw et al., 2002; Bayer and Glimcher, 2005; Bayer et al., 2007; Glimcher, 2011)....
[...]
References
37,989 citations
"Rethinking dopamine prediction erro..." refers background in this paper
...The TD error is the basis of the classic TD learning algorithm (Sutton and Barto, 1998), which in its simplest form updates the value estimate according to $\Delta \hat{V}(s_t) \propto \delta_t$....
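The update rule quoted above can be illustrated with a minimal tabular sketch. The excerpt does not define the TD error itself, so the standard textbook form, delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), is assumed here. The learning rate `alpha`, the discount `gamma`, and the three-state episode are hypothetical choices for illustration, not values taken from the paper.

```python
def td_error(reward, value_next, value_current, gamma):
    """Assumed standard TD error: r_t + gamma * V(s_{t+1}) - V(s_t)."""
    return reward + gamma * value_next - value_current


def td_update(values, state, next_state, reward, alpha=0.1, gamma=0.9):
    """Move V(state) a step of size alpha in the direction of the TD error."""
    delta = td_error(reward, values[next_state], values[state], gamma)
    values[state] += alpha * delta  # Delta V(s_t) is proportional to delta_t
    return delta


# Hypothetical episode: cue -> outcome -> end, with a reward of 1
# delivered on the transition out of "outcome".
episode = [("cue", "outcome", 0.0), ("outcome", "end", 1.0)]
values = {"cue": 0.0, "outcome": 0.0, "end": 0.0}

for _ in range(500):
    for state, next_state, reward in episode:
        td_update(values, state, next_state, reward)
```

Under these assumptions `values["outcome"]` approaches 1 and `values["cue"]` approaches `gamma` times that value, since the cue predicts the reward one step ahead.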
[...]
...Because the environment is assumed to obey the Markov property (transitions and rewards depend only on the current state), the value function can be written in a recursive form known as the Bellman equation (Sutton and Barto, 1998): The Bellman equation allows us to define efficient RL algorithms for estimating values, as we explain next....
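The equation itself is not captured in the excerpt. As a hedged illustration, the sketch below assumes the standard recursive form for a fixed policy, V(s) = R(s) + gamma * sum over s' of P(s' | s) * V(s'), and finds its fixed point by repeated backups. The reward vector, transition matrix, and discount are hypothetical.

```python
def bellman_backup(values, rewards, transitions, gamma):
    """One sweep of V(s) <- R(s) + gamma * sum_s' P(s'|s) * V(s')."""
    return [
        rewards[s]
        + gamma * sum(p * values[s2] for s2, p in enumerate(transitions[s]))
        for s in range(len(values))
    ]


def evaluate(rewards, transitions, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Iterate the backup until the values stop changing (within tol)."""
    values = [0.0] * len(rewards)
    for _ in range(max_iters):
        new_values = bellman_backup(values, rewards, transitions, gamma)
        if max(abs(a - b) for a, b in zip(new_values, values)) < tol:
            return new_values
        values = new_values
    return values


# Hypothetical chain: state 0 (cue) -> state 1 (outcome) -> state 2 (absorbing).
rewards = [0.0, 1.0, 0.0]          # expected reward on leaving each state
transitions = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
state_values = evaluate(rewards, transitions)
```

Because the recursion only needs one-step rewards and transition probabilities, the values of all states can be computed without enumerating whole trajectories, which is the efficiency the excerpt alludes to.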
[...]
8,163 citations
"Rethinking dopamine prediction erro..." refers background or methods in this paper
...The TD interpretation is important for explaining phenomena like the shift in signaling to earlier reward-predicting cues (Schultz et al., 1997), the temporal specificity of dopamine responses (Hollerman and Schultz, 1998; Takahashi et al., 2016), and the sensitivity to long-term values (Enomoto et…...
[...]
...The success of the RPE hypothesis is exciting because the RPE is precisely the signal a reinforcement learning (RL) system would need to update reward expectations (Montague et al., 1996; Schultz et al., 1997)....
[...]
...As a result, even if dopamine is constrained by the model proposed here, it would support significantly more flexible behavior than supposed by classical model-free accounts (Montague et al., 1996; Schultz et al., 1997), even without moving completely to an account of model-based computation in the dopamine system (Langdon et al....
[...]
...The RPE hypothesis states that dopamine reports the TD error (Montague et al., 1996; Schultz et al., 1997)....
[...]
6,206 citations
"Rethinking dopamine prediction erro..." refers background in this paper
...The classic approach to modeling this phenomenon is to assume that each stimulus acquires an independent association and that these associations summate when the stimuli are presented in compound (Rescorla and Wagner, 1972)....
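The summation assumption described above can be sketched as follows. This is a minimal version of the Rescorla-Wagner rule in which the prediction for a compound is the sum of the weights of the stimuli present, and every present stimulus is updated by the same shared error. The stimulus names, learning rate, and two-phase schedule are hypothetical and are not taken from the paper.

```python
def rescorla_wagner_trial(weights, present, reward, alpha=0.1):
    """Update the weights of the stimuli in `present` after one trial."""
    prediction = sum(weights[s] for s in present)  # associations summate
    error = reward - prediction                    # shared prediction error
    for s in present:
        weights[s] += alpha * error
    return error


weights = {"A": 0.0, "B": 0.0}

# Phase 1 (hypothetical): stimulus A alone is paired with reward.
for _ in range(200):
    rescorla_wagner_trial(weights, ["A"], reward=1.0)

# Phase 2 (hypothetical): the compound AB is paired with the same reward.
for _ in range(200):
    rescorla_wagner_trial(weights, ["A", "B"], reward=1.0)
```

Because A already predicts the reward by the start of the second phase, the shared error is close to zero and B acquires almost no association, which is a direct consequence of summing the predictions.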
[...]
2,171 citations
"Rethinking dopamine prediction erro..." refers background or methods in this paper
...For this reason, it has been proposed that the brain also makes use of model-based algorithms (Daw and Dayan, 2014; Daw et al., 2005), which occupy the opposite end of the efficiency-flexibility spectrum....
[...]
...Nevertheless, the theory proposed here—particularly if it incorporates off-line rehearsal in order to fully explain the results of Sharpe et al. (2017)—strains the dichotomy between model-based and model-free algorithms that has been at the heart of modern RL theories (Daw et al., 2005)....
[...]
...Because of this, devaluation-sensitivity has frequently been viewed as an assay of model-based RL (Daw et al., 2005)....
[...]
1,920 citations
"Rethinking dopamine prediction erro..." refers background or methods in this paper
...The RPE hypothesis states that dopamine reports the TD error (Montague et al., 1996; Schultz et al., 1997)....
[...]
...The success of the RPE hypothesis is exciting because the RPE is precisely the signal a reinforcement learning (RL) system would need to update reward expectations (Montague et al., 1996; Schultz et al., 1997)....
[...]
...As a result, even if dopamine is constrained by the model proposed here, it would support significantly more flexible behavior than supposed by classical model-free accounts (Montague et al., 1996; Schultz et al., 1997), even without moving completely to an account of model-based computation in the dopamine system (Langdon et al....
[...]