How we learn to make decisions: Rapid propagation of reinforcement learning prediction errors in humans
Citations
Cites background or methods or result from "How we learn to make decisions: Rap..."
...Testing this proposition, Krigolson and Holroyd (2007d) and Krigolson et al....
[...]
...seen in the work of Krigolson et al. (2013). In their study, Krigolson and colleagues recorded electroencephalographic data while par-...
[...]
...While the timing and topography are not consistent with previous accounts of the FRN (Miltner et al., 1997), some researchers have found that the timing of the FRN can be considerably later when visual feedback evaluation is complex in nature (see Krigolson et al., 2013 for more detail)....
[...]
...As noted above, the N100 observed by Krigolson and Holroyd (2007a) peaked shortly after the onset of the target perturbation and had a maximal component amplitude over left parietal-occipital visual areas of cortex (see Fig....
[...]
...Indeed, Heath et al.’s (2012) results, along with those of Krigolson et al. (2013) seem to suggest that the processes that underlie the N200 and P300 may be related to the encoding of target location and/or movement amplitude....
[...]
References
"How we learn to make decisions: Rap..." refers background or methods in this paper
...…0 and 0.01 (Holroyd & Coles, 2008), these weights determined the probability that an action was selected based on a softmax decision function (Sutton & Barto, 1998): P(action i is selected) = e^{w_i/τ} / (e^{w_1/τ} + e^{w_2/τ}). Here, τ represents temperature, a model parameter that determines the degree to…...
[...]
...…from a state with no value—the state before Q—into a state with value—state Q. Computational RL theories such as the method of temporal differences (Sutton & Barto, 1998) take this into account and posit that prediction errors are computed as the difference between the value and rewards of the…...
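The temporal-difference computation described in the excerpt above can be illustrated with a minimal sketch. This is a tabular toy example, not the paper's actual model; the function name, state values, and discount parameter are assumptions for illustration:

```python
# Minimal sketch of a TD(0) prediction error: the difference between
# (reward plus discounted value of the next state) and the value of the
# current state.
def td_error(reward, v_next, v_current, gamma=1.0):
    """delta = reward + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v_current

# Moving from a no-value state (V = 0) into a valued state Q (V = 0.8)
# yields a positive prediction error even before any reward is delivered.
delta = td_error(reward=0.0, v_next=0.8, v_current=0.0)
```

As the excerpt notes, it is the transition into a valued state, not the reward itself, that generates the positive prediction error here.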
[...]
...For example, a high temperature makes all options equally likely, whereas a low temperature biases the resulting probabilities toward the higher-weighted option (Sutton & Barto, 1998)....
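The temperature effect described in this excerpt can be checked with a small sketch of the two-action softmax; the weights and τ values below are hypothetical, chosen only to show the contrast:

```python
import math

# Two-action softmax: P(action 1) = e^{w1/tau} / (e^{w1/tau} + e^{w2/tau}).
def softmax_two(w1, w2, tau):
    e1 = math.exp(w1 / tau)
    e2 = math.exp(w2 / tau)
    return e1 / (e1 + e2)

# High temperature -> choices near-equiprobable; low temperature ->
# probability concentrates on the higher-weighted action.
p_hot = softmax_two(0.6, 0.4, tau=10.0)   # near 0.5
p_cold = softmax_two(0.6, 0.4, tau=0.01)  # near 1.0
```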
[...]
...Finally, to verify that the pattern of our ERP results mirrored the predictions of RL theory, we implemented a computational model that utilized a RL algorithm (cf., Sutton & Barto, 1998) to learn and perform the gambling task....
[...]
...Reinforcement learning (RL) theory proposes that the value of an action is a prediction of the subsequent reward or punishment gained by selecting that action (Sutton & Barto, 1998; Rescorla & Wagner, 1972)....
[...]
"How we learn to make decisions: Rap..." refers methods in this paper
...All analyses were done with EEGLAB (Delorme & Makeig, 2004) and custom code written in the Matlab (MathWorks, Natick, MA) programming environment....
[...]
"How we learn to make decisions: Rap..." refers background in this paper
...Seminal work by Schultz, Dayan, and Montague (1997) demonstrated that, when monkeys are initially given a reward, there is an associated phasic increase in the firing rate of dopaminergic neurons in the substantia nigra pars compacta....
[...]
"How we learn to make decisions: Rap..." refers background or result in this paper
...In summary, the pattern of changes in the dopaminergic response to the predictive cue and the reward mirrored the predictions of Rescorla and Wagner—prediction errors at the time of reward diminished and prediction errors at stimulus presentation increased with learning. Studies in humans observing the neural response to feedback have demonstrated a pattern of results similar to the theoretical predictions of Rescorla and Wagner (1972) and the results observed in monkeys by Schultz and colleagues (1997). Specifically, in a series of experiments, Holroyd and colleagues (Holroyd, Pakzad-Vaezi, & Krigolson, 2008; Holroyd & Krigolson, 2007; Holroyd & Coles, 2002) have demonstrated that the amplitude of the feedback error-related negativity (fERN), a component of the human brain ERP, is sensitive to reward expectancy and, further, that it only occurs in situations when participants must rely on feedback to determine response outcome. In other words, a fERN is only observed when one moves from a state with no value into a state with either positive or negative value. Extending this, Krigolson, Pierce, Holroyd, and Tanaka (2009) found that the magni-...
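The diminishing prediction error at reward delivery described in this excerpt can be sketched with a simple delta-rule update in the spirit of Rescorla and Wagner (1972); the learning rate, reward value, and trial count are assumptions chosen only for illustration:

```python
# Illustrative delta-rule update: with repeated rewarded trials, the
# prediction error at feedback shrinks as the learned value approaches
# the reward magnitude.
def run_trials(n_trials, reward=1.0, alpha=0.3):
    value, errors = 0.0, []
    for _ in range(n_trials):
        delta = reward - value   # prediction error at feedback
        value += alpha * delta   # value update toward the reward
        errors.append(delta)
    return errors

errors = run_trials(10)
# errors[0] is 1.0 (fully unexpected reward); later errors shrink toward 0
# as the reward becomes expected.
```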
[...]
..., the prediction of reward: Rescorla and Wagner [1972] and Sutton and Barto [1998])....
[...]
...Reinforcement learning (RL) theory proposes that the value of an action is a prediction of the subsequent reward or punishment gained by selecting that action (Sutton & Barto, 1998; Rescorla & Wagner, 1972)....
[...]