Journal ArticleDOI

How we learn to make decisions: Rapid propagation of reinforcement learning prediction errors in humans

TL;DR: The event-related brain potential (ERP) technique is used to demonstrate that rewards not only elicit a neural response akin to a prediction error but also that this signal rapidly diminishes and propagates to the time of choice presentation with learning, providing further support that the computations that underlie human learning and decision-making follow reinforcement learning principles.
Abstract: Our ability to make decisions is predicated upon our knowledge of the outcomes of the actions available to us. Reinforcement learning theory posits that actions followed by a reward or punishment acquire value through the computation of prediction errors, discrepancies between the predicted and the actual reward. A multitude of neuroimaging studies have demonstrated that rewards and punishments evoke neural responses that appear to reflect reinforcement learning prediction errors [e.g., Krigolson, O. E., Pierce, L. J., Holroyd, C. B., & Tanaka, J. W. Learning to become an expert: Reinforcement learning and the acquisition of perceptual expertise. Journal of Cognitive Neuroscience, 21, 1833-1840, 2009; Bayer, H. M., & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141, 2005; O'Doherty, J. P. Reward representations and reward-related learning in the human brain: Insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776, 2004; Holroyd, C. B., & Coles, M. G. H. The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709, 2002]. Here, we used the event-related brain potential (ERP) technique to demonstrate that rewards not only elicit a neural response akin to a prediction error but also that this signal rapidly diminished and propagated to the time of choice presentation with learning. Specifically, in a simple, learnable gambling task, we show that novel rewards elicited a feedback error-related negativity that rapidly decreased in amplitude with learning. Furthermore, we demonstrate the existence of a reward positivity at choice presentation, a previously unreported ERP component with timing and topography similar to those of the feedback error-related negativity, and whose amplitude increased with learning. The pattern of results we observed mirrored the output of a computational model that we implemented to compute reward prediction errors and the changes in amplitude of these prediction errors at the time of choice presentation and reward delivery. Our results provide further support that the computations that underlie human learning and decision-making follow reinforcement learning principles.
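
As a concrete illustration of the mechanism the abstract describes, the sketch below implements a minimal Rescorla-Wagner / temporal-difference style learner for a two-choice gambling task. This is not the authors' actual model: the learning rate, softmax temperature, and reward probabilities are assumptions made purely for illustration. With learning, the prediction error at reward delivery shrinks while the value signal carried by the chosen option at choice presentation grows, mirroring the fERN / reward-positivity pattern reported in the paper.

import numpy as np

# Minimal sketch of a Rescorla-Wagner / temporal-difference style model for a
# two-choice gambling task. This is NOT the authors' implementation; the
# learning rate, temperature, and reward contingencies below are illustrative
# assumptions only.

rng = np.random.default_rng(0)

alpha = 0.2            # learning rate (assumed)
tau = 0.5              # softmax temperature (assumed)
p_reward = [0.8, 0.2]  # reward probability of each option (assumed)
w = np.zeros(2)        # learned value (weight) of each option

pe_at_reward = []      # prediction error when feedback is delivered
pe_at_choice = []      # value signal carried by the chosen option at choice time

for trial in range(200):
    # softmax choice between the two options
    p = np.exp(w / tau) / np.exp(w / tau).sum()
    choice = rng.choice(2, p=p)

    # the chosen option's learned value acts like a prediction error at choice
    # presentation: it grows as learning transfers the signal back in time
    pe_at_choice.append(w[choice])

    # reward prediction error at feedback: actual reward minus predicted value
    reward = float(rng.random() < p_reward[choice])
    delta = reward - w[choice]
    pe_at_reward.append(delta)

    # value update
    w[choice] += alpha * delta

print("early reward PE:", np.mean(pe_at_reward[:20]))
print("late  reward PE:", np.mean(pe_at_reward[-20:]))
print("early choice signal:", np.mean(pe_at_choice[:20]))
print("late  choice signal:", np.mean(pe_at_choice[-20:]))

Running the script shows a large reward prediction error on early trials that approaches zero late in learning, while the choice-time signal shows the opposite trend.
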
Citations
Journal ArticleDOI
TL;DR: A meta-analysis of functional magnetic resonance imaging studies that had employed algorithmic reinforcement learning models across a variety of experimental paradigms found that the ventral striatum and midbrain/thalamus represented reward prediction errors, consistent with animal studies.
Abstract: Reinforcement learning describes motivated behavior in terms of two abstract signals. The representation of discrepancies between expected and actual rewards/punishments—prediction error—is thought to update the expected value of actions and predictive stimuli. Electrophysiological and lesion studies have suggested that mesostriatal prediction error signals control behavior through synaptic modification of cortico-striato-thalamic networks. Signals in the ventromedial prefrontal and orbitofrontal cortex are implicated in representing expected value. To obtain unbiased maps of these representations in the human brain, we performed a meta-analysis of functional magnetic resonance imaging studies that had employed algorithmic reinforcement learning models across a variety of experimental paradigms. We found that the ventral striatum (medial and lateral) and midbrain/thalamus represented reward prediction errors, consistent with animal studies. Prediction error signals were also seen in the frontal operculum/insula, particularly for social rewards. In Pavlovian studies, striatal prediction error signals extended into the amygdala, whereas instrumental tasks engaged the caudate. Prediction error maps were sensitive to the model-fitting procedure (fixed or individually estimated) and to the extent of spatial smoothing. A correlate of expected value was found in a posterior region of the ventromedial prefrontal cortex, caudal and medial to the orbitofrontal regions identified in animal studies. These findings highlight a reproducible motif of reinforcement learning in the cortico-striatal loops and identify methodological dimensions that may influence the reproducibility of activation patterns across studies.

183 citations

Journal ArticleDOI
TL;DR: Ten key methodological issues are discussed, including confusion in component naming, the reward positivity, component identification, peak quantification and the use of difference waveforms, frequency, and component contamination, as well as how learning results in changes in the amplitude of the feedback-related negativity/reward positivity.

96 citations

Journal ArticleDOI
TL;DR: Trial-level analyses indicated that a diagnosis of MDD and depressive symptom severity significantly moderated the trajectory of reward positivity, with individuals with higher symptoms of depression demonstrating less sensitivity to rewards over time.

57 citations

Journal ArticleDOI
TL;DR: Evidence is provided of two neural signals sensitive to visual feedback during prism adaptation (PA) that may sub-serve changes in visuomotor responding, findings that are important for optimizing PA's use with VSN patients.
Abstract: Prism adaptation (PA) is both a perceptual-motor learning task as well as a promising rehabilitation tool for visuo-spatial neglect (VSN), a spatial attention disorder often experienced after stroke that results in slowed and/or inaccurate motor responses to contralesional targets. During PA, individuals are exposed to prism-induced shifts of the visual field while performing a visuo-guided reaching task. After adaptation, with goggles removed, visuo-motor responding is shifted in the opposite direction of that initially induced by the prisms. This visuo-motor aftereffect has been used to study visuo-motor learning and adaptation and has been applied clinically to reduce VSN severity by improving motor responding to stimuli in contralesional (usually left-sided) space. In order to optimize PA's use for VSN patients, it is important to elucidate the neural and cognitive processes that alter visuomotor function during PA. In the present study, healthy young adults underwent PA while event-related potentials (ERPs) were recorded at the termination of each reach (screen-touch), then binned according to accuracy (hit vs. miss) and phase of exposure block (early, middle, late). Results show that two ERP components were evoked by screen-touch: an early error-related negativity (ERN) and a P300. The ERN was consistently evoked on miss trials during adaptation, while the P300 amplitude was largest during the early phase of adaptation for both hit and miss trials. This study provides evidence of two neural signals sensitive to visual feedback during PA that may sub-serve changes in visuomotor responding. Prior ERP research suggests that the ERN reflects an error processing system in medial-frontal cortex, while the P300 is suggested to reflect a system for context updating and learning. Future research is needed to elucidate the role of these ERP components in improving visuomotor responses among individuals with VSN.

40 citations

Journal ArticleDOI
TL;DR: This review discusses the application of electroencephalography (EEG) to understanding the temporal nature of visual cue utilization during movement planning, control, and learning, using four existing scalp potentials, and examines the appropriateness of using the N100 potential as an indicator of corrective behaviors in response to target perturbation.

31 citations


Cites background, methods, or results from "How we learn to make decisions: Rapid propagation of reinforcement learning prediction errors in humans"

  • ...Testing this proposition, Krigolson and Holroyd (2007d) and Krigolson et al....

    [...]

  • ...seen in the work of Krigolson et al. (2013). In their study, Krigolson and colleagues recorded electroencephalographic data while par-...

    [...]

  • ...While the timing and topography are not consistent with previous accounts of the FRN (Miltner et al., 1997), some researchers have found that the timing of the FRN can be considerably later when visual feedback evaluation is complex in nature (see Krigolson et al., 2013 for more detail)....

    [...]

  • ...As noted above, the N100 observed by Krigolson and Holroyd (2007a) peaked shortly after the onset of the target perturbation and had a maximal component amplitude over left parietal-occipital visual areas of cortex (see Fig....

    [...]

  • ...Indeed, Heath et al.’s (2012) results, along with those of Krigolson et al. (2013) seem to suggest that the processes that underlie the N200 and P300 may be related to the encoding of target location and/or movement amplitude....

    [...]

References
Book
01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.

37,989 citations


"How we learn to make decisions: Rap..." refers background or methods in this paper

  • ...…0 and 0.01 (Holroyd & Coles, 2008), these weights determined the probability that an action was selected based on a softmax decision function (Sutton & Barto, 1998): $P(\text{action } i \text{ is selected}) = e^{w_i/\tau} \,/\, (e^{w_1/\tau} + e^{w_2/\tau})$. Here, τ represents temperature, a model parameter that determines the degree to…...

    [...]

  • ...…from a state with no value—the state before Q—into a state with value—state Q. Computational RL theories such as the method of temporal differences (Sutton & Barto, 1998) take this into account and posit that prediction errors are computed as the difference between the value and rewards of the…...

    [...]

  • ...For example, a high temperature makes all options equally likely, whereas a low temperature biases the resulting probabilities toward the higher-weighted option (Sutton & Barto, 1998)....

    [...]

  • ...Finally, to verify that the pattern of our ERP results mirrored the predictions of RL theory, we implemented a computational model that utilized a RL algorithm (cf., Sutton & Barto, 1998) to learn and perform the gambling task....

    [...]

  • ...Reinforcement learning (RL) theory proposes that the value of an action is a prediction of the subsequent reward or punishment gained by selecting that action (Sutton & Barto, 1998; Rescorla & Wagner, 1972)....

    [...]
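
To make the temperature parameter in the softmax rule quoted in the excerpts above concrete, here is a tiny illustrative sketch; the option weights below are made-up values, not fitted parameters from the paper. A high temperature yields near-equal choice probabilities, while a low temperature strongly favors the higher-weighted option.

import numpy as np

# Illustrative sketch of the softmax decision rule,
# P(action i) = exp(w_i / tau) / sum_j exp(w_j / tau).
# The weights are hypothetical values chosen for demonstration.

def softmax(weights, tau):
    z = np.exp(np.asarray(weights, dtype=float) / tau)
    return z / z.sum()

w = [0.6, 0.2]                      # hypothetical option weights
print(softmax(w, tau=10.0))         # high temperature -> near-equal probabilities
print(softmax(w, tau=0.1))          # low temperature  -> strongly favors option 1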

Journal ArticleDOI
TL;DR: EEGLAB, as described in this paper, is a toolbox and graphical user interface for processing collections of single-trial and/or averaged EEG data of any number of channels, including EEG data, channel, and event information importing, data visualization (scrolling, scalp map and dipole model plotting, plus multi-trial ERP-image plots), preprocessing (including artifact rejection, filtering, epoch selection, and averaging), Independent Component Analysis (ICA), and time/frequency decomposition including channel and component cross-coherence supported by bootstrap statistical methods based on data resampling.

17,362 citations


"How we learn to make decisions: Rap..." refers methods in this paper

  • ...All analyses were done with EEGLAB (Delorme & Makeig, 2004) and custom code written in the Matlab (MathWorks, Natick, MA) programming environment....

    [...]
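
The analyses themselves were carried out with EEGLAB and custom Matlab code, as the excerpt above notes. As a language-neutral sketch of the basic step behind any ERP analysis (epoching around events, baseline correction, and trial averaging), here is a small Python/numpy example; the sampling rate, epoch window, and simulated data are assumptions for illustration only and do not reproduce the study's pipeline.

import numpy as np

# Sketch of the core ERP step: epoch continuous EEG around feedback events,
# baseline-correct, and average across trials. All numbers and the fake data
# are illustrative assumptions, not the study's parameters.

fs = 250                                  # sampling rate in Hz (assumed)
eeg = np.random.randn(64, fs * 600)       # channels x samples, simulated continuous data
event_samples = np.arange(500, fs * 600 - fs, fs * 3)  # simulated feedback onsets

pre, post = int(0.2 * fs), int(0.6 * fs)  # -200 ms to +600 ms window (assumed)
epochs = np.stack([eeg[:, s - pre:s + post] for s in event_samples])

baseline = epochs[:, :, :pre].mean(axis=2, keepdims=True)
erp = (epochs - baseline).mean(axis=0)    # baseline-correct, then average trials
print(erp.shape)                          # (64 channels, 200 samples)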

Journal ArticleDOI
14 Mar 1997-Science
TL;DR: This work identifies dopaminergic neurons in the primate whose fluctuating output apparently signals changes or errors in the predictions of future salient and rewarding events, findings that can be understood through quantitative theories of adaptive optimizing control.
Abstract: The capacity to predict future events permits a creature to detect, model, and manipulate the causal structure of its interactions with its environment. Behavioral experiments suggest that learning is driven by changes in the expectations about future salient events such as rewards and punishments. Physiological work has recently complemented these studies by identifying dopaminergic neurons in the primate whose fluctuating output apparently signals changes or errors in the predictions of future salient and rewarding events. Taken together, these findings can be understood through quantitative theories of adaptive optimizing control.

8,163 citations


"How we learn to make decisions: Rap..." refers background in this paper

  • ...Seminal work by Schultz, Dayan, and Montague (1997) demonstrated that, when monkeys are initially given a reward, there is an associated phasic increase in the firing rate of dopaminergic neurons in the substantia nigra pars compacta....

    [...]

Book
01 Mar 1998
TL;DR: In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning.
Abstract: From the Publisher: In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability.

7,016 citations


"How we learn to make decisions: Rap..." refers background or methods in this paper

  • ...…0 and 0.01 (Holroyd & Coles, 2008), these weights determined the probability that an action was selected based on a softmax decision function (Sutton & Barto, 1998): $P(\text{action } i \text{ is selected}) = e^{w_i/\tau} \,/\, (e^{w_1/\tau} + e^{w_2/\tau})$. Here, τ represents temperature, a model parameter that determines the degree to…...

    [...]

  • ...…from a state with no value—the state before Q—into a state with value—state Q. Computational RL theories such as the method of temporal differences (Sutton & Barto, 1998) take this into account and posit that prediction errors are computed as the difference between the value and rewards of the…...

    [...]

  • ...For example, a high temperature makes all options equally likely, whereas a low temperature biases the resulting probabilities toward the higher-weighted option (Sutton & Barto, 1998)....

    [...]

  • ...Finally, to verify that the pattern of our ERP results mirrored the predictions of RL theory, we implemented a computational model that utilized a RL algorithm (cf., Sutton & Barto, 1998) to learn and perform the gambling task....

    [...]

  • ...Reinforcement learning (RL) theory proposes that the value of an action is a prediction of the subsequent reward or punishment gained by selecting that action (Sutton & Barto, 1998; Rescorla & Wagner, 1972)....

    [...]

01 Jan 1972

6,206 citations


"How we learn to make decisions: Rap..." refers background or result in this paper

  • ...In summary, the pattern of changes in the dopaminergic response to the predictive cue and the reward mirrored the predictions of Rescorla and Wagner—prediction errors at the time of reward diminished and prediction errors at stimulus presentation increased with learning. Studies in human observing the neural response to feedback have demonstrated a pattern of results similar to the theoretical predictions of Rescorla and Wagner (1972) and the results observed in monkey by Schultz and colleagues (1997). Specifically, in a series of experiments, Holroyd and colleagues (Holroyd, Pakzad-Vaezi, & Krigolson, 2008; Holroyd & Krigolson, 2007; Holroyd & Coles, 2002) have demonstrated that the amplitude of the feedback error-related negativity (fERN), a component of the human brain ERP, is sensitive to reward expectancy and further, that it only occurs in situations when participants must rely on feedback to determine response outcome. In other words, a fERN is only observed when one moves from a state with no value into a state with either positive or negative value. Extending this, Krigolson, Pierce, Holroyd, and Tanaka (2009) found that the magni-...

    [...]

  • ..., the prediction of reward: Rescorla and Wagner [1972] and Sutton and Barto [1998])....

    [...]

  • ...Reinforcement learning (RL) theory proposes that the value of an action is a prediction of the subsequent reward or punishment gained by selecting that action (Sutton & Barto, 1998; Rescorla & Wagner, 1972)....

    [...]