J. Neurophysiol. 80: 1-27, 01 Jul 1998

INVITED REVIEW
Predictive Reward Signal of Dopamine Neurons
WOLFRAM SCHULTZ
Institute of Physiology and Program in Neuroscience, University of Fribourg, CH-1700 Fribourg, Switzerland
Schultz, Wolfram. Predictive reward signal of dopamine neurons. J. Neurophysiol. 80: 1-27, 1998. The effects of lesions, receptor blocking, electrical self-stimulation, and drugs of abuse suggest that midbrain dopamine systems are involved in processing reward information and learning approach behavior. Most dopamine neurons show phasic activations after primary liquid and food rewards and conditioned, reward-predicting visual and auditory stimuli. They show biphasic, activation-depression responses after stimuli that resemble reward-predicting stimuli or are novel or particularly salient. However, only few phasic activations follow aversive stimuli. Thus dopamine neurons label environmental stimuli with appetitive value, predict and detect rewards, and signal alerting and motivating events. By failing to discriminate between different rewards, dopamine neurons appear to emit an alerting message about the surprising presence or absence of rewards. All responses to rewards and reward-predicting stimuli depend on event predictability. Dopamine neurons are activated by rewarding events that are better than predicted, remain uninfluenced by events that are as good as predicted, and are depressed by events that are worse than predicted. By signaling rewards according to a prediction error, dopamine responses have the formal characteristics of a teaching signal postulated by reinforcement learning theories. Dopamine responses transfer during learning from primary rewards to reward-predicting stimuli. This may contribute to neuronal mechanisms underlying the retrograde action of rewards, one of the main puzzles in reinforcement learning. The impulse response releases a short pulse of dopamine onto many dendrites, thus broadcasting a rather global reinforcement signal to postsynaptic neurons. This signal may improve approach behavior by providing advance reward information before the behavior occurs, and may contribute to learning by modifying synaptic transmission. The dopamine reward signal is supplemented by activity in neurons in striatum, frontal cortex, and amygdala, which process specific reward information but do not emit a global reward prediction error signal. A cooperation between the different reward signals may assure the use of specific rewards for selectively reinforcing behaviors. Among the other projection systems, noradrenaline neurons predominantly serve attentional mechanisms and nucleus basalis neurons code rewards heterogeneously. Cerebellar climbing fibers signal errors in motor performance or errors in the prediction of aversive events to cerebellar Purkinje cells. Most deficits following dopamine-depleting lesions are not easily explained by a defective reward signal but may reflect the absence of a general enabling function of tonic levels of extracellular dopamine. Thus dopamine systems may have two functions, the phasic transmission of reward information and the tonic enabling of postsynaptic neurons.

0022-3077/98 $5.00 Copyright © 1998 The American Physiological Society

INTRODUCTION

When multicellular organisms arose through the evolution of self-reproducing molecules, they developed endogenous, autoregulatory mechanisms assuring that their needs for welfare and survival were met. Subjects engage in various forms of approach behavior to obtain resources for maintaining homeostatic balance and to reproduce. One class of resources is called rewards, which elicit and reinforce approach behavior. The functions of rewards were developed further during the evolution of higher mammals to support more sophisticated forms of individual and social behavior. Thus biological and cognitive needs define the nature of rewards, and the availability of rewards determines some of the basic parameters of the subject's life conditions.

Rewards come in various physical forms, are highly variable in time and depend on the particular environment of the subject. Despite their importance, rewards do not influence the brain through dedicated peripheral receptors tuned to a limited range of physical modalities as is the case for primary sensory systems. Rather, reward information is extracted by the brain from a large variety of polysensory, inhomogeneous, and inconstant stimuli by using particular neuronal mechanisms. The highly variable nature of rewards requires high degrees of adaptation in neuronal systems processing them.

One of the principal neuronal systems involved in processing reward information appears to be the dopamine system. Behavioral studies show that dopamine projections to the striatum and frontal cortex play a central role in mediating the effects of rewards on approach behavior and learning. These results are derived from selective lesions of different components of dopamine systems, systemic and intracerebral administration of direct and indirect dopamine receptor agonist and antagonist drugs, electrical self-stimulation, and self-administration of major drugs of abuse, such as cocaine, amphetamine, opiates, alcohol, and nicotine (Beninger and Hahn 1983; Di Chiara 1995; Fibiger and Phillips 1986; Robbins and Everitt 1992; Robinson and Berridge 1993; Wise 1996; Wise and Hoffman 1992; Wise et al. 1978).

The present article summarizes recent research concerning the signaling of environmental motivating stimuli by dopamine neurons and evaluates the potential functions of these signals for modifying behavioral reactions by reference to anatomic organization, learning theories, artificial neuronal models, other neuronal systems, and deficits after lesions. All known response characteristics of dopamine neurons will be described, but predominantly the responses to reward-related stimuli will be conceptualized because they are the best understood presently. Because of the large amount of data available in the literature, the principal system discussed will be the nigrostriatal dopamine projection, but projections from midbrain dopamine neurons to ventral striatum and frontal cortex also will be considered as far as the present knowledge allows.

REWARDS AND PREDICTIONS

Functions of rewards

Certain objects and events in the environment are of particular motivational significance by their effects on welfare,

survival, and reproduction. According to the behavioral reac-
tions elicited, the motivational value of environmental ob-
jects can be appetitive (rewarding) or aversive (punishing).
(Note that "appetitive" is used synonymously with "rewarding" but not with "preparatory.") Appetitive objects
have three separable basic functions. In their first function,
rewards elicit approach and consummatory behavior. This
is due to the objects being labeled with appetitive value
through innate mechanisms or, in most cases, learning. In their second function, rewards increase the frequency and intensity of behavior leading to such objects (learning), and they maintain learned behavior by preventing extinction. Rewards serve as positive reinforcers of behavior in classical and instrumental conditioning procedures. In general incentive learning, environmental stimuli acquire appetitive value following classically conditioned stimulus-reward associations and induce approach behavior (Bindra 1968). In instrumental conditioning, rewards "reinforce" behaviors by strengthening associations between stimuli and behavioral responses (Law of Effect: Thorndike 1911). This is the essence of "coming back for more" and is related to the common notion of rewards being obtained for having done something well. In an instrumental form of incentive learning, rewards are "incentives" and serve as goals of behavior following associations between behavioral responses and outcomes (Dickinson and Balleine 1994). In their third function, rewards induce subjective feelings of pleasure (hedonia) and positive emotional states. Aversive stimuli function in opposite directions. They induce withdrawal responses and act as negative reinforcers by increasing and maintaining avoidance behavior on repeated presentation, thereby reducing the impact of damaging events. Furthermore they induce internal emotional states of anger, fear, and panic.

FIG. 1. Processing of appetitive stimuli during learning. An arbitrary stimulus becomes associated with a primary food or liquid reward through repeated, contingent pairing. This conditioned, reward-predicting stimulus induces an internal motivational state by evoking an expectation of the reward, often on the basis of a corresponding hunger or thirst drive, and elicits the behavioral reaction. This scheme replicates basic notions of incentive motivation theory developed by Bindra (1968) and Bolles (1972). It applies to classical conditioning, where reward is automatically delivered after the conditioned stimulus, and to instrumental (operant) conditioning, where reward delivery requires a reaction by the subject to the conditioned stimulus. This scheme applies also to aversive conditioning which is not further elaborated for reasons of brevity.

Functions of predictions

Predictions provide advance information about future stimuli, events, or system states. They provide the basic advantage of gaining time for behavioral reactions. Some forms of predictions attribute motivational values to environmental stimuli by association with particular outcomes, thus identifying objects of vital importance and discriminating them from less valuable objects. Other forms code physical parameters of predicted objects, such as spatial position, velocity, and weight. Predictions allow an organism to evaluate future events before they actually occur, permit the selection and preparation of behavioral reactions, and increase the likelihood of approaching or avoiding objects labeled with motivational values. For example, repeated movements of objects in the same sequence allow one to predict forthcoming positions and already prepare the next movement while pursuing the present object. This reduces reaction time between individual targets, speeds up overall performance, and results in an earlier outcome. Predictive eye movements ameliorate behavioral performance through advance focusing (Flowers and Downing 1978).

At a more advanced level, the advance information provided by predictions allows one to make decisions between alternatives to attain particular system states, approach infrequently occurring goal objects, or avoid irreparable adverse effects. Industrial applications use Internal Model Control to predict and react to system states before they actually occur (Garcia et al. 1989). For example, the "fly-by-wire" technique in modern aviation computes predictable forthcoming states of airplanes. Decisions for flying maneuvers take this information into account and help to avoid excessive strain on the mechanical components of the plane, thus reducing weight and increasing the range of operation.

The use of predictive information depends on the nature of the represented future events or system states. Simple representations directly concern the position of upcoming targets and the ensuing behavioral reaction, thus reducing reaction time in a rather automatic fashion. Higher forms of predictions are based on representations permitting logical inference, which can be accessed and treated with varying degrees of intentionality and choice. They often are processed consciously in humans. Before the predicted events or system states occur and behavioral reactions are carried out, such predictions allow one to mentally evaluate various strategies by integrating knowledge from different sources, designing various ways of reaction and comparing the gains and losses from each possible reaction.

Behavioral conditioning

Associative appetitive learning involves the repeated and contingent pairing between an arbitrary stimulus and a primary reward (Fig. 1). This results in increasingly frequent approach behavior induced by the now "conditioned" stimulus, which partly resembles the approach behavior elicited by the primary reward and also is influenced by the nature of the conditioned stimulus. It appears that the conditioned stimulus serves as a predictor of reward and, often on the basis of an appropriate drive, sets an internal motivational state leading to the behavioral reaction. The similarity of approach reactions suggests that some of the general, preparatory components of the behavioral response are transferred from the primary reward to the earliest conditioned, reward-predicting stimulus. Thus the conditioned stimulus acts partly as a motivational substitute for the primary stimulus, probably through Pavlovian learning (Dickinson 1980). Many so called "unconditioned" food and liquid rewards are probably learned through experience, as every visitor to

foreign countries can confirm. The primary reward then might consist of the taste experienced when the object activates the gustatory receptors, but that again may be learned. The ultimate rewarding effect of nutrient objects probably consists in their specific influences on basic biological variables, such as electrolyte, glucose, or amino acid concentrations in plasma and brain. These variables are defined by the vegetative needs of the organism and arise through evolution. Animals avoid nutrients that fail to influence important vegetative variables, for example foods lacking such essential amino acids as histidine (Rogers and Harper 1970), threonine (Hrupka et al. 1997; Wang et al. 1996), or methionine (Delaney and Gelperin 1986). A few primary rewards may be determined by innate instincts and support initial approach behavior and ingestion in early life, whereas the majority of rewards would be learned during the subsequent life experience of the subject. The physical appearance of rewards then could be used for predicting the much slower vegetative effects. This would dramatically accelerate the detection of rewards and constitute a major advantage for survival. Learning of rewards also allows subjects to use a much larger variety of nutrients as effective rewards and thus increase their chance for survival in zones of scarce resources.

ADAPTIVE RESPONSES TO APPETITIVE STIMULI

Cell bodies of dopamine neurons are located mostly in midbrain groups A8 (dorsal to lateral substantia nigra), A9 (pars compacta of substantia nigra), and A10 (ventral tegmental area medial to substantia nigra). These neurons release the neurotransmitter dopamine with nerve impulses from axonal varicosities in the striatum (caudate nucleus, putamen, and ventral striatum including nucleus accumbens) and frontal cortex, to name the most important sites. We record the impulse activity from cell bodies of single dopamine neurons during periods of 20-60 min with moveable microelectrodes from extracellular positions while monkeys learn or perform behavioral tasks. The characteristic polyphasic, relatively long impulses discharged at low frequencies make dopamine neurons easily distinguishable from other midbrain neurons. The employed behavioral paradigms include reaction time tasks, direct and delayed GO-NO GO tasks, spatial delayed response and alternation tasks, air puff and saline active avoidance tasks, operant and classically conditioned visual discrimination tasks, self-initiated movements, and unpredicted delivery of reward in the absence of any formal task. About 100-250 dopamine neurons are studied in each behavioral situation, and fractions of task-modulated neurons refer to these samples.

Initial recording studies searched for correlates of parkinsonian motor and cognitive deficits in dopamine neurons but failed to find clear covariations with arm and eye movements (DeLong et al. 1983; Schultz and Romo 1990; Schultz et al. 1983) or with mnemonic or spatial components of delayed response tasks (Schultz et al. 1993). By contrast, it was found that dopamine neurons were activated in a very distinctive manner by the rewarding characteristics of a wide range of somatosensory, visual, and auditory stimuli.

Activation by primary appetitive stimuli

About 75% of dopamine neurons show phasic activations when animals touch a small morsel of hidden food during exploratory movements in the absence of other phasic stimuli, without being activated by the movement itself (Romo and Schultz 1990). The remaining dopamine neurons do not respond to any of the tested environmental stimuli. Dopamine neurons also are activated by a drop of liquid delivered at the mouth outside of any behavioral task or while learning such different paradigms as visual or auditory reaction time tasks, spatial delayed response or alternation, and visual discrimination, often in the same animal (Fig. 2, top) (Hollerman and Schultz 1996; Ljungberg et al. 1991, 1992; Mirenowicz and Schultz 1994; Schultz et al. 1993). The reward responses occur independently of a learning context. Thus dopamine neurons do not appear to discriminate between different food objects and liquid rewards. However, their responses distinguish rewards from nonreward objects (Romo and Schultz 1990). Only 14% of dopamine neurons show the phasic activations when primary aversive stimuli are presented, such as an air puff to the hand or hypertonic saline to the mouth, and most of the activated neurons respond also to rewards (Mirenowicz and Schultz 1996). Although being nonnoxious, these stimuli are aversive in that they disrupt behavior and induce active avoidance reactions. However, dopamine neurons are not entirely insensitive to aversive stimuli, as shown by slow depressions or occasional slow activations after pain pinch stimuli in anesthetized monkeys (Schultz and Romo 1987) and by increased striatal dopamine release after electric shock and tail pinch in awake rats (Abercrombie et al. 1989; Doherty and Gratton 1992; Louilot et al. 1986; Young et al. 1993). This suggests that the phasic responses of dopamine neurons preferentially report environmental stimuli with primary appetitive value, whereas aversive events may be signaled with a considerably slower time course.

Unpredictability of reward

An important feature of dopamine responses is their dependency on event unpredictability. The activations following rewards do not occur when food and liquid rewards are preceded by phasic stimuli that have been conditioned to predict such rewards (Fig. 2, middle) (Ljungberg et al. 1992; Mirenowicz and Schultz 1994; Romo and Schultz 1990). One crucial difference between learning and fully acquired behavior is the degree of reward unpredictability. Dopamine neurons are activated by rewards during the learning phase but stop responding after full acquisition of visual and auditory reaction time tasks (Ljungberg et al. 1992; Mirenowicz and Schultz 1994), spatial delayed response tasks (Schultz et al. 1993), and simultaneous visual discriminations (Hollerman and Schultz 1996). The loss of response is not due to a developing general insensitivity to rewards, as activations following rewards delivered outside of tasks do not decrement during several months of experimentation (Mirenowicz and Schultz 1994). The importance of unpredictability includes the time of reward, as demonstrated by transient activations following rewards that are suddenly delivered earlier or later than predicted (Hollerman and Schultz 1996). Taken

together, the occurrence of reward, including its time, must be unpredicted to activate dopamine neurons.

FIG. 2. Dopamine neurons report rewards according to an error in reward prediction. Top: drop of liquid occurs although no reward is predicted at this time. Occurrence of reward thus constitutes a positive error in the prediction of reward. Dopamine neuron is activated by the unpredicted occurrence of the liquid. Middle: conditioned stimulus predicts a reward, and the reward occurs according to the prediction, hence no error in the prediction of reward. Dopamine neuron fails to be activated by the predicted reward (right). It also shows an activation after the reward-predicting stimulus, which occurs irrespective of an error in the prediction of the later reward (left). Bottom: conditioned stimulus predicts a reward, but the reward fails to occur because of lack of reaction by the animal. Activity of the dopamine neuron is depressed exactly at the time when the reward would have occurred. Note the depression occurring >1 s after the conditioned stimulus without any intervening stimuli, revealing an internal process of reward expectation. Neuronal activity in the 3 graphs follows the equation: dopamine response (Reward) = reward occurred - reward predicted. CS, conditioned stimulus; R, primary reward. Reprinted from Schultz et al. (1997) with permission by American Association for the Advancement of Science.

Depression by omission of predicted reward

Dopamine neurons are depressed exactly at the time of the usual occurrence of reward when a fully predicted reward fails to occur, even in the absence of an immediately preceding stimulus (Fig. 2, bottom). This is observed when animals fail to obtain reward because of erroneous behavior, when liquid flow is stopped by the experimenter despite correct behavior, or when a valve opens audibly without delivering liquid (Hollerman and Schultz 1996; Ljungberg et al. 1991; Schultz et al. 1993). When reward delivery is delayed for 0.5 or 1.0 s, a depression of neuronal activity occurs at the regular time of the reward, and an activation follows the reward at the new time (Hollerman and Schultz 1996). Both responses occur only during a few repetitions until the new time of reward delivery becomes predicted again. By contrast, delivering reward earlier than habitual results in an activation at the new time of reward but fails to induce a depression at the habitual time. This suggests that unusually early reward delivery cancels the reward prediction for the habitual time. Thus dopamine neurons monitor both the occurrence and the time of reward. In the absence of stimuli immediately preceding the omitted reward, the depressions do not constitute a simple neuronal response but reflect an expectation process based on an internal clock tracking the precise time of predicted reward.

Activation by conditioned, reward-predicting stimuli

About 55-70% of dopamine neurons are activated by conditioned visual and auditory stimuli in the various classically or instrumentally conditioned tasks described earlier (Fig. 2, middle and bottom) (Hollerman and Schultz 1996; Ljungberg et al. 1991, 1992; Mirenowicz and Schultz 1994; Schultz 1986; Schultz and Romo 1990; P. Waelti, J. Mirenowicz, and W. Schultz, unpublished data). The first dopamine responses to conditioned light were reported by Miller et al. (1981) in rats treated with haloperidol, which increased the incidence and spontaneous activity of dopamine neurons but resulted in more sustained responses than in undrugged animals. Although responses occur close to behavioral reactions (Nishino et al. 1987), they are unrelated to arm and eye movements themselves, as they occur also ipsilateral to the moving arm and in trials without arm or eye movements (Schultz and Romo 1990). Conditioned stimuli are somewhat less effective than primary rewards in terms of response magnitude and fractions of neurons activated. Dopamine neurons respond only to the onset of conditioned stimuli and not to their offset, even if stimulus offset predicts the reward (Schultz and Romo 1990). Dopamine neurons do not distinguish between visual and auditory modalities of conditioned appetitive stimuli. However, they discriminate between appetitive and neutral or aversive stimuli as long as they are physically sufficiently dissimilar (Ljungberg et al. 1992; P. Waelti, J. Mirenowicz, and W. Schultz, unpublished data). Only 11% of dopamine neurons, most of them with appetitive responses, show the typical phasic activations also in response to conditioned aversive visual or auditory stimuli in active avoidance tasks in which animals release a key to avoid an air puff or a drop of hypertonic saline (Mirenowicz and Schultz 1996), although such avoidance may be viewed as "rewarding." These few activations are not sufficiently strong to induce an average population response. Thus the phasic responses of dopamine neurons preferentially report environmental stimuli with appetitive motivational value but without discriminating between different sensory modalities.
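The prediction-error relation stated in the Fig. 2 legend can be written out directly in a few lines. The following sketch is purely illustrative (the function and variable names are not from the paper): a positive error corresponds to an activation, zero to no response, and a negative error to a depression.

```python
# Sketch of the Fig. 2 relation:
#   dopamine response = reward occurred - reward predicted.
# Illustrative only; names and the 0/1 reward coding are assumptions.

def dopamine_response(reward_occurred: float, reward_predicted: float) -> float:
    """Positive -> activation, zero -> no response, negative -> depression."""
    return reward_occurred - reward_predicted

# Top panel: unpredicted reward (nothing predicts it).
print(dopamine_response(reward_occurred=1.0, reward_predicted=0.0))  # 1.0, activation

# Middle panel: fully predicted reward.
print(dopamine_response(reward_occurred=1.0, reward_predicted=1.0))  # 0.0, no response

# Bottom panel: predicted reward omitted.
print(dopamine_response(reward_occurred=0.0, reward_predicted=1.0))  # -1.0, depression
```

The sign of the error, not the reward itself, is what the three panels of the figure report.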

Transfer of activation

During the course of learning, dopamine neurons become gradually activated by conditioned, reward-predicting stimuli and progressively lose their responses to primary food or liquid rewards that become predicted (Hollerman and Schultz 1996; Ljungberg et al. 1992; Mirenowicz and Schultz 1994) (Figs. 2 and 3). During a transient learning period, both rewards and conditioned stimuli elicit dopamine activations. This transfer from primary reward to the conditioned stimulus occurs instantaneously in single dopamine neurons tested in two well-learned tasks employing, respectively, unpredicted and predicted rewards (Romo and Schultz 1990).

FIG. 3. Dopamine response transfer to earliest predictive stimulus. Responses to unpredicted primary reward transfer to progressively earlier reward-predicting stimuli. All displays show population histograms obtained by averaging normalized perievent time histograms of all dopamine neurons recorded in the behavioral situations indicated, independent of the presence or absence of a response. Top: outside of any behavioral task, there is no population response in 44 neurons tested with a small light (data from Ljungberg et al. 1992), but an average response occurs in 35 neurons to a drop of liquid delivered at a spout in front of the animal's mouth (Mirenowicz and Schultz 1994). Middle: response to a reward-predicting trigger stimulus in a 2-choice spatial reaching task, but absence of response to reward delivered during established task performance in the same 23 neurons (Schultz et al. 1993). Bottom: response to an instruction cue preceding the reward-predicting trigger stimulus by a fixed interval of 1 s in an instructed spatial reaching task (19 neurons) (Schultz et al. 1993). Time base is split because of varying intervals between conditioned stimuli and reward. Reprinted from Schultz et al. (1995b) with permission by MIT Press.

Unpredictability of conditioned stimuli

The activations after conditioned, reward-predicting stimuli do not occur when these stimuli themselves are preceded at a fixed interval by phasic conditioned stimuli in fully established behavioral situations. Thus with serial conditioned stimuli, dopamine neurons are activated by the earliest reward-predicting stimulus, whereas all stimuli and rewards following at predictable moments afterwards are ineffective (Fig. 3) (Schultz et al. 1993). Only randomly spaced sequential stimuli elicit individual responses. Also, extensive overtraining with highly stereotyped task performance attenuates the responses to conditioned stimuli, probably because stimuli become predicted by events in the preceding trial (Ljungberg et al. 1992). This suggests that stimulus unpredictability is a common requirement for all stimuli activating dopamine neurons.

Depression by omission of predicted conditioned stimuli

Preliminary data from a previous experiment (Schultz et al. 1993) suggest that dopamine neurons also are depressed when a conditioned, reward-predicting stimulus is predicted itself at a fixed time by a preceding stimulus but fails to occur because of an error of the animal. As with primary rewards, the depressions occur at the time of the usual occurrence of the conditioned stimulus, without being directly elicited by a preceding stimulus. Thus the omission-induced depression may be generalized to all appetitive events.

Activation-depression with response generalization

Dopamine neurons also respond to stimuli that do not predict rewards but closely resemble reward-predicting stimuli occurring in the same context. These responses consist mostly of an activation followed by an immediate depression but may occasionally consist of pure activation or pure depression. The activations are smaller and less frequent than those following reward-predicting stimuli, and the depressions are observed in 30-60% of neurons. Dopamine neurons respond to visual stimuli that are not followed by reward but closely resemble reward-predicting stimuli, despite correct behavioral discrimination (Schultz and Romo 1990). Opening of an empty box fails to activate dopamine neurons but becomes effective in every trial as soon as the box occasionally contains food (Ljungberg et al. 1992; Schultz 1986; Schultz and Romo 1990) or when a neighboring, identical box always containing food opens in random alternation (Schultz and Romo 1990). The empty box elicits weaker activations than the baited box. Animals perform indiscriminate ocular orienting reactions to each box but only approach the baited box with their hand. During learning, dopamine neurons continue to respond to previously conditioned stimuli that lose their reward prediction when reward contingencies change (Schultz et al. 1993) or respond to new stimuli resembling previously conditioned stimuli (Hollerman and Schultz 1996). Responses occur even to aversive stimuli presented in random alternation with physically similar, conditioned appetitive stimuli of the same sensory modality, the aversive response being weaker than the appetitive one (Mirenowicz and Schultz 1996). Responses generalize even to behaviorally extinguished appetitive stimuli. Apparently, neuronal responses generalize to nonappetitive stimuli because of their physical resemblance with appetitive stimuli.

Novelty responses

Novel stimuli elicit activations in dopamine neurons that often are followed by depressions and persist as long as behavioral orienting reactions occur (e.g., ocular saccades). Activations subside together with orienting reactions after several stimulus repetitions, depending on the physical impact of stimuli. Whereas small light-emitting diodes hardly elicit novelty responses, light flashes and the rapid visual and auditory opening of a small box elicit activations that decay gradually to baseline during <100 trials (Ljungberg
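The transfer of the phasic response from the reward to the earliest reward-predicting stimulus is exactly what temporal-difference (TD) learning models predict. The following sketch assumes a tabular TD(0) model over time steps within a trial; the trial layout, learning rate, and trial counts are arbitrary illustrative choices, not the recording protocol.

```python
# TD(0) sketch of response transfer. The prediction error
#   delta(t) = r(t) + V(t+1) - V(t)
# plays the role of the phasic dopamine signal: early in training it
# peaks at the reward; after training it peaks at the conditioned
# stimulus (CS). All numbers here are illustrative assumptions.

T = 10           # time steps per trial
CS_TIME = 2      # onset of the conditioned stimulus
REWARD_TIME = 8  # delivery of the primary reward
ALPHA = 0.2      # learning rate

def run_trials(n_trials):
    """Train for n_trials and return the prediction errors of the last trial."""
    v = [0.0] * (T + 1)  # learned reward predictions per time step
    deltas = []
    for _ in range(n_trials):
        deltas = []
        for t in range(T):
            r = 1.0 if t == REWARD_TIME else 0.0
            delta = r + v[t + 1] - v[t]
            # Nothing predicts the CS itself, so time steps up to and
            # including CS onset carry no learnable prediction.
            if t > CS_TIME:
                v[t] += ALPHA * delta
            deltas.append(delta)
    return deltas

early = run_trials(1)    # before learning: error at the reward, none at the CS
late = run_trials(500)   # after learning: error at the CS, none at the reward
print(early[REWARD_TIME], early[CS_TIME])  # 1.0 0.0
print(late[CS_TIME] > 0.9, late[REWARD_TIME] < 0.1)
```

Because each update moves the error one step earlier in the trial, the signal migrates backward over repeated trials until it sits at the earliest predictive stimulus, mirroring the serial-conditioned-stimuli result described above.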

Figures
Citations
More filters
Journal ArticleDOI

An integrative theory of prefrontal cortex function

TL;DR: It is proposed that cognitive control stems from the active maintenance of patterns of activity in the prefrontal cortex that represent goals and the means to achieve them, which provide bias signals to other brain structures whose net effect is to guide the flow of activity along neural pathways that establish the proper mappings between inputs, internal states, and outputs needed to perform a given task.
Journal ArticleDOI

The adolescent brain and age-related behavioral manifestations

TL;DR: Developmental changes occur in prefrontal cortex and limbic brain regions of adolescents across a variety of species; these alterations, including an apparent shift in the balance between mesocortical and mesolimbic dopamine systems, likely contribute to the unique characteristics of adolescence.
Journal ArticleDOI

The free-energy principle: a unified brain theory?

TL;DR: This Review looks at some key brain theories in the biological and physical sciences from the free-energy perspective, suggesting that several global brain theories might be unified within a free-energy framework.
Journal ArticleDOI

The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity.

TL;DR: This paper presents a unified account of two neural systems concerned with the development and expression of adaptive behaviors: a mesencephalic dopamine system for reinforcement learning and a generic error-processing system associated with the anterior cingulate cortex.
Journal ArticleDOI

The Reorienting System of the Human Brain: From Environment to Theory of Mind

TL;DR: While this system was originally conceptualized as one for redirecting attention from one object to another, recent evidence suggests a more general role in switching between networks, which may explain its involvement in functions such as social cognition.
References
Book ChapterDOI

Learning internal representations by error propagation

TL;DR: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion.
Journal ArticleDOI

A Neural Substrate of Prediction and Reward

TL;DR: Findings in this work indicate that dopaminergic neurons in the primate whose fluctuating output apparently signals changes or errors in the predictions of future salient and rewarding events can be understood through quantitative theories of adaptive optimizing control.
Journal ArticleDOI

Parallel Organization of Functionally Segregated Circuits Linking Basal Ganglia and Cortex

TL;DR: The basal ganglia serve primarily to integrate diverse inputs from the entire cerebral cortex and to "funnel" these influences, via the ventrolateral thalamus, to the motor cortex.
Frequently Asked Questions (6)
Q1. What is the significance of aversive prediction errors?

The importance of aversive prediction errors may involve descending inhibitory inputs to inferior olive neurons, in analogy to striatal projections to dopamine neurons.

About 100–250 dopamine neurons are studied in each behavioral situation, and fractions of these respond during learning but stop responding after full acquisition of visual and auditory reaction time tasks (Ljungberg et al.

When reward delivery is delayed for 0.5 or 1.0 s, a depression of neuronal activity occurs at the regular time of the reward, and an activation follows the reward at the new time (Hollerman and Schultz 1996). 
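This delayed-reward observation is exactly what a temporal-difference model predicts: train with the reward at one time step, then deliver it later, and the signed error appears at both times. A small sketch with assumed step counts and learning rate (illustrative only):

```python
import numpy as np

n_states, alpha, gamma = 6, 0.2, 1.0   # time steps within a trial
V = np.zeros(n_states)

def trial(reward_step):
    """Run one trial, returning the TD error at each time step."""
    deltas = np.zeros(n_states)
    for t in range(n_states):
        r = 1.0 if t == reward_step else 0.0
        v_next = V[t + 1] if t + 1 < n_states else 0.0
        deltas[t] = r + gamma * v_next - V[t]
        V[t] += alpha * deltas[t]
    return deltas

for _ in range(300):
    trial(reward_step=3)     # learn with reward at its usual time
d = trial(reward_step=5)     # first trial with the reward delayed
# d[3] is strongly negative: depression at the regular time of reward.
# d[5] is strongly positive: activation at the new reward time.
```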

The critic-actor architecture is particularly attractive for neurobiology because of its separate teaching and performance modules. 
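The critic-actor separation can be sketched on a minimal two-action task; the task, parameters, and variable names below are illustrative assumptions, with the critic's prediction error serving as the single teaching signal for both modules:

```python
import numpy as np

rng = np.random.default_rng(0)
p_reward = np.array([0.9, 0.1])  # assumed reward probability of each action
H = np.zeros(2)                  # actor: action preferences (performance module)
V = 0.0                          # critic: value prediction (teaching module)
alpha, beta = 0.1, 0.1

for _ in range(3000):
    policy = np.exp(H) / np.exp(H).sum()   # softmax over preferences
    a = rng.choice(2, p=policy)
    r = float(rng.random() < p_reward[a])
    delta = r - V            # critic's prediction error: the teaching signal
    V += alpha * delta       # critic improves its reward prediction
    H[a] += beta * delta     # actor reinforces actions followed by positive error

policy = np.exp(H) / np.exp(H).sum()       # learned policy favors action 0
```

Because the same scalar error both trains the critic and biases the actor, teaching and performance remain in separate modules, which is what makes the architecture attractive as an analogy for a global dopamine signal.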

The monosynaptic projection from dorsal raphé (Corvaja et al. 1993; Nedergaard et al. 1988) has a depressant influence on dopamine neurons (Fibiger et al.

The specific conditions in which phasic dopamine signals could play a role in learning are determined by the kinds of stimuli that effectively induce a dopamine response. Degeneration of dopamine neurons in Parkinson's disease also leads to a number of declarative and procedural learning deficits.