Book ChapterDOI

Understanding the role of serotonin in basal ganglia through a unified model

11 Sep 2012 - pp 467-473
TL;DR: A Reinforcement Learning (RL)-based model of serotonin is presented that reconciles some of the diverse roles of the neuromodulator. The model uses a novel formulation of the utility function: a weighted sum of the traditional value function and a risk function.
Abstract: We present a Reinforcement Learning (RL)-based model of serotonin which tries to reconcile some of the diverse roles of the neuromodulator. The proposed model uses a novel formulation of the utility function: a weighted sum of the traditional value function and a risk function. Serotonin is represented by the weight, α, used in this combination. The model is applied to three different experimental paradigms: 1) bee foraging behavior, which involves decision making based on risk; 2) a temporal reward prediction task, in which serotonin (α) controls the time-scale of reward prediction; and 3) a reward/punishment prediction task, in which the punishment prediction error depends on serotonin levels. The model thus explains three diverse roles of serotonin, in the time-scale of reward prediction, risk modeling, and punishment prediction, within a single framework.
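As a rough illustration of the utility formulation the abstract describes, the sketch below (our own construction; the bandit task, learning rates, and the exact form of the risk term are assumptions, not the authors' code) learns value and risk estimates for a safe and a risky option and selects actions by utility, with α playing the serotonin-like role:

```python
import math
import random

def run_bandit(alpha, trials=5000, lr=0.1, seed=0):
    """Two-armed 'bee foraging' style bandit: arm 0 is safe (reward 0.5),
    arm 1 is risky (reward 1.0 or 0.0 with equal probability)."""
    rng = random.Random(seed)
    Q = [0.0, 0.0]       # value estimates
    h = [0.0, 0.0]       # risk estimates (variance of the prediction error)
    picks = [0, 0]
    for _ in range(trials):
        # Utility = value minus an alpha-weighted risk penalty
        U = [Q[a] - alpha * math.sqrt(h[a]) for a in range(2)]
        # Epsilon-greedy selection over utility, not raw value
        a = rng.randrange(2) if rng.random() < 0.1 else max(range(2), key=lambda i: U[i])
        r = 0.5 if a == 0 else (1.0 if rng.random() < 0.5 else 0.0)
        delta = r - Q[a]                 # reward prediction error
        Q[a] += lr * delta               # value update
        h[a] += lr * (delta ** 2 - h[a]) # incremental variance update
        picks[a] += 1
    return Q, h, picks

Q, h, picks = run_bandit(alpha=1.0)
# With a high alpha (risk aversion) the safe arm dominates even though
# both arms have the same mean reward of 0.5.
print(picks[0] > picks[1])
```

Setting α = 0 reduces the utility to the plain value function, recovering risk-neutral behavior.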
Citations

Journal ArticleDOI
TL;DR: This is the first model of PG under PD conditions; it is built using Reinforcement Learning, with the significant difference that action selection is performed using a utility distribution rather than a purely value-based distribution, thereby incorporating risk-based decision making.
Abstract: We propose a computational model of Precision Grip (PG) performance in normal subjects and Parkinson’s Disease (PD) patients. Prior studies on grip force generation in PD patients show an increase in grip force during ON medication and an increase in the variability of the grip force during OFF medication (Fellows et al. 1998; Ingvarsson et al. 1997). Changes in grip force generation in dopamine-deficient PD conditions strongly suggest a contribution of the Basal Ganglia, a deep brain system with a crucial role in translating dopamine signals into decision making. The present approach is to treat the problem of modeling grip force generation as a problem of action selection, which is one of the key functions of the Basal Ganglia. The model consists of two components: 1) the sensory-motor loop component, and 2) the Basal Ganglia component. The sensory-motor loop component converts a reference position and a reference grip force into lift force and grip force profiles, respectively. These two forces cooperate in grip-lifting a load. The sensory-motor loop component also includes a plant model that represents the interaction between the two fingers involved in PG and the object to be lifted. The Basal Ganglia component is modeled using Reinforcement Learning, with the significant difference that action selection is performed using a utility distribution instead of a purely value-based distribution, thereby incorporating risk-based decision making. The proposed model accurately accounts for the precision grip results from normal subjects and PD patients (Fellows et al. 1998; Ingvarsson et al. 1997). To our knowledge this is the first model of precision grip under PD conditions.
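The selection difference the abstract highlights can be sketched as a softmax over utilities (value minus a risk penalty) rather than over values alone. The force levels, values, and risk numbers below are illustrative assumptions, not the paper's fitted model:

```python
import math
import random

def softmax_pick(scores, beta=5.0, rng=random):
    """Sample an index with probability proportional to exp(beta * score)."""
    exps = [math.exp(beta * s) for s in scores]
    r = rng.random() * sum(exps)
    for i, e in enumerate(exps):
        r -= e
        if r <= 0:
            return i
    return len(scores) - 1

# Three candidate grip forces with equal value but different outcome variance.
values = [0.5, 0.5, 0.5]
risks  = [0.01, 0.09, 0.25]
alpha  = 1.0                        # serotonin-like risk weight

# Utility-based scores; a purely value-based softmax would treat all
# three forces identically here.
utilities = [v - alpha * math.sqrt(h) for v, h in zip(values, risks)]
rng = random.Random(0)
picks = [0, 0, 0]
for _ in range(2000):
    picks[softmax_pick(utilities, rng=rng)] += 1
print(picks[0] > picks[1] > picks[2])  # lower-risk force chosen more often
```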

21 citations


Book ChapterDOI
01 Jan 2018-
TL;DR: This chapter argues that describing the two BG pathways as having mutually opponent actions has limitations, and that the BG indirect pathway also plays a role in exploration; this exploration mechanism is used to simulate various processes of the basal ganglia.
Abstract: One of the earliest attempts at building a theory of the basal ganglia (BG) is based on the clinical findings that lesions to the direct and indirect pathways of the BG produce quite opposite motor manifestations (Albin et al., in Trends Neurosci 12(10):366–375, 1989). While lesions of the direct pathway (DP), affecting particularly the projections from the striatum to GPi, are associated with hypokinetic disorders (distinguished by a paucity of movement), lesions of the indirect pathway (IP) produce hyperkinetic disorders, such as chorea and tremor. In this chapter, we argue that describing the two BG pathways as having mutually opponent actions has limitations. We argue that the BG indirect pathway also plays a role in exploration. We show evidence from various motor learning and decision-making tasks that exploration is a necessary component of many behavioral processes. Importantly, we use the exploration mechanism explained here to simulate various processes of the basal ganglia, which we discuss in the following chapters.

12 citations


Book ChapterDOI
01 Jan 2018-
TL;DR: This chapter presents an extended reinforcement learning (RL)-based model of DA and 5-HT function in the BG, which reconciles some of the diverse roles of 5-HT.
Abstract: In addition to dopaminergic input, serotonergic (5-HT) fibers also widely arborize through the basal ganglia circuits and strongly control their dynamics. Although empirical studies show that 5-HT plays many functional roles in risk-based decision making, reward, and punishment learning, prior computational models mostly focus on its role in behavioral inhibition or timescale of prediction. This chapter presents an extended reinforcement learning (RL)-based model of DA and 5-HT function in the BG, which reconciles some of the diverse roles of 5-HT. The model uses the concept of a utility function—a weighted sum of the traditional value function expressing the expected sum of the rewards, and a risk function expressing the variance observed in reward outcomes. Serotonin is represented by a weight parameter used in this combination of value and risk functions, while the neuromodulator dopamine (DA) is represented as reward prediction error as in the classical models. Consistent with this abstract model, a network model is also presented in which medium spiny neurons (MSNs) co-expressing both D1 and D2 receptors (D1R–D2R) are suggested to compute risk, while those expressing only D1 receptors are suggested to compute value. This BG model includes nuclei such as the striatum, Globus Pallidus externa, Globus Pallidus interna, and subthalamic nucleus. DA and 5-HT are modeled to differentially affect both the direct pathway (DP) and the indirect pathway (IP), comprising D1R, D2R, and D1R–D2R projections. Both abstract and network models are applied to data from different experimental paradigms used to study the role of 5-HT: (1) risk-sensitive decision making, where 5-HT controls the risk sensitivity; (2) temporal reward prediction, where 5-HT controls the timescale of reward prediction; and (3) reward–punishment sensitivity, where the punishment prediction error depends on 5-HT levels.
The extended RL model (Balasubramani, Chakravarthy, Ravindran, & Moustafa, in Front Comput Neurosci 8:47, 2014; Balasubramani, Ravindran, & Chakravarthy, in Understanding the role of serotonin in basal ganglia through a unified model, 2012), together with its network correlates (Balasubramani, Chakravarthy, Ravindran, & Moustafa, in Front Comput Neurosci 9:76, 2015; Balasubramani, Chakravarthy, Ali, Ravindran, & Moustafa, in PLoS ONE 10(6):e0127542, 2015), successfully explains the three diverse roles of 5-HT in a single framework.

5 citations


Journal ArticleDOI
12 Aug 2021-
TL;DR: This model involves two critics, an optimistic learning system and a pessimistic learning system, whose predictions are integrated in time to control how potential decisions compete to be selected. The model predicts that human decision-making can be decomposed along two dimensions.
Abstract: Recent experiments and theories of human decision-making suggest positive and negative errors are processed and encoded differently by serotonin and dopamine, with serotonin possibly serving to oppose dopamine and protect against risky decisions. We introduce a temporal difference (TD) model of human decision-making to account for these features. Our model involves two critics, an optimistic learning system and a pessimistic learning system, whose predictions are integrated in time to control how potential decisions compete to be selected. Our model predicts that human decision-making can be decomposed along two dimensions: the degree to which the individual is sensitive to (1) risk and (2) uncertainty. In addition, we demonstrate that the model can learn about the mean and standard deviation of rewards, and provide information about reaction time despite not modeling these variables directly. Lastly, we simulate a recent experiment to show how updates of the two learning systems could relate to dopamine and serotonin transients, thereby providing a mathematical formalism to serotonin’s hypothesized role as an opponent to dopamine. This new model should be useful for future experiments on human decision-making.

1 citation


Book ChapterDOI
01 Jan 2018-
TL;DR: A Go/Explore/NoGo (GEN) algorithm in a utility-based decision-making framework is presented to explain the SM generated by healthy controls and by PD patients during both ON and OFF medication.
Abstract: Precision grip (PG) is the ability to hold an object between forefinger and thumb. Lifting objects in PG requires delicate finger grip force (GF) control. Healthy controls modulate GF depending on size, weight, surface curvature, and friction. The difference between the actual GF generated and the minimum GF required to prevent the object from slipping is known as the safety margin (SM). Published results suggest that OFF-medicated Parkinson’s disease (PD) patients generate an average SM identical to that of controls but with increased SM variance, while PD patients on medication demonstrate a higher average SM with SM variance identical to that of controls. Previously known computational models provide insight into how GF is generated and controlled but are unsuitable for modeling GF in PD patients. In this chapter, we present a Go/Explore/NoGo (GEN) algorithm in a utility-based decision-making framework to explain the SM generated by healthy controls and by PD patients during both ON and OFF medication. The study suggests that GF in PD results from dopamine-level-dependent suboptimal decision-making-based force selection, and demonstrates the suitability of the GEN algorithm for modeling decision-making tasks.
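A hedged sketch of a Go/Explore/NoGo-style selection rule as the abstract describes it (the thresholds, the force-level action set, and the use of a scalar error signal are our assumptions, not the chapter's fitted model): a dopamine-like signal delta decides whether to repeat the previous action (Go), avoid it (NoGo), or pick randomly (Explore):

```python
import random

def gen_select(delta, prev_action, actions, hi=0.3, lo=-0.3, rng=random):
    """Pick the next action from a dopamine-like error signal delta."""
    if delta > hi:
        # Go: exploit, repeat the previous action
        return prev_action
    elif delta < lo:
        # NoGo: avoid the previous action
        return rng.choice([a for a in actions if a != prev_action])
    else:
        # Explore: intermediate delta triggers random selection
        return rng.choice(actions)

actions = ["low_force", "medium_force", "high_force"]
print(gen_select(0.8, "medium_force", actions))  # Go regime: repeats "medium_force"
```

Under this reading, low dopamine (as in OFF-medication PD) keeps delta below the Go threshold more often, pushing selection into the Explore regime and raising SM variance, which is the qualitative pattern the chapter models.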

1 citation


References

Book ChapterDOI
01 Mar 1979-Econometrica
Abstract: This paper presents a critique of expected utility theory as a descriptive model of decision making under risk, and develops an alternative model, called prospect theory. Choices among risky prospects exhibit several pervasive effects that are inconsistent with the basic tenets of utility theory. In particular, people underweight outcomes that are merely probable in comparison with outcomes that are obtained with certainty. This tendency, called the certainty effect, contributes to risk aversion in choices involving sure gains and to risk seeking in choices involving sure losses. In addition, people generally discard components that are shared by all prospects under consideration. This tendency, called the isolation effect, leads to inconsistent preferences when the same choice is presented in different forms. An alternative theory of choice is developed, in which value is assigned to gains and losses rather than to final assets and in which probabilities are replaced by decision weights. The value function is normally concave for gains, commonly convex for losses, and is generally steeper for losses than for gains. Decision weights are generally lower than the corresponding probabilities, except in the range of low probabilities. Overweighting of low probabilities may contribute to the attractiveness of both insurance and gambling. EXPECTED UTILITY THEORY has dominated the analysis of decision making under risk. It has been generally accepted as a normative model of rational choice (24), and widely applied as a descriptive model of economic behavior, e.g. (15, 4). Thus, it is assumed that all reasonable people would wish to obey the axioms of the theory (47, 36), and that most people actually do, most of the time. The present paper describes several classes of choice problems in which preferences systematically violate the axioms of expected utility theory.
In the light of these observations we argue that utility theory, as it is commonly interpreted and applied, is not an adequate descriptive model, and we propose an alternative account of choice under risk.
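The value and weighting functions the abstract describes can be written down concretely. The sketch below uses the parametric forms and parameter estimates from Tversky and Kahneman's later (1992) cumulative prospect theory, which are commonly cited defaults, not values from this 1979 paper:

```python
def value(x, alpha=0.88, lam=2.25):
    """Prospect-theory value: concave for gains, convex and steeper for losses."""
    return x ** alpha if x >= 0 else -lam * ((-x) ** alpha)

def weight(p, gamma=0.61):
    """Decision weight: overweights small probabilities, underweights large ones."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

# Loss aversion: a loss looms larger than an equal gain.
print(abs(value(-100)) > value(100))   # True
# Overweighting of low probabilities (attractiveness of insurance and gambling):
print(weight(0.01) > 0.01)             # True
```

A prospect's worth is then the sum of weight(p) * value(x) over its outcomes, replacing expected utility's p * u(x).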

34,961 citations




Book
01 Jan 1988-
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.
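The basic solution methods the book develops (Part II) can be illustrated with tabular Q-learning on a toy task; the 4-state chain below is our own example, not one from the book:

```python
import random

def q_learning(n_states=4, episodes=500, lr=0.5, gamma=0.9, eps=0.1, seed=0):
    """Chain: states 0..3, actions 0 (left) / 1 (right); reward 1 at state 3."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy behavior policy
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda i: Q[s][i])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Temporal-difference update toward the greedy bootstrap target
            Q[s][a] += lr * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# Moving right is learned to be better in every non-terminal state, and the
# values discount with distance from the goal: roughly gamma^2, gamma, 1.
print([round(Q[s][1], 2) for s in range(3)])
```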

32,257 citations


"Understanding the role of serotonin..." refers background in this paper

  • ...In classical RL [13] terms, the value function may be expressed as,...



Journal ArticleDOI
01 Jan 1979-Econometrica

24,566 citations


Journal ArticleDOI
14 Mar 1997-Science
TL;DR: Findings in this work indicate that dopaminergic neurons in the primate whose fluctuating output apparently signals changes or errors in the predictions of future salient and rewarding events can be understood through quantitative theories of adaptive optimizing control.
Abstract: The capacity to predict future events permits a creature to detect, model, and manipulate the causal structure of its interactions with its environment. Behavioral experiments suggest that learning is driven by changes in the expectations about future salient events such as rewards and punishments. Physiological work has recently complemented these studies by identifying dopaminergic neurons in the primate whose fluctuating output apparently signals changes or errors in the predictions of future salient and rewarding events. Taken together, these findings can be understood through quantitative theories of adaptive optimizing control.
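The abstract's central claim, that dopamine transients behave like a temporal-difference prediction error, can be shown in a toy conditioning simulation (our own construction, not the paper's): after learning, the error moves from the time of reward to the time of the predictive cue.

```python
def train(n_trials=300, T=5, lr=0.2, gamma=1.0):
    """Cue at t=0, reward at t=T-1; V holds one value per time step."""
    V = [0.0] * (T + 1)              # V[T] is the post-reward terminal state
    log = []
    for _ in range(n_trials):
        # The cue is assumed to arrive unpredictably, so the pre-cue baseline
        # value stays 0 and the TD error at cue onset is simply gamma * V[0].
        cue_delta = gamma * V[0]
        reward_delta = 0.0
        for t in range(T):
            r = 1.0 if t == T - 1 else 0.0
            delta = r + gamma * V[t + 1] - V[t]   # TD error
            V[t] += lr * delta
            if t == T - 1:
                reward_delta = delta
        log.append((cue_delta, reward_delta))
    return log

log = train()
first_cue, first_rew = log[0]
last_cue, last_rew = log[-1]
# Early in training the error fires at the reward; after training it has
# shifted to the cue, mirroring the dopamine recordings the paper discusses.
print(first_rew > 0.9 and last_cue > 0.9 and last_rew < 0.1)
```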

7,378 citations




"Understanding the role of serotonin..." refers background in this paper

  • ...But a more subtle observation that activity of mesencephalic dopamine neurons closely resembles an important variable in Reinforcement Learning (RL), paved way to application of RL concepts to dopamine signaling and even basal ganglia function [1]....


  • ...Dopaminergic activity has been linked to reward processing in the brain for a long time [1]....



Book
01 Mar 1998-
TL;DR: In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning.
Abstract: From the Publisher: In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability.

7,013 citations


Performance Metrics
No. of citations received by the paper in previous years

Year  Citations
2021  1
2020  2
2018  3
2014  1
2013  1