Topic

Bellman equation

About: Bellman equation is a research topic. Over the lifetime of the topic, 5,884 publications have been published, receiving 135,589 citations.


Papers
Journal ArticleDOI
TL;DR: In this paper, the authors study the differentiability of the policy function, or equivalently the twice differentiability of the value function, in one-dimensional dynamic programming problems in which the objective function is concave.
Abstract: The goal of this paper is to study the differentiability of the policy function or, in other words, the twice differentiability of the value function, in one-dimensional dynamic programming problems. Here we treat the case commonly used in economics where the objective function is concave. The need for the twice differentiability of the value function was noticed long ago: Pontryagin et al. (1962, p. 73, last paragraph) show it to be necessary to derive the maximum principle from the Bellman equation, and give a counterexample for nonconcave problems. The twice differentiability of the value function is also important if one wants to do comparative statics or to have smooth coefficients in Bellman's partial differential equation. Another important application has been given recently by Kehoe, Levine, and Romer (1989) to obtain the finiteness of the number of equilibria in an economy with an infinite number of goods and a finite number of consumers. For other motivations as well as background for this paper, see Stokey and Lucas (1989). More formally, in the second section of this paper we introduce the basic facts and notation to be used later. The third section is devoted to showing that the policy function is C1, and therefore that the value function is C2, if the policy function is increasing; for a multidimensional generalization the reader is referred to Santos (1991). In Section 4 we show, by means of a counterexample, that the policy function might not be twice differentiable, and hence that the value function might not be three times differentiable, even if the objective is three times continuously differentiable and strongly concave.
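For orientation, here is a minimal sketch of the setup the abstract refers to, written in standard Stokey–Lucas notation rather than the paper's own symbols (F, Γ, β, g are assumptions): the one-dimensional concave dynamic programming problem, and the envelope identity that ties C1 differentiability of the policy function to C2 differentiability of the value function.

```latex
% Sketch in standard notation (assumed symbols, not taken from the paper).
\[
V(x) \;=\; \max_{y \in \Gamma(x)} \bigl\{\, F(x,y) + \beta V(y) \,\bigr\},
\qquad
g(x) \;=\; \operatorname*{arg\,max}_{y \in \Gamma(x)} \bigl\{\, F(x,y) + \beta V(y) \,\bigr\}.
\]
% Envelope condition for interior solutions, and its derivative:
\[
V'(x) \;=\; F_x\bigl(x, g(x)\bigr)
\quad\Longrightarrow\quad
V''(x) \;=\; F_{xx}\bigl(x, g(x)\bigr) + F_{xy}\bigl(x, g(x)\bigr)\, g'(x),
\]
% so, with F twice continuously differentiable, V is C^2 exactly when the policy g is C^1.
```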

55 citations

Book ChapterDOI
01 Aug 2011
TL;DR: Optimizing a sequence of actions to attain some future goal is the general topic of control theory (Stengel 1993; Fleming and Soner 1992).
Abstract: Optimizing a sequence of actions to attain some future goal is the general topic of control theory (Stengel 1993; Fleming and Soner 1992). It views an agent as an automaton that seeks to maximize expected reward (or minimize cost) over some future time period. Two typical examples that illustrate this are motor control and foraging for food. As an example of a motor control task, consider a human throwing a spear to kill an animal. Throwing a spear requires the execution of a motor program such that, at the moment the spear leaves the hand, it has the correct speed and direction to hit the desired target. A motor program is a sequence of actions, and this sequence can be assigned a cost that generally consists of two terms: a path cost, which specifies the energy consumed by contracting the muscles to execute the motor program; and an end cost, which specifies whether the spear kills the animal, merely hurts it, or misses it altogether. The optimal control solution is the sequence of motor commands that kills the animal with minimal physical effort. If x denotes the state (the positions and velocities of the muscles), the optimal control solution is a function u(x, t) that depends both on the current state of the system and explicitly on time. When an animal forages for food, it explores the environment with the objective of finding as much food as possible in a short time window. At each time t, the animal considers the food it expects to encounter in the period [t, t + T]. Unlike the motor control example, the time horizon recedes into the future with the current time, and the cost now consists only of a path contribution and no end cost. Therefore, at each time the animal faces the same task, but possibly from a different location in the environment. The optimal control solution u(x) is now time-independent and specifies, for each location x in the environment, the direction u in which the animal should move. The general stochastic control problem is intractable to solve and requires an exponential amount of memory and computation time. The reason is that the state space needs to be discretized and thus becomes exponentially large in the number of dimensions. Computing the expectation values means that all states need to be visited, which requires the summation of exponentially large sums. The same intractabilities are encountered in reinforcement learning.
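As a hedged illustration of the two cost structures described above, in generic notation (the symbols φ, R, x, u, T are assumptions, not taken from the chapter): the motor-control problem has a finite horizon with a path cost plus an end cost, while the foraging problem is a receding-horizon problem with a path cost only, which is why its optimal control is time-independent.

```latex
% Finite horizon (motor control): end cost \phi plus integrated path cost R,
% averaged over the stochastic dynamics; the optimum depends on state and time.
\[
C\bigl(x_0, u(0\!\to\!T)\bigr)
\;=\;
\Bigl\langle\, \phi\bigl(x(T)\bigr) \;+\; \int_0^{T} R\bigl(x(t), u(t), t\bigr)\, dt \,\Bigr\rangle ,
\qquad u^* = u^*(x, t).
\]
% Receding horizon (foraging): the same path-cost-only objective is re-posed
% over [t, t+T] at every current time t, so the optimal control loses its
% explicit time dependence.
\[
C\bigl(x(t), u(t\!\to\!t+T)\bigr)
\;=\;
\Bigl\langle\, \int_t^{t+T} R\bigl(x(s), u(s)\bigr)\, ds \,\Bigr\rangle ,
\qquad u^* = u^*(x).
\]
```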

55 citations

Journal ArticleDOI
TL;DR: A new equivalent formulation of Clarke's multiplier rule for nonsmooth optimization problems is given, which shows that the set of all multipliers satisfying necessary optimality conditions is the union of a finite number of closed convex cones.
Abstract: For several types of finite or infinite dimensional optimization problems the marginal function or optimal value function is characterized by different local approximations such as generalized gradients, generalized directional derivatives, directional Hadamard or Dini derivatives. We give estimates for these terms which are determined by multipliers satisfying necessary optimality conditions. When the functions which define the optimization problem are more than once continuously differentiable, then higher order necessary conditions are employed to obtain refined estimates for the marginal function. As a by-product we give a new equivalent formulation of Clarke's multiplier rule for nonsmooth optimization problems. This shows that the set of all multipliers satisfying these necessary conditions is the union of a finite number of closed convex cones.
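A minimal, hedged illustration of the kind of estimate involved, in standard textbook notation rather than the paper's own (the symbols f, g, v, λ are assumptions): in the smooth, regular case the marginal (optimal value) function responds to perturbations of the right-hand side through the Lagrange multipliers; the paper's contribution is the nonsmooth analogue, where generalized gradients and Dini/Hadamard directional derivatives of v are estimated via the whole set of multipliers satisfying the necessary conditions.

```latex
% Classical sensitivity picture (assumed notation, smooth and regular case):
\[
v(y) \;=\; \inf_{x} \,\bigl\{\, f(x) \;:\; g_i(x) \le y_i,\ i = 1,\dots,m \,\bigr\},
\qquad
\nabla v(0) \;=\; -\lambda ,
\]
% where \lambda is the multiplier vector of the unperturbed problem. In the
% nonsmooth setting this single gradient is replaced by bounds on the
% directional derivatives of v expressed over the multiplier set.
```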

55 citations

Proceedings ArticleDOI
16 Jul 2011
TL;DR: It is shown that the optimal policies in CPOMDPs can be randomized, and exact and approximate dynamic programming methods for computing randomized optimal policies are presented.
Abstract: Constrained partially observable Markov decision processes (CPOMDPs) extend standard POMDPs by allowing the specification of constraints on some aspects of the policy in addition to the optimality objective for the value function. CPOMDPs have many practical advantages over standard POMDPs since they naturally model problems involving limited resources or multiple objectives. In this paper, we show that the optimal policies in CPOMDPs can be randomized, and we present exact and approximate dynamic programming methods for computing randomized optimal policies. While the exact method requires solving a minimax quadratically constrained program (QCP) in each dynamic programming update, the approximate method uses the point-based value update with a linear program (LP). We show that the randomized policies are significantly better than deterministic ones. We also demonstrate that the approximate point-based method scales to large problems.
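As a hedged, self-contained sketch of why an LP-based randomized choice can beat any deterministic choice under a cost constraint (the rewards, costs, and budget below are invented, and this is a single one-step backup at one belief point, not the paper's full point-based algorithm):

```python
# Toy illustration: randomized action selection at one belief point of a
# constrained POMDP, posed as a small linear program. All numbers are made up;
# in a real point-based CPOMDP backup they would come from backed-up alpha-vectors.
import numpy as np
from scipy.optimize import linprog

rewards = np.array([1.0, 0.6, 0.2])  # expected reward of each action at this belief
costs = np.array([0.9, 0.4, 0.1])    # expected constraint cost of each action
budget = 0.5                          # admissible expected cost

# maximize p . rewards  <=>  minimize -p . rewards over the probability simplex
res = linprog(
    c=-rewards,
    A_ub=costs.reshape(1, -1), b_ub=[budget],  # expected cost within budget
    A_eq=np.ones((1, 3)), b_eq=[1.0],          # p sums to one
    bounds=[(0.0, 1.0)] * 3,
)
print("randomized policy:", res.x)            # mixes the two highest-reward actions
print("expected reward  :", rewards @ res.x)  # 0.68, versus 0.6 for the best single
                                              # action that satisfies the budget
```

The binding cost constraint is exactly what makes randomization pay off here, mirroring the abstract's point that optimal CPOMDP policies can be randomized.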

55 citations

Proceedings Article
22 Jul 2012
TL;DR: This work introduces a more general and richer dual optimization criterion, which minimizes the average (undiscounted) cost of only those paths leading to the goal, among all policies that maximize the probability of reaching the goal.
Abstract: Optimal solutions to Stochastic Shortest Path Problems (SSPs) usually require that there exists at least one policy that reaches the goal with probability 1 from the initial state. This condition is very strong and prevents solving many interesting problems, for instance those where every possible policy reaches some dead-end state with positive probability. We introduce a more general and richer dual optimization criterion, which minimizes the average (undiscounted) cost of only those paths leading to the goal, among all policies that maximize the probability of reaching the goal. We present policy update equations in the form of dynamic programming for this new dual criterion, which differ from the standard Bellman equations. We demonstrate that our equations converge over an infinite horizon without any condition on the structure of the problem or on its policies, which actually extends the class of SSPs that can be solved. We experimentally show that our dual criterion provides well-founded solutions to SSPs that cannot be solved by the standard criterion, and that using a discount factor with the latter does yield solution policies, but these are not optimal with respect to our well-founded criterion.
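As a hedged sketch of the flavor of such a two-stage dynamic program (this is a simplified lexicographic version on an invented toy SSP with a dead end, not the paper's exact update equations: the paper's criterion averages cost only over paths that actually reach the goal, which the second stage below only approximates by restricting to probability-maximizing actions):

```python
# Toy SSP with a dead end. Stage 1: for every state, the maximum probability of
# eventually reaching the goal. Stage 2: minimum expected cost-to-go, restricted
# to the probability-maximizing actions. Transition table and costs are invented.
GOAL, DEAD = "g", "d"
# T[s][a] = list of (next_state, probability); every action costs 1.
T = {
    "s0": {"safe":  [("s1", 1.0)],
           "risky": [("g", 0.5), ("d", 0.5)]},
    "s1": {"go":    [("g", 0.9), ("s1", 0.1)]},
}
states = list(T) + [GOAL, DEAD]

# Stage 1: Bellman backup on goal probabilities, P[s] = max_a sum_s' T(s,a,s') P[s']
P = {s: 0.0 for s in states}
P[GOAL] = 1.0
for _ in range(200):
    for s in T:
        P[s] = max(sum(p * P[s2] for s2, p in T[s][a]) for a in T[s])

# Stage 2: among actions attaining P[s], minimize expected cost-to-go
C = {s: 0.0 for s in states}
for _ in range(200):
    for s in T:
        best = [a for a in T[s]
                if abs(sum(p * P[s2] for s2, p in T[s][a]) - P[s]) < 1e-9]
        C[s] = min(1.0 + sum(p * C[s2] for s2, p in T[s][a]) for a in best)

print(P)  # s0 and s1 reach the goal with probability (numerically) 1 via "safe"
print(C)  # expected cost under the probability-maximizing policy
```

On this toy problem the "risky" action reaches the goal immediately half the time but dead-ends otherwise, so the first stage rules it out and only then is the cost of the surviving "safe" policy minimized.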

54 citations


Network Information
Related Topics (5)
Optimal control: 68K papers, 1.2M citations (87% related)
Bounded function: 77.2K papers, 1.3M citations (85% related)
Markov chain: 51.9K papers, 1.3M citations (85% related)
Linear system: 59.5K papers, 1.4M citations (84% related)
Optimization problem: 96.4K papers, 2.1M citations (83% related)
Performance Metrics
Number of papers in the topic in previous years:
Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353