Topic
Bellman equation
About: Bellman equation is a research topic. Over its lifetime, 5,884 publications have been published on this topic, receiving 135,589 citations.
Papers published on a yearly basis
Papers
••
TL;DR: In this paper, the authors propose a behavioral Bellman equation to model boundedly rational dynamic programming, where the agent uses an endogenously simplified or "sparse" model of the world and the consequences of his actions.
Abstract: This paper proposes a tractable way to model boundedly rational dynamic programming. The agent uses an endogenously simplified or "sparse" model of the world and the consequences of his actions, and acts according to a behavioral Bellman equation. The framework is applied to some of the canonical models in macroeconomics and finance. In the consumption-savings model, the consumer decides to pay little or no attention to the interest rate and more attention to his income. Ricardian equivalence and the Lucas critique partially fail because the consumer is only partially attentive to taxes and policy changes. The model also yields a behavioral version of the New Keynesian model. It helps resolve the "forward guidance puzzle": in the standard model, shocks to very distant rates have a very powerful impact on today's consumption and inflation; because the behavioral agent is de facto myopic, this effect is muted. The paper also gives a behavioral version of the canonical neoclassical growth model: fluctuations are larger and more persistent because agents do not react optimally to macroeconomic variables. Finally, in a Merton-style dynamic portfolio choice problem, the agent endogenously pays limited or no attention to the time-varying equity premium and hedging-demand terms.
65 citations
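For context, the recursion this paper modifies can be sketched against the standard one. This is a schematic rendering: the attention notation follows the sparse-max idea the abstract describes, but the exact symbols are illustrative rather than copied from the paper.

```latex
% Standard dynamic-programming recursion:
V(x) = \max_{a} \; u(a, x) + \beta \, \mathbb{E}\left[ V(x') \mid x, a \right]
% Sparse agent (schematic): each state component is perceived as a mix of its
% true value and a default value x_i^d, weighted by an attention m_i:
x_i^{s} = m_i \, x_i + (1 - m_i) \, x_i^{d}, \qquad m_i \in [0, 1]
% The attention vector m is chosen endogenously, trading off the utility loss
% from inattention against an attention cost, and the agent then solves the
% Bellman recursion under the simplified perceived state x^{s}.
```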
••
TL;DR: It is shown that a unified framework consisting of a sequential diagram, an influence diagram, and a common formulation table for the problem's data, suffices for compact and consistent representation, economical formulation, and efficient solution of (asymmetric) decision problems.
Abstract: In this paper we introduce a new graph, the sequential decision diagram, to aid in the modeling, formulation, and solution of sequential decision problems under uncertainty. While as compact as an influence diagram, the sequential diagram captures the asymmetric and sequential aspects of decision problems as effectively as decision trees. We show that a unified framework consisting of a sequential diagram, an influence diagram, and a common formulation table for the problem's data suffices for compact and consistent representation, economical formulation, and efficient solution of (asymmetric) decision problems. In addition to asymmetry, the framework exploits other sources of computational efficiency, such as conditional independence and value function decomposition, making it also useful in evaluating dynamic-programming problems. The formulation table and recursive algorithm can be readily implemented in computers for solving large-scale problems. Examples are provided to illustrate the methodology in both...
65 citations
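The recursive evaluation such frameworks implement can be illustrated with a generic rollback over an asymmetric decision tree. This is a toy sketch of standard decision-tree dynamic programming, not the paper's sequential-diagram or formulation-table machinery; the node representation is mine.

```python
# Rollback evaluation of an asymmetric decision tree: decision nodes take the
# max over children, chance nodes take the probability-weighted average, and
# leaves carry terminal values. (Generic dynamic programming, not the paper's
# sequential-diagram algorithm.)
def rollback(node):
    kind = node["kind"]
    if kind == "leaf":
        return node["value"]
    if kind == "decision":
        return max(rollback(child) for child in node["children"])
    if kind == "chance":
        return sum(p * rollback(child) for p, child in node["children"])
    raise ValueError(kind)

# Asymmetric example: only one decision branch leads to further uncertainty.
tree = {"kind": "decision", "children": [
    {"kind": "leaf", "value": 10.0},                      # safe option
    {"kind": "chance", "children": [                      # risky option
        (0.4, {"kind": "leaf", "value": 50.0}),
        (0.6, {"kind": "leaf", "value": -20.0}),
    ]},
]}
best = rollback(tree)  # risky branch is worth 0.4*50 - 0.6*20 = 8.0
```

The asymmetry shows up directly in the tree shape: branches simply have different depths, with no dummy nodes needed to square off the structure.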
••
13 Jul 2008
TL;DR: An exact dynamic programming update for constrained partially observable Markov decision processes (CPOMDPs) is presented; as in the unconstrained case, it relies on implicit enumeration of the vectors in the piecewise-linear value function and on pruning operations to obtain a minimal representation of the updated value function.
Abstract: We describe an exact dynamic programming update for constrained partially observable Markov decision processes (CPOMDPs). State-of-the-art exact solution of unconstrained POMDPs relies on implicit enumeration of the vectors in the piecewise linear value function, and pruning operations to obtain a minimal representation of the updated value function. In dynamic programming for CPOMDPs, each vector takes two valuations, one with respect to the objective function and another with respect to the constraint function. The dynamic programming update consists of finding, for each belief state, the vector that has the best objective function valuation while still satisfying the constraint function. Whereas the pruning operation in an unconstrained POMDP requires solution of a linear program, the pruning operation for CPOMDPs requires solution of a mixed integer linear program.
65 citations
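The pruning step the abstract refers to can be previewed with the cheap componentwise-dominance check that typically precedes the LP-based pruning (and, per the paper, the mixed-integer LP in the constrained case). A minimal sketch; the function and variable names are mine, not from the paper:

```python
import numpy as np

def pointwise_dominance_prune(vectors):
    """Remove any alpha-vector componentwise dominated by another vector.

    This is only the cheap preliminary step of POMDP value-function pruning;
    a minimal representation additionally requires solving a linear program
    per vector (a mixed-integer LP in the constrained case, per the paper).
    """
    kept = []
    for i, v in enumerate(vectors):
        dominated = any(
            j != i and np.all(w >= v) and np.any(w > v)
            for j, w in enumerate(vectors)
        )
        if not dominated:
            kept.append(v)
    return kept

# Toy 2-state example: the last vector lies strictly below [0.5, 0.5], so no
# belief state can ever select it, and it is safely discarded.
alphas = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
          np.array([0.5, 0.5]), np.array([0.2, 0.2])]
pruned = pointwise_dominance_prune(alphas)  # keeps the first three vectors
```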
••
TL;DR: This work presents a new, kernel-based approach to reinforcement learning that produces stable value-function estimates, provably converges to a unique solution, and is consistent in the sense that its costs converge to the optimal costs asymptotically.
Abstract: Reinforcement learning (RL) is concerned with the identification of optimal controls in Markov decision processes (MDPs) where no explicit model of the transition probabilities is available. We propose a class of RL algorithms which always produces stable estimates of the value function. In detail, we use "local averaging" methods to construct an approximate dynamic programming (ADP) algorithm. Nearest-neighbor regression, grid-based approximations, and trees can all be used as the basis of this approximation. We provide a thorough theoretical analysis of this approach and we demonstrate that ADP converges to a unique approximation in continuous-state average-cost MDPs. In addition, we prove that our method is consistent in the sense that an optimal approximate strategy is identified asymptotically. With regard to a practical implementation, we suggest a reduction of ADP to standard dynamic programming in an artificial finite-state MDP.
65 citations
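The grid-based variant of the local-averaging idea can be sketched on a toy problem. Note the assumptions: the paper analyzes average-cost MDPs and proves convergence and consistency, whereas this sketch uses a discounted criterion for brevity, and the dynamics, costs, and all parameters below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy continuous-state MDP on [0, 1]: actions drift the state left or right,
# and the stage cost is the distance from the target 0.5. (Illustrative only.)
def step(s, a):
    s2 = np.clip(s + 0.1 * a + 0.02 * rng.standard_normal(), 0.0, 1.0)
    return s2, abs(s2 - 0.5)

# Grid-based "local averaging": aggregate the state space into bins and build
# an artificial finite-state MDP from sampled transitions, as the abstract's
# reduction to standard dynamic programming suggests.
n_bins, gamma, actions = 20, 0.9, (-1, 1)
centers = (np.arange(n_bins) + 0.5) / n_bins
bin_of = lambda s: min(int(s * n_bins), n_bins - 1)

P = np.zeros((len(actions), n_bins, n_bins))  # empirical transition model
C = np.zeros((len(actions), n_bins))          # empirical expected stage cost
for ai, a in enumerate(actions):
    for b, c in enumerate(centers):
        for _ in range(200):
            s2, cost = step(c, a)
            P[ai, b, bin_of(s2)] += 1 / 200
            C[ai, b] += cost / 200

# Standard value iteration on the aggregated finite MDP.
V = np.zeros(n_bins)
for _ in range(500):
    V = np.min(C + gamma * P @ V, axis=0)
```

Nearest-neighbor or tree-based regressors from the paper's class of averagers would replace the hard binning above while keeping the same overall loop.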
••
TL;DR: The simulation-based approximate dynamic programming (ADP) method is extended to optimal feedback control of fed-batch reactors, treating a free-end problem in which the batch time, in addition to final-time productivity, enters the choice of optimal feeding strategy.
Abstract: In this brief, we extend the simulation-based approximate dynamic programming (ADP) method to optimal feedback control of fed-batch reactors. We consider a free-end problem, wherein the batch time is considered in finding the optimal feeding strategy in addition to the final-time productivity. In ADP, the optimal solution is parameterized in the form of a profit-to-go function. The original definition of profit-to-go is modified to include the decision of batch termination. Simulations from heuristic feeding policies generate the initial profit-to-go versus state data. An artificial neural network then approximates profit-to-go as a function of the process state. Iterations of the Bellman equation are used to improve the profit-to-go function approximator. The profit-to-go function approximator thus obtained is then implemented in an online controller. This method is applied to cloned invertase expression in Saccharomyces cerevisiae in a fed-batch bioreactor.
64 citations
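The simulate-then-fit-then-iterate loop the abstract describes can be sketched with a polynomial standing in for the paper's neural-network profit-to-go approximator. The scalar state, dynamics, and stage profit below are toy assumptions, not the bioreactor model:

```python
import numpy as np

# Toy stand-in for the fed-batch problem: scalar state x in [0, 1], two feed
# actions, and a made-up one-step profit. (All assumptions for illustration.)
actions = (0.0, 0.1)
f = lambda x, a: np.clip(0.9 * x + a, 0.0, 1.0)   # assumed dynamics
r = lambda x, a: x - 0.5 * a                      # assumed stage profit
gamma = 0.95
xs = np.linspace(0.0, 1.0, 50)

# Step 1: initial profit-to-go data from a heuristic (always-feed) policy.
def rollout(x, horizon=50):
    total, disc = 0.0, 1.0
    for _ in range(horizon):
        total += disc * r(x, 0.1)
        x, disc = f(x, 0.1), disc * gamma
    return total

coef = np.polyfit(xs, [rollout(x) for x in xs], deg=3)

# Steps 2-3: iterate the Bellman equation, refitting the approximator (here a
# cubic polynomial, in place of the paper's neural network) each pass.
for _ in range(30):
    targets = [max(r(x, a) + gamma * np.polyval(coef, f(x, a))
                   for a in actions)
               for x in xs]
    coef = np.polyfit(xs, targets, deg=3)

J = np.polyval(coef, xs)  # improved profit-to-go over the state grid
```

An online controller in this scheme would simply evaluate the fitted `coef` at the measured state and pick the action maximizing the one-step Bellman right-hand side.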