Topic

Bellman equation

About: Bellman equation is a research topic. Over its lifetime, 5884 publications have been published within this topic, receiving 135589 citations.


Papers
ReportDOI
TL;DR: In this paper, the authors propose a behavioral Bellman equation to model boundedly rational dynamic programming, where the agent uses an endogenously simplified or "sparse" model of the world and the consequences of his actions.
Abstract: This paper proposes a tractable way to model boundedly rational dynamic programming. The agent uses an endogenously simplified or "sparse" model of the world and the consequences of his actions, and acts according to a behavioral Bellman equation. The framework is applied to some of the canonical models in macroeconomics and finance. In the consumption-savings model, the consumer decides to pay little or no attention to the interest rate and more attention to his income. Ricardian equivalence and the Lucas critique partially fail, because the consumer is only partially attentive to taxes and policy changes. The model also yields a behavioral version of the New Keynesian model. It helps solve the "forward guidance puzzle", the fact that in that model shocks to very distant rates have a very powerful impact on today's consumption and inflation: because the agent is de facto myopic, this effect is muted. The paper gives a behavioral version of the canonical neoclassical growth model: fluctuations are larger and more persistent as agents do not react optimally to macroeconomic variables. Finally, in a Merton-style dynamic portfolio choice problem, the agent endogenously pays limited or no attention to the varying equity premium and hedging demand terms.
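The abstract does not state the equation itself; as a rough, illustrative sketch (an assumption, not the paper's exact formulation), a "behavioral" Bellman equation can be written as a standard Bellman recursion in which the continuation value is evaluated under a simplified perception of the state, with attention weights m_i shrinking each state variable toward a default level:

```latex
% Standard Bellman equation:
%   V(s) = max_a { u(s,a) + beta * E[ V(s') | s, a ] }
% Illustrative "behavioral" variant (a sketch, not the paper's exact model):
% the agent evaluates continuation values under a sparse, simplified state
% s^m, with attention weights m_i in [0,1].
\[
  V(s) = \max_{a}\Bigl\{\, u(s,a) + \beta\,\mathbb{E}\bigl[\,V(s') \mid s^{m}, a\,\bigr] \Bigr\},
  \qquad
  s^{m}_{i} = m_{i}\, s_{i} + (1 - m_{i})\,\bar{s}_{i},
\]
```

where \(\bar{s}_i\) is the default level the agent falls back on when it pays no attention to the i-th state variable (m_i = 0).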

65 citations

Journal ArticleDOI
TL;DR: It is shown that a unified framework consisting of a sequential diagram, an influence diagram, and a common formulation table for the problem's data suffices for compact and consistent representation, economical formulation, and efficient solution of (asymmetric) decision problems.
Abstract: In this paper we introduce a new graph, the sequential decision diagram, to aid in the modeling, formulation, and solution of sequential decision problems under uncertainty. While as compact as an influence diagram, the sequential diagram captures the asymmetric and sequential aspects of decision problems as effectively as decision trees. We show that a unified framework consisting of a sequential diagram, an influence diagram, and a common formulation table for the problem’s data suffices for compact and consistent representation, economical formulation, and efficient solution of (asymmetric) decision problems. In addition to asymmetry, the framework exploits other sources of computational efficiency, such as conditional independence and value function decomposition, making it also useful in evaluating dynamic-programming problems. The formulation table and recursive algorithm can be readily implemented in computers for solving large-scale problems. Examples are provided to illustrate the methodology in both...
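As a generic illustration of the kind of recursive, dynamic-programming evaluation such a framework ultimately performs (not the authors' sequential-diagram algorithm), here is a minimal backward-induction sketch over a small asymmetric decision tree; the node layout and payoffs are made up for the example:

```python
# Minimal backward-induction sketch over a small, asymmetric decision tree.
# Node types: ("decision", [(action, child), ...]),
#             ("chance",   [(prob, child), ...]),
#             ("terminal", payoff).
# Generic illustration only; not the paper's sequential-diagram method.

def rollback(node):
    kind = node[0]
    if kind == "terminal":
        return node[1], []
    if kind == "chance":
        value = sum(p * rollback(child)[0] for p, child in node[1])
        return value, []
    # Decision node: choose the action with the highest expected value.
    best_action, best_value = None, float("-inf")
    for action, child in node[1]:
        value, _ = rollback(child)
        if value > best_value:
            best_action, best_value = action, value
    return best_value, [best_action]

# Asymmetric example: "stop" ends the problem immediately,
# while "continue" leads to a chance node with further decisions.
tree = ("decision", [
    ("stop", ("terminal", 5.0)),
    ("continue", ("chance", [
        (0.3, ("terminal", 20.0)),
        (0.7, ("decision", [
            ("hedge", ("terminal", 2.0)),
            ("gamble", ("terminal", -4.0)),
        ])),
    ])),
])

value, policy = rollback(tree)
print(value, policy)  # optimal expected value and the best first decision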

65 citations

Proceedings Article
13 Jul 2008
TL;DR: An exact dynamic programming update for constrained partially observable Markov decision processes (CPOMDPs) is described; as in state-of-the-art exact solution of unconstrained POMDPs, it relies on implicit enumeration of the vectors in the piecewise linear value function and on pruning operations to obtain a minimal representation of the updated value function.
Abstract: We describe an exact dynamic programming update for constrained partially observable Markov decision processes (CPOMDPs). State-of-the-art exact solution of unconstrained POMDPs relies on implicit enumeration of the vectors in the piecewise linear value function, and pruning operations to obtain a minimal representation of the updated value function. In dynamic programming for CPOMDPs, each vector takes two valuations, one with respect to the objective function and another with respect to the constraint function. The dynamic programming update consists of finding, for each belief state, the vector that has the best objective function valuation while still satisfying the constraint function. Whereas the pruning operation in an unconstrained POMDP requires solution of a linear program, the pruning operation for CPOMDPs requires solution of a mixed integer linear program.
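A minimal sketch of the selection step described above, assuming each vector stores a reward valuation and a constraint (cost) valuation over states; the vectors, belief, and budget below are illustrative, and the MILP-based pruning step is not shown:

```python
import numpy as np

# Each alpha "vector" carries two valuations over states:
#   reward[s] -- contribution to the objective
#   cost[s]   -- contribution to the constraint
# At a belief b, pick the vector with the best expected reward among
# those whose expected cost respects the budget.
# (Illustrative numbers; the MILP-based pruning is not shown.)

vectors = [
    {"reward": np.array([10.0, 2.0]), "cost": np.array([4.0, 1.0])},
    {"reward": np.array([6.0, 6.0]),  "cost": np.array([1.0, 1.0])},
    {"reward": np.array([1.0, 9.0]),  "cost": np.array([0.5, 5.0])},
]

def best_feasible(belief, budget):
    best, best_val = None, float("-inf")
    for v in vectors:
        expected_cost = float(belief @ v["cost"])
        if expected_cost > budget:
            continue  # violates the constraint at this belief
        expected_reward = float(belief @ v["reward"])
        if expected_reward > best_val:
            best, best_val = v, expected_reward
    return best, best_val

belief = np.array([0.5, 0.5])
print(best_feasible(belief, budget=2.0))  # picks the second vector (cost 1.0, reward 6.0)
```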

65 citations

Journal ArticleDOI
TL;DR: This work presents a new, kernel-based approach to reinforcement learning that produces provably stable estimates of the value function, converges to a unique solution, and can be shown to be consistent in the sense that its costs converge to the optimal costs asymptotically.
Abstract: Reinforcement learning (RL) is concerned with the identification of optimal controls in Markov decision processes (MDPs) where no explicit model of the transition probabilities is available. We propose a class of RL algorithms which always produces stable estimates of the value function. In detail, we use "local averaging" methods to construct an approximate dynamic programming (ADP) algorithm. Nearest-neighbor regression, grid-based approximations, and trees can all be used as the basis of this approximation. We provide a thorough theoretical analysis of this approach and we demonstrate that ADP converges to a unique approximation in continuous-state average-cost MDPs. In addition, we prove that our method is consistent in the sense that an optimal approximate strategy is identified asymptotically. With regard to a practical implementation, we suggest a reduction of ADP to standard dynamic programming in an artificial finite-state MDP.
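A toy sketch of the "local averaging" idea, using nearest-neighbor averaging over sampled transitions to run approximate value iteration; the 1-D problem, dynamics, and sample sizes are made up, and for simplicity a discounted cost criterion is used, whereas the paper treats continuous-state average-cost MDPs and proves convergence:

```python
import numpy as np

# Nearest-neighbor ("local averaging") approximate dynamic programming on a
# synthetic 1-D control problem with two actions.  Toy sketch only.

rng = np.random.default_rng(0)
n, k, gamma, iters = 150, 5, 0.95, 40

# Sampled transitions (s, a, cost, s') from a controlled random walk.
S = rng.uniform(-1.0, 1.0, n)
A = rng.integers(0, 2, n)                      # action 0: drift left, 1: drift right
S_next = np.clip(S + (2 * A - 1) * 0.1 + 0.05 * rng.standard_normal(n), -1, 1)
C = S_next ** 2                                # cost: squared distance from the origin

# For every sampled state and action, pre-compute the indices of the
# k nearest transitions that used that action (the local-average set).
neighbors = {
    a: {i: np.flatnonzero(A == a)[np.argsort(np.abs(S[A == a] - S[i]))[:k]]
        for i in range(n)}
    for a in (0, 1)
}
# Nearest sampled state to each successor state, used to read off V(s').
succ_idx = np.array([np.argmin(np.abs(S - sn)) for sn in S_next])

V = np.zeros(n)                                # value estimates at the sampled states
for _ in range(iters):
    # Backup: Q(s,a) = local average of (cost + gamma * V(s')), then take
    # the cost-minimizing action.
    Q = np.empty((n, 2))
    target = C + gamma * V[succ_idx]
    for a in (0, 1):
        for i in range(n):
            Q[i, a] = target[neighbors[a][i]].mean()
    V = Q.min(axis=1)

print(round(V[np.argmin(np.abs(S))], 3))       # value estimate near s = 0
```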

65 citations

Journal ArticleDOI
TL;DR: The simulation-based approximate dynamic programming (ADP) method is extended to optimal feedback control of fed-batch reactors, treating a free-end problem in which the batch time, in addition to final-time productivity, is taken into account when finding the optimal feeding strategy.
Abstract: In this brief, we extend the simulation-based approximate dynamic programming (ADP) method to optimal feedback control of fed-batch reactors. We consider a free-end problem, wherein the batch time is considered in finding the optimal feeding strategy in addition to the final time productivity. In ADP, the optimal solution is parameterized in the form of a profit-to-go function. The original definition of profit-to-go is modified to include the decision of batch termination. Simulations from heuristic feeding policies generate the initial profit-to-go versus state data. An artificial neural network then approximates profit-to-go as a function of process state. Iterations of the Bellman equation are used to improve the profit-to-go function approximator. The profit-to-go function approximator thus obtained is then implemented in an online controller. This method is applied to cloned invertase expression in Saccharomyces cerevisiae in a fed-batch bioreactor.
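A schematic sketch of the profit-to-go loop described above: simulate under a heuristic policy, fit a profit-to-go approximator to (state, profit-to-go) data, then alternate greedy re-simulation and re-fitting. The toy dynamics, feed levels, and polynomial regressor are placeholders; the paper uses a fed-batch reactor model, a neural-network approximator, and an explicit batch-termination decision, none of which are modeled here:

```python
import numpy as np

# Schematic profit-to-go iteration.  A toy 1-D "reactor" stands in for the
# real process model, and a polynomial fit stands in for the paper's neural
# network; feeds, rewards, and dynamics are placeholders.

rng = np.random.default_rng(1)
gamma, n_traj, horizon = 1.0, 200, 10
feeds = np.array([0.0, 0.5, 1.0])              # candidate feed rates (illustrative)

def step(x, u):
    """Toy state transition and stage profit (not a real reactor model)."""
    x_next = np.clip(x + 0.3 * u - 0.05 * x, 0.0, 5.0)
    profit = x_next - 0.2 * u                  # product value minus feed cost
    return x_next, profit

def simulate(policy):
    """Roll out trajectories and record (state, realized profit-to-go) pairs."""
    states, returns = [], []
    for _ in range(n_traj):
        x, path = rng.uniform(0.0, 1.0), []
        for _ in range(horizon):
            u = policy(x)
            x_next, r = step(x, u)
            path.append((x, r))                # state before acting, stage profit
            x = x_next
        g = 0.0
        for x_t, r_t in reversed(path):        # accumulate profit-to-go backwards
            g = r_t + gamma * g
            states.append(x_t)
            returns.append(g)
    return np.array(states), np.array(returns)

# 1) A heuristic feeding policy generates the initial profit-to-go data.
heuristic = lambda x: 0.5
coef = np.polyfit(*simulate(heuristic), deg=3)

# 2) Bellman-style improvement: act greedily against the current
#    approximator, re-simulate, and re-fit.  Repeat a few times.
for _ in range(3):
    def greedy(x, c=coef):
        # One-step lookahead against the current profit-to-go approximator.
        q = [step(x, u)[1] + gamma * np.polyval(c, step(x, u)[0]) for u in feeds]
        return feeds[int(np.argmax(q))]
    coef = np.polyfit(*simulate(greedy), deg=3)

print(np.round(coef, 3))                       # fitted profit-to-go coefficients
```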

64 citations


Network Information
Related Topics (5)
Optimal control: 68K papers, 1.2M citations (87% related)
Bounded function: 77.2K papers, 1.3M citations (85% related)
Markov chain: 51.9K papers, 1.3M citations (85% related)
Linear system: 59.5K papers, 1.4M citations (84% related)
Optimization problem: 96.4K papers, 2.1M citations (83% related)
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353