
Bellman equation

About: Bellman equation is a research topic. Over the lifetime, 5884 publications have been published within this topic receiving 135589 citations.


Papers
Journal Article (DOI)
TL;DR: In this article, a novel adaptive dynamic programming scheme based on general value iteration (VI) was proposed to obtain near-optimal control for discrete-time affine non-linear systems with continuous state and control spaces.

Abstract: In this study, the authors propose a novel adaptive dynamic programming scheme based on general value iteration (VI) to obtain near-optimal control for discrete-time affine non-linear systems with continuous state and control spaces. First, the selection of the initial value function differs from that of traditional VI, and a new method is introduced to demonstrate the convergence property and convergence speed of the value function. Then, the control law obtained at each iteration can stabilise the system under some conditions. Finally, an error-bound-based condition is derived that accounts for the approximation errors of neural networks, so that the error between the optimal and approximated value functions can also be estimated. To facilitate the implementation of the iterative scheme, three neural networks trained with the Levenberg-Marquardt algorithm are used to approximate the unknown system, the value function and the control law. Two simulation examples are presented to demonstrate the effectiveness of the proposed scheme.
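The value-iteration recursion at the heart of such schemes can be illustrated on a small tabular MDP. This is a minimal sketch with made-up transition and reward data; the paper's scheme handles continuous state and control spaces via neural-network approximation rather than a table.

```python
import numpy as np

# Toy 2-state, 2-action MDP (illustrative data, not from the paper).
# P[a, s, s'] = transition probability; R[s, a] = one-step reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.1, 0.9]]])  # action 1
R = np.array([[1.0, 0.0], [0.0, 2.0]])
gamma = 0.9  # discount factor

V = np.zeros(2)  # initial value function (the paper studies more general initialisations)
for _ in range(1000):
    # Q[s, a] = R[s, a] + gamma * sum_s' P[a, s, s'] * V[s']
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    V = Q.max(axis=1)          # Bellman optimality backup
policy = Q.argmax(axis=1)      # greedy policy w.r.t. the converged values
```

Since the Bellman operator is a gamma-contraction, the iterates converge geometrically to the optimal value function, and the greedy policy extracted at convergence is optimal for this MDP.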

109 citations

Journal Article (DOI)
TL;DR: It is proved that an $(s, S)$ policy is optimal in a continuous-review stochastic inventory model with a fixed ordering cost when the demand is a mixture of a diffusion process and a compound Poisson process with exponentially distributed jump sizes.
Abstract: We prove that an $(s, S)$ policy is optimal in a continuous-review stochastic inventory model with a fixed ordering cost when the demand is a mixture of (i) a diffusion process and a compound Poisson process with exponentially distributed jump sizes, and (ii) a constant demand and a compound Poisson process. The proof uses the theory of impulse control. The Bellman equation of dynamic programming for such a problem reduces to a set of quasi-variational inequalities (QVI). An analytical study of the QVI leads to showing the existence of an optimal policy as well as the optimality of an $(s, S)$ policy. Finally, the combination of a diffusion and a general compound Poisson demand is not completely solved. We explain the difficulties and what remains open. We also provide a numerical example for the general case.
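The structure of an (s, S) policy — order up to S whenever inventory falls to or below s — can be sketched with a simple simulation. This is a simplification under stated assumptions: it uses periodic review with Poisson demand, whereas the paper's model is continuous-review with diffusion and compound-Poisson demand, and all cost parameters below are made up.

```python
import numpy as np

def simulate_sS(s, S, horizon=1000, lam=3.0, K=10.0, c=1.0, h=0.5, seed=0):
    """Average per-period cost of a periodic-review (s, S) policy
    under Poisson(lam) demand. K = fixed ordering cost, c = unit
    ordering cost, h = per-unit holding cost; backorders are allowed."""
    rng = np.random.default_rng(seed)
    x = S                          # start at the order-up-to level
    total = 0.0
    for _ in range(horizon):
        if x <= s:                 # reorder point reached: order up to S
            total += K + c * (S - x)
            x = S
        x -= rng.poisson(lam)      # demand arrives (x may go negative)
        total += h * max(x, 0)     # holding cost on on-hand stock only
    return total / horizon

avg_cost = simulate_sS(s=2, S=10)
```

Varying s and S in such a simulation shows the trade-off the optimal policy balances: a low reorder point saves holding cost but risks backorders, while a high order-up-to level amortises the fixed cost K over fewer orders.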

109 citations

Journal Article (DOI)
TL;DR: In this paper, the authors apply the compactification method to study the control problem where the state is governed by an Itô stochastic differential equation allowing both classical and singular control.

Abstract: We apply the compactification method to study the control problem where the state is governed by an Itô stochastic differential equation allowing both classical and singular control. The problem is reformulated as a martingale problem on an appropriate canonical space after the relaxed form of the classical control is introduced. Under some mild continuity hypotheses on the data, it is shown by purely probabilistic arguments that an optimal control for the problem exists. The value function is shown to be Borel measurable.

108 citations

Journal Article (DOI)
TL;DR: In this paper, the authors considered continuous-time Bertrand and Cournot competitions with uncertain market demand and under the constraint of finite supplies (or exhaustible resources), and showed that a large degree of competitive interaction causes firms to slow down production.

Abstract: We study how continuous-time Bertrand and Cournot competitions, in which firms producing similar goods compete with one another by setting prices or quantities respectively, can be analyzed as continuum dynamic mean field games. Interactions are of mean field type in the sense that the demand faced by a producer is affected by the others through their average price or quantity. Motivated by energy or consumer goods markets, we consider the setting of a dynamic game with uncertain market demand, and under the constraint of finite supplies (or exhaustible resources). The continuum game is characterized by a coupled system of partial differential equations: a backward Hamilton-Jacobi-Bellman partial differential equation (PDE) for the value function, and a forward Kolmogorov PDE for the density of players. Asymptotic approximation enables us to deduce certain qualitative features of the game in the limit of small competition. The equilibrium of the game is further studied using numerical solutions, which become very tractable by considering the tail distribution function instead of the density itself. This also allows us to use Dirac delta distributions so that the continuum game mimics finite $N$-player nonzero-sum differential games, the advantage being having to deal with two coupled PDEs instead of $N$. We find that, in accordance with the two-player game, a large degree of competitive interaction causes firms to slow down production. The continuum system can therefore be used qualitatively as an approximation to even small-player dynamic games.

108 citations

Journal Article (DOI)
TL;DR: In this paper, the principal seeks an optimal payment scheme, striving to induce the actions that will maximize her expected discounted profits over a finite planning horizon, and a set of assumptions is introduced that enables a systematic analysis.

Abstract: The principal-agent paradigm, in which a principal has a primary stake in the performance of some system but delegates operational control of that system to an agent, has many natural applications in operations management (OM). However, existing principal-agent models are of limited use to OM researchers because they cannot represent the rich dynamic structure required of OM models. This paper formulates a novel dynamic model that overcomes these limitations by combining the principal-agent framework with the physical structure of a Markov decision process. In this model one has a system moving from state to state as time passes, with transition probabilities depending on actions chosen by an agent, and a principal who pays the agent based on the state transitions observed. The principal seeks an optimal payment scheme, striving to induce the actions that will maximize her expected discounted profits over a finite planning horizon. Although dynamic principal-agent models similar to the one proposed here are considered intractable, a set of assumptions is introduced that enables a systematic analysis. These assumptions involve the "economic structure" of the model but not its "physical structure." Under these assumptions, the paper establishes that one can use a dynamic-programming recursion to derive an optimal payment scheme. This scheme is memoryless and satisfies a generalization of Bellman's principle of optimality. Important managerial insights are highlighted in the context of a two-state example called "the maintenance problem".
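The dynamic-programming recursion referred to above can be sketched, in generic form, as a backward Bellman recursion over a finite horizon. This is a plain finite-horizon MDP sketch with invented data; the paper's actual recursion additionally derives the principal's payment scheme, which is not reproduced here.

```python
import numpy as np

def finite_horizon_dp(P, R, T):
    """Backward Bellman recursion for a finite-horizon MDP.
    P has shape (nA, nS, nS): P[a, s, s'] transition probabilities.
    R has shape (nS, nA): one-step rewards. T is the horizon.
    Returns values V of shape (T+1, nS) and policies pi of shape (T, nS)."""
    nA, nS, _ = P.shape
    V = np.zeros((T + 1, nS))            # terminal condition: V_T = 0
    pi = np.zeros((T, nS), dtype=int)
    for t in range(T - 1, -1, -1):       # sweep backward from the horizon
        # Q[s, a] = R[s, a] + sum_s' P[a, s, s'] * V_{t+1}[s']
        Q = R + np.einsum('ast,t->sa', P, V[t + 1])
        V[t] = Q.max(axis=1)
        pi[t] = Q.argmax(axis=1)
    return V, pi

# Illustrative 2-state example: action 0 stays put, action 1 switches state;
# switching from state 0 pays 1, staying in state 1 pays 2.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 1.0], [2.0, 0.0]])
V, pi = finite_horizon_dp(P, R, T=5)
```

The memoryless optimal policy pi[t] depends only on the current state and the time remaining, which is exactly the form of the generalization of Bellman's principle of optimality that the paper establishes for its payment scheme.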

108 citations


Network Information
Related Topics (5)

Topic                   Papers    Citations    Related
Optimal control         68K       1.2M         87%
Bounded function        77.2K     1.3M         85%
Markov chain            51.9K     1.3M         85%
Linear system           59.5K     1.4M         84%
Optimization problem    96.4K     2.1M         83%
Performance Metrics

Number of papers in the topic in previous years:

Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353