
Showing papers on "Bellman equation published in 1986"


Journal ArticleDOI
TL;DR: A manufacturing system can be in one of two states, functional and failed, and it moves back and forth between these two states as a continuous-time Markov chain, with mean time between failures 1/q1 and mean time to repair 1/q2.
Abstract: We address the problem of controlling the production rate of a failure-prone manufacturing system so as to minimize the discounted inventory cost, where certain cost rates are specified for both positive and negative inventories, and there is a constant demand rate for the commodity produced. The underlying theoretical problem is the optimal control of a continuous-time system with jump Markov disturbances, with an infinite-horizon discounted cost criterion. We use two complementary approaches. First, proceeding informally, and using a combination of stochastic coupling, linear system arguments, stable and unstable eigenspaces, renewal theory, parametric optimization, etc., we arrive at a conjecture for the optimal policy. Then we address the previously ignored mathematical difficulties associated with differential equations with discontinuous right-hand sides, singularity of the optimal control problem, smoothness, and validity of the dynamic programming equation, to give a rigorous proof of optimality of the conjectured policy. It is hoped that both approaches will find uses in other such problems as well. We obtain the complete solution and show that the optimal solution is simply characterized by a certain critical number, which we call the optimal inventory level. If the current inventory level exceeds the optimal level, one should not produce at all; if less, one should produce at the maximum rate; and if exactly equal, one should produce exactly enough to meet demand. We also give a simple explicit formula for the optimal inventory level.
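The policy described above has a simple algorithmic form. A minimal sketch follows (not taken from the paper; the names z_star, r_max and d for the optimal inventory level, maximum production rate and demand rate are ours):

```python
# Minimal sketch of the critical-number ("hedging point") policy described above.
# The names z_star (optimal inventory level), r_max (maximum production rate) and
# d (demand rate) are illustrative, not the paper's notation.
def production_rate(inventory: float, z_star: float, r_max: float, d: float) -> float:
    if inventory > z_star:   # above the optimal inventory level: do not produce
        return 0.0
    if inventory < z_star:   # below it: produce at the maximum rate
        return r_max
    return d                 # exactly at the optimal level: produce to meet demand
```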

643 citations


Journal ArticleDOI
TL;DR: In this paper, a Lagrange multiplier formulation involving a dynamic programming equation is utilized to relate the constrained optimization to an unconstrained optimization parametrized by the multiplier, leading to a proof for the existence of a semi-simple optimal constrained policy.
Abstract: Optimal causal policies maximizing the time-average reward over a semi-Markov decision process (SMDP), subject to a hard constraint on a time-average cost, are considered. Rewards and costs depend on the state and action, and contain running as well as switching components. It is supposed that the state space of the SMDP is finite, and the action space compact metric. The policy determines an action at each transition point of the SMDP. Under an accessibility hypothesis, several notions of time average are equivalent. A Lagrange multiplier formulation involving a dynamic programming equation is utilized to relate the constrained optimization to an unconstrained optimization parametrized by the multiplier. This approach leads to a proof for the existence of a semi-simple optimal constrained policy. That is, there is at most one state for which the action is randomized between two possibilities; at all other states, an action is uniquely chosen. Affine forms for the rewards, costs and transition probabilities further reduce the optimal constrained policy to 'almost bang-bang' form, in which the optimal policy is not randomized, and is bang-bang except perhaps at one state. Under the same assumptions, one can alternatively find an optimal constrained policy that is strictly bang-bang, but may be randomized at one state. Application is made to flow control of a birth-and-death process (e.g., an M/M/s queue); under certain monotonicity restrictions on the reward and cost structure the preceding results apply, and in addition there is a simple acceptance region.
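Schematically, the multiplier idea works as follows (notation ours, not the paper's): for a multiplier γ ≥ 0 one solves an unconstrained time-average problem with the cost folded into the reward, and then tunes γ so that the cost constraint is met.

```latex
% Schematic Lagrangian relaxation of the constrained time-average problem
% (notation ours): R(\pi) and C(\pi) denote the time-average reward and cost
% under policy \pi, and c_0 the hard bound on the cost.
\[
  \max_{\pi}\ R(\pi) \quad \text{s.t.}\ C(\pi) \le c_0
  \qquad \leadsto \qquad
  \max_{\pi}\ \bigl[\, R(\pi) \;-\; \gamma\, C(\pi) \,\bigr], \quad \gamma \ge 0 .
\]
```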

84 citations


Journal ArticleDOI
TL;DR: In this article, a general dynamic programming algorithm for the solution of optimal stochastic control problems concerning a class of discrete event systems is presented, where the emphasis is put on the numerical technique used for the approximation of the dynamic programming equation.
Abstract: This paper presents a general dynamic programming algorithm for the solution of optimal stochastic control problems concerning a class of discrete event systems. The emphasis is put on the numerical technique used for the approximation of the solution of the dynamic programming equation. This approach can be efficiently used for the solution of optimal control problems concerning Markov renewal processes. This is illustrated on a group preventive replacement model generalizing an earlier work of the authors.
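The paper's scheme is tailored to Markov renewal processes; as a rough illustration of the successive-approximation idea behind such numerical techniques, here is a minimal discounted value-iteration sketch on a finite state and action space (our notation, not the authors' algorithm):

```python
import numpy as np

# Rough illustration of successive approximation of a discounted dynamic
# programming equation on a finite state/action space (not the paper's scheme).
# P[a] is the transition matrix under action a, r[a] the reward vector, 0 < beta < 1.
def value_iteration(P, r, beta, tol=1e-8, max_iter=10_000):
    V = np.zeros(P[0].shape[0])
    for _ in range(max_iter):
        Q = np.stack([r[a] + beta * P[a] @ V for a in range(len(P))])
        V_new = Q.max(axis=0)
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    return V, Q.argmax(axis=0)   # approximate value function and greedy policy
```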

64 citations


Journal ArticleDOI
TL;DR: It is argued that a failure to recognize the special features of the model in the context of which the principle was stated has resulted in the latter being misconstrued in the dynamic programming literature.
Abstract: New light is shed on Bellman's principle of optimality and the role it plays in Bellman's conception of dynamic programming. It is argued that a failure to recognize the special features of the model in the context of which the principle was stated has resulted in the latter being misconstrued in the dynamic programming literature.

56 citations


Journal ArticleDOI
TL;DR: In this article, multi-grid algorithms are developed for the numerical solution of Hamilton-Jacobi-Bellman equations, combining standard multi-grid techniques with the iterative methods used by Lions and Mercier in [11].
Abstract: In this paper we develop multi-grid algorithms for the numerical solution of Hamilton-Jacobi-Bellman equations. The proposed schemes result from a combination of standard multi-grid techniques and the iterative methods used by Lions and Mercier in [11]. A convergence result is given and the efficiency of the algorithms is illustrated by some numerical examples.
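For orientation, the discrete problems that such schemes address can be written in the following generic form (notation ours): a pointwise maximum of finitely many linear operators, to which the multi-grid smoothing and coarse-grid corrections are applied.

```latex
% Generic discrete Hamilton-Jacobi-Bellman problem (notation ours): find the grid
% function u_h satisfying, at every point of the grid \Omega_h,
\[
  \max_{1 \le i \le m} \bigl( A_i\, u_h \;-\; f_i \bigr) \;=\; 0 ,
\]
% where each A_i is a linear (finite-difference) operator and f_i a given grid function.
```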

51 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider a general optimal control problem in which the constraints depend on a parameter, and study the resulting value function; a formula for the generalized gradient of V is proven and then used to obtain results on stability and controllability of the problem.
Abstract: We consider a general optimal control problem in which the constraints depend on a parameter $\alpha $, and the resulting value function $V(\alpha )$. A formula for the generalized gradient of V is proven and then used to obtain results on stability and controllability of the problem. A special study is made of the time-optimal control problem, one consequence of which is a new criterion assuring local null-controllability of the system and continuity of the minimal time function at the origin.

41 citations


Journal ArticleDOI
TL;DR: The dual control law for an integrator with constant but unknown gain is computed in this paper, and a representation which makes it easy to compare dual control with certainty equivalence and cautious control is also introduced.
Abstract: The dual control law for an integrator with constant but unknown gain is computed. Numerical problems associated with the solution of the Bellman equation are reviewed. Properties of the dual control law are discussed. A representation which makes it easy to compare dual control with certainty equivalence and cautious control is also introduced.
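A generic model of the kind referred to above (symbols ours; not necessarily the paper's exact formulation) is a first-order system whose constant gain must be learned while the system is being controlled:

```latex
% Generic discrete-time integrator with constant but unknown gain b (symbols ours):
\[
  x_{t+1} \;=\; x_t \;+\; b\, u_t \;+\; e_t ,
  \qquad b \ \text{constant but unknown}, \quad e_t \sim \mathcal{N}(0,\sigma^2) ,
\]
% so the controller must trade off regulating x_t against probing to estimate b,
% which is the source of the dual-control effect.
```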

40 citations


Journal ArticleDOI
TL;DR: Convergence theorems that, when applied to the case of bounded rewards, give stronger results than those in [9] are proved and bounds on the rates of convergence under several assumptions are given.
Abstract: A finite-state iterative scheme introduced by White [9] to approximate the optimal value function of denumerable-state Markov decision processes with bounded rewards, is extended to the case of unbounded rewards. Convergence theorems that, when applied to the case of bounded rewards, give stronger results than those in [9] are proved. Moreover, bounds on the rates of convergence under several assumptions are given and the extended scheme is used to obtain policies with asymptotic optimality properties.

37 citations


Journal ArticleDOI
TL;DR: When the deterministic bioeconomic parameters are taken from aggregate Antarctic pelagic whaling data, the optimal results are most sensitive to the rate of stochastic jumps and, to a lesser extent, to the quadratic cost factor.
Abstract: Dynamic programming is employed to examine the effects of large, sudden changes in population size on the optimal harvest strategy of an exploited resource population. These changes are either adverse or favorable and are assumed to occur at the times of events of a Poisson process. The amplitude of these jumps is assumed to be density independent. In between jumps the population is assumed to grow logistically. The Bellman equation for the optimal discounted present value is solved numerically and the optimal feedback control computed for the random jump model. The results are compared to the corresponding results for the quasi-deterministic approximation. In addition, the sensitivity of the results to the discount rate, the total jump rate and the quadratic cost factor is investigated. The optimal results are most sensitive to the rate of stochastic jumps and, to a lesser extent, to the quadratic cost factor when the deterministic bioeconomic parameters are taken from aggregate Antarctic pelagic whaling data.
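Schematically, the Bellman equation solved numerically in such a setting has the following form (symbols ours, not the paper's): logistic growth of the stock x between Poisson jumps of rate λ and amplitude J, a harvest rate h, a running profit π(x, h), and a discount rate δ.

```latex
% Schematic Bellman (HJB) equation for the jump model described above (symbols ours):
\[
  \delta\, V(x) \;=\; \max_{h \ge 0} \Bigl\{ \pi(x,h)
    \;+\; \bigl( r\, x\, (1 - x/K) - h \bigr)\, V'(x)
    \;+\; \lambda \, \mathbb{E}\bigl[\, V(x + J) - V(x) \,\bigr] \Bigr\} .
\]
```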

32 citations


Journal ArticleDOI
TL;DR: In this article, a stochastic control problem similar to the one dimensional linear-quadratic-Gaussian problem but with an asymptotically linear cost for control is studied.
Abstract: A stochastic control problem similar to the one dimensional linear-quadratic-Gaussian problem but with an asymptotically linear cost for control is studied. The value function is characterized, and it is shown that the optimal control process has both absolutely continuous and singular components. A discussion of the fact that the value function is C^2 is given, and an example of a singular control problem in which the value function is not C^2 is presented.

31 citations


Journal ArticleDOI
TL;DR: In this article, a family of problems obtained by perturbing infinite-dimensionally the dynamics of an optimal control problem is examined, and a formula is derived for the generalized gradient of the associated value function, one which specializes to yield, for instance, information about ordinary directional derivatives.
Abstract: We examine a family of problems obtained by perturbing infinite-dimensionally the dynamics of an optimal control problem. A formula is derived for the generalized gradient of the associated value function, one which specializes to yield, for instance, information about ordinary directional derivatives. Several examples are discussed.

Journal ArticleDOI
TL;DR: It turns out that strong stability in the sense of Kojima in the first phase is a natural assumption for the iterated local minima of the parametric problem and a generalized version of a positive definiteness criterion of Fujiwara-Han-Mangasarian is used.
Abstract: In dynamic programming and decomposition methods one often applies an iterated minimization procedure. The problem variables are partitioned into several blocks, say x and y. Treating y as a parameter, the first phase consists of minimization with respect to the variable x. In a second phase the minimization of the resulting optimal value function depending on y is considered. In this paper we treat this basic idea on a local level. It turns out that strong stability in the sense of Kojima in the first phase is a natural assumption. In order to show that the iterated local minima of the parametric problem lead to a local minimum for the whole problem, we use a generalized version of a positive definiteness criterion of Fujiwara-Han-Mangasarian.
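The basic identity behind the two-phase procedure can be written as follows (generic notation, ours). The paper studies the local version of this identity, where both minima are only local and stability of the inner minimizer with respect to y becomes essential.

```latex
% Iterated minimization: first minimize over x with y fixed as a parameter, then
% minimize the resulting optimal value function over y (generic notation).
\[
  \min_{x,\,y}\ f(x,y) \;=\; \min_{y}\ \varphi(y),
  \qquad \varphi(y) \;:=\; \min_{x}\ f(x,y) .
\]
```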

Journal ArticleDOI
TL;DR: In this paper, a test criterion for bankruptcy is developed, and a portfolio optimization problem is investigated and solved using the Doléans-Dade exponential formula; the optimality criterion is to maximize the expected rate of growth.
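For reference, the Doléans-Dade (stochastic) exponential invoked above is the standard object defined as follows; this is the textbook definition, not a formula specific to the paper.

```latex
% Doléans-Dade exponential of a semimartingale X: the unique solution of
% Z_t = 1 + \int_0^t Z_{s-}\, dX_s, given explicitly by
\[
  \mathcal{E}(X)_t \;=\; \exp\!\Bigl( X_t - X_0 - \tfrac12 \langle X^{c} \rangle_t \Bigr)
  \prod_{0 < s \le t} \bigl( 1 + \Delta X_s \bigr)\, e^{-\Delta X_s} .
\]
```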

Book ChapterDOI
TL;DR: In this article, the authors synthesize some results about well-posedness and stability analysis in abstract minimum problems and optimal control of ordinary differential inclusions, based on the variational convergence.
Abstract: Publisher Summary This chapter synthesizes some results about well-posedness and stability analysis in abstract minimum problems and optimal control of ordinary differential inclusions. The chapter describes well-posedness of convex minimum problems in Banach spaces by using their optimal value functions. The chapter also deals with optimal control problems for differential inclusions. Neither convexity nor existence of optimal solutions is assumed. The chapter further relates the continuous behavior of the optimal value function with respect to perturbations acting on the data to its stable behavior when passing to the (unperturbed) relaxed problem. The approach behind these theorems is based on variational convergence. The relations between variational convergence (including epi-convergence) and some results reported in the chapter are considered with an eye on mathematical programming problems.

Journal ArticleDOI
TL;DR: This paper demonstrates how a Markov decision process (MDP) can be approximated to generate a policy bound, i.e., a function that bounds the optimal policy from below or from above for all states.
Abstract: This paper demonstrates how a Markov decision process (MDP) can be approximated to generate a policy bound, i.e., a function that bounds the optimal policy from below or from above for all states. We present sufficient conditions for several computationally attractive approximations to generate rigorous policy bounds. These approximations include approximating the optimal value function, replacing the original MDP with a separable approximate MDP, and approximating a stochastic MDP with its deterministic counterpart. An example from the field of fisheries management demonstrates the practical applicability of the results.

Journal ArticleDOI
TL;DR: The influence of Richard Bellman is seen in algorithms throughout the computer science literature; this article focuses in particular on his influence on the area of computer science known as algorithm design and analysis.


Proceedings ArticleDOI
01 Dec 1986
TL;DR: In this paper, an optimal control problem on a given interval [0, T] is considered whose trajectories must satisfy the state constraint g(t, x(t)) ≤ 0 a.e.
Abstract: We consider an optimal control problem on a given interval [0, T] whose trajectories must satisfy the state constraint g(t, x(t)) ≤ 0 a.e. Infinite-dimensional perturbations of this constraint give rise to a value function V, whose epigraph is a closed set containing sensitivity information, controllability and penalization results, and even necessary conditions for optimality.

Journal ArticleDOI
TL;DR: This algorithm, together with the existing numerical methods for parabolic or elliptic PDEs, provides numerical schemes for the solution of Bellman equations.



Journal Article
TL;DR: A controlled diffusion on the real line is considered: the evolution of the system is described by a stochastic process X that is a diffusion but depends on a control process U, whose states, the control parameters, range over a given set.
Abstract: Let us consider a system whose evolution is described by a stochastic process X = {X_t, t ≥ 0} with state space equal to the real line ℝ. We assume that the process X is a diffusion but depends on a control process U = {U_t, t ≥ 0}. The states of the control process, the control parameters, range over a set 𝒰 ⊂ ℝ^n. To keep the presentation concise and simple we limit ourselves to the family of X given by the following stochastic differential equation.
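The specific equation is not reproduced in this excerpt; a generic controlled diffusion of the type described (our notation) looks like:

```latex
% Generic one-dimensional controlled diffusion (our notation; the paper's specific
% equation is not reproduced in this excerpt):
\[
  dX_t \;=\; b\bigl(X_t, U_t\bigr)\, dt \;+\; \sigma\bigl(X_t, U_t\bigr)\, dW_t ,
  \qquad X_0 = x \in \mathbb{R} ,
\]
% where W is a standard Wiener process and U_t takes values in the control set.
```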

Journal ArticleDOI
TL;DR: In this article, the existence of a solution to the optimality equation for discounted finite Markov decision processes is established by means of Birkhoff's fixed point theorem, and the proof yields the well-known linear programming formulation for the optimal value function.
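The linear programming formulation referred to here is the standard one for discounted finite MDPs (notation ours): the optimal value function is the componentwise-smallest vector satisfying the Bellman inequalities.

```latex
% Standard LP formulation for a discounted finite MDP (notation ours): rewards
% r(s,a), transition probabilities p(s'|s,a), discount factor 0 < \beta < 1.
\[
  \min_{V}\ \sum_{s} V(s)
  \quad \text{s.t.} \quad
  V(s) \;\ge\; r(s,a) \;+\; \beta \sum_{s'} p(s' \mid s, a)\, V(s')
  \qquad \text{for all } s, a .
\]
```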

Journal ArticleDOI
TL;DR: In this paper, results of Fleming (1964) are extended and, as an application, the existence of generalized solutions is shown for nonlinear degenerate parabolic differential equations.
Abstract: We extend results of Fleming (1964) and, as an application, we show the existence of generalized solutions for nonlinear degenerate parabolic differential equations.

Journal ArticleDOI
TL;DR: In this article, the optimal strategies in N-person nonzero-sum stochastic differential games are characterized as solutions of certain partial initial value problems analogous to the Bellman equation in the theory of dynamic programming; linear-quadratic games with and without a control-dependent noise are studied.
Abstract: The paper deals with N-person nonzero-sum games in which the dynamics is described by Itô stochastic differential equations. Sufficient conditions are found guaranteeing the Nash equilibrium for the strategies of the players. The optimal strategies are solutions of certain partial initial value problems analogous to the Bellman equation in the theory of dynamic programming. Linear-quadratic games with and without a control-dependent noise are studied.
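Schematically, the equations characterizing the Nash equilibrium take the following coupled form (notation ours, not the paper's): one Bellman-type partial differential equation per player, with the remaining players' strategies frozen at the equilibrium candidates.

```latex
% Schematic coupled Bellman-type system for an N-person nonzero-sum game
% (notation ours): \mathcal{L}^{u} is the generator of the controlled diffusion,
% L_i the running payoff of player i, g_i the terminal payoff.
\[
  \frac{\partial V_i}{\partial t}
  + \max_{u_i}\Bigl\{ \mathcal{L}^{\,(u_i,\,u^*_{-i})} V_i(t,x)
    + L_i\bigl(t,x,u_i,u^*_{-i}\bigr) \Bigr\} = 0,
  \qquad V_i(T,x) = g_i(x), \quad i = 1,\dots,N .
\]
```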

Journal ArticleDOI
TL;DR: Sufficient optimality conditions of dynamic programming type avoiding the axioms of Boltyanskii's regular synthesis for control problems that have weakly stratified Hamiltonians are proved, as discussed by the authors.
Abstract: Sufficient optimality conditions of dynamic programming type avoiding the axioms of Boltyanskii's "regular synthesis" for control problems that have weakly stratified Hamiltonians are proved. An improved version of Boltyanskii's "fundamental lemma" is applied to the "value function" defined as the minimum of the cost functional along the solutions of a Hamiltonian inclusion which plays the role of a "system of characteristics" for the Hamilton–Jacobi–Bellman equation of dynamic programming.

Journal ArticleDOI
TL;DR: In this paper, sufficient conditions for optimal control in a linear-autonomous optimal-time problem with Lipschitz-continuous cost functional were studied, and the conditions involved a generalized Hamilton-Jacobi-Bellman equation.

Book ChapterDOI
01 Jan 1986
TL;DR: The modern mathematical economics literature is permeated with dynamics as discussed by the authors, and arguments based upon dynamics are advanced to justify various forms of equilibria; here we find issues such as the accessibility of Pareto points or the comparison of different bargaining solution concepts.
Abstract: The modern mathematical economics literature is permeated with dynamics. This starts with a simple tâtonnement story of how prices adjust according to supply and demand, and it continues with the more sophisticated price adjustment models which involve speculation, etc. Dynamics arise from the Euler, or the Bellman, equations that define the optimal paths in growth models, as well as in other optimization problems. Arguments based upon dynamics are advanced to justify various forms of equilibria; here we find issues such as the accessibility of Pareto points or the comparison of different bargaining solution concepts. In recent years, as manifested by several of the papers presented at this conference, dynamics has been used to explain non-stationary behavior such as business cycles.

Proceedings ArticleDOI
01 Dec 1986
TL;DR: In this article, the robustness of nonlinear discrete-time systems is analyzed based on the existence of a stationary solution of the dynamic programming equation (DPE), which provides directly a Lyapunov function associated to the closed-loop system.
Abstract: In this paper the robustness of nonlinear discrete-time systems is analyzed. The nominal plant is supposed to be controlled by means of a feedback control law which is optimal with respect to some given criterion. The robustness of the closed-loop system is studied for two different classes of perturbations in the control law, which are called gain and additive nonlinear perturbations. The results are entirely based on the existence of a stationary solution of the dynamic programming equation (DPE), which provides directly a Lyapunov function associated to the closed-loop system. The convexity of that solution and the use of the Taylor formula appear to be the key to establish the robustness properties of the nominal plant. Two examples are solved in order to show an interesting fact: the existence of a compromise between the robustness of the system subjected to the two different classes of perturbations.
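In generic notation (ours), the stationary dynamic programming equation on which the analysis rests, and whose solution serves as a Lyapunov function for the nominal closed loop, is:

```latex
% Generic stationary dynamic programming equation (notation ours): stage cost
% \ell, dynamics x_{k+1} = f(x_k, u_k); the solution V acts as a Lyapunov
% function for the nominal closed-loop system x_{k+1} = f(x_k, u^*(x_k)).
\[
  V(x) \;=\; \min_{u} \bigl\{ \ell(x,u) + V\bigl(f(x,u)\bigr) \bigr\},
  \qquad
  u^*(x) \;\in\; \arg\min_{u} \bigl\{ \ell(x,u) + V\bigl(f(x,u)\bigr) \bigr\} .
\]
```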