
Showing papers on "Bellman equation published in 1994"


Book ChapterDOI
14 Dec 1994
TL;DR: In this article, a general framework for hybrid control problems is proposed, which encompasses several types of hybrid phenomena considered in the literature, and a specific control problem is studied in this framework, leading to an existence result for optimal controls.
Abstract: We propose a very general framework for hybrid control problems that encompasses several types of hybrid phenomena considered in the literature. A specific control problem is studied in this framework, leading to an existence result for optimal controls. The "value function" associated with this problem is expected to satisfy a set of "generalized quasi-variational inequalities".

262 citations


Journal ArticleDOI
TL;DR: In this article, a general method for constructing high-order approximation schemes for Hamilton-Jacobi-Bellman equations is given, based on a discrete version of the Dynamic Programming Principle.
Abstract: A general method for constructing high-order approximation schemes for Hamilton-Jacobi-Bellman equations is given. The method is based on a discrete version of the Dynamic Programming Principle. We prove a general convergence result for this class of approximation schemes also obtaining, under more restrictive assumptions, an estimate in $L^\infty$ of the order of convergence and of the local truncation error. The schemes can be applied, in particular, to the stationary linear first order equation in ${\Bbb R}^n$ . We present several examples of schemes belonging to this class and with fast convergence to the solution.
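To make the discrete Dynamic Programming Principle these schemes rest on concrete, here is a minimal first-order semi-Lagrangian iteration for a 1D discounted infinite-horizon problem. The dynamics f(x, a) = a, running cost x², control set, and discount rate are toy choices for illustration, not taken from the paper, which treats a general class of high-order schemes.

```python
import numpy as np

# Discrete Dynamic Programming Principle, first-order semi-Lagrangian form:
#   v(x) = min_a { dt * l(x, a) + (1 - lam * dt) * v(x + dt * f(x, a)) }
# Toy setup (our assumption): f(x, a) = a with a in {-1, 0, 1},
# running cost l(x) = x^2, discount rate lam = 1, domain [-1, 1].
lam, dt = 1.0, 0.05
xs = np.linspace(-1.0, 1.0, 201)
controls = [-1.0, 0.0, 1.0]

v = np.zeros_like(xs)
for _ in range(2000):
    cands = []
    for a in controls:
        # foot of the characteristic, clipped to stay in the domain
        x_next = np.clip(xs + dt * a, xs[0], xs[-1])
        # linear interpolation reconstructs v off the grid
        cands.append(dt * xs**2 + (1.0 - lam * dt) * np.interp(x_next, xs, v))
    v_new = np.min(cands, axis=0)
    if np.max(np.abs(v_new - v)) < 1e-10:
        v = v_new
        break
    v = v_new
```

Because the update operator is a contraction with factor (1 - lam*dt), the iteration converges to the unique discrete value function; here it is symmetric, vanishes at x = 0, and grows away from the origin, as expected for this cost.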

166 citations


Book
31 Oct 1994
TL;DR: In this article, the Ekeland Variational Principle is applied to the problem of optimal control of nonlinear Parameter Distributed Systems (PDS) and the H∞-Control Problem is formulated.
Abstract: Preface. Symbols and Notations. I: Generalized Gradients and Optimality. 1. Fundamentals of Convex Analysis. 2. Generalized Gradients. 3. The Ekeland Variational Principle. II: Optimal Control of Ordinary Differential Systems. 1. Formulation of the Problem and Existence. 2. The Maximum Principle. 3. Applications of the Maximum Principle. III: The Dynamic Programming Method. 1. The Dynamic Programming Equation. 2. Variational and Viscosity Solutions to the Equation of Dynamic Programming. 3. Constructive Approaches to the Synthesis Problem. IV: Optimal Control of Parameter Distributed Systems. 1. General Description of Parameter Distributed Systems. 2. Optimal Convex Control Problems. 3. The H∞-Control Problem. 4. Optimal Control of Nonlinear Parameter Distributed Systems. Subject Index.

150 citations


Journal ArticleDOI
TL;DR: An upper bound on performance loss is derived that is slightly tighter than that in Bertsekas (1987), and the extension of the bound to Q-learning is shown to provide a partial theoretical rationale for the approximation of value functions.
Abstract: Many reinforcement learning approaches can be formulated using the theory of Markov decision processes and the associated method of dynamic programming (DP). The value of this theoretical understanding, however, is tempered by many practical concerns. One important question is whether DP-based approaches that use function approximation rather than lookup tables can avoid catastrophic effects on performance. This note presents a result of Bertsekas (1987) which guarantees that small errors in the approximation of a task's optimal value function cannot produce arbitrarily bad performance when actions are selected by a greedy policy. We derive an upper bound on performance loss that is slightly tighter than that in Bertsekas (1987), and we show the extension of the bound to Q-learning (Watkins, 1989). These results provide a partial theoretical rationale for the approximation of value functions, an issue of great practical importance in reinforcement learning.
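The flavor of the greedy-policy loss bound discussed above can be checked numerically: if the approximation error satisfies ||V̂ − V*||∞ ≤ ε and π is greedy with respect to V̂, then ||V* − V^π||∞ ≤ 2γε/(1 − γ). The small random MDP below is a toy of our own construction, not from the note, and the sketch verifies the bound rather than the note's tighter refinement.

```python
import numpy as np

# Toy MDP (our assumption): nS states, nA actions, random transitions/rewards.
rng = np.random.default_rng(0)
nS, nA, gamma = 6, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] is a next-state distribution
R = rng.uniform(0.0, 1.0, size=(nS, nA))

# value iteration for the optimal value function V*
V = np.zeros(nS)
for _ in range(2000):
    V = (R + gamma * P @ V).max(axis=1)

# perturb V* by at most eps, then act greedily w.r.t. the perturbed values
eps = 0.05
V_hat = V + rng.uniform(-eps, eps, size=nS)
pi = (R + gamma * P @ V_hat).argmax(axis=1)

# exact policy evaluation: V_pi = (I - gamma * P_pi)^{-1} r_pi
P_pi = P[np.arange(nS), pi]
r_pi = R[np.arange(nS), pi]
V_pi = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)

loss = np.max(V - V_pi)                 # actual performance loss
bound = 2 * gamma * eps / (1 - gamma)   # guaranteed ceiling on the loss
```

In this instance the greedy policy for the perturbed values typically coincides with the optimal one, so the observed loss sits far below the worst-case ceiling of 0.9.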

147 citations


Journal ArticleDOI
TL;DR: A new dynamic programming method for the single item capacitated dynamic lot size model with non-negative demands and no backlogging is developed, which builds the optimal value function in piecewise linear segments.
Abstract: We develop a new dynamic programming method for the single item capacitated dynamic lot size model with non-negative demands and no backlogging. This approach builds the optimal value function in piecewise linear segments. It works very well on the test problems, requiring less than 0.3 seconds to solve problems with 48 periods on a VAX 8600. Problems with time horizons up to 768 periods are solved. Empirically, the computing effort increases only at a quadratic rate relative to the number of periods in the time horizon.
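The paper's piecewise-linear construction is what makes the method fast; to make the underlying model concrete, here is the classical brute-force DP over integer inventory levels for the same capacitated lot-sizing problem. The cost data are invented for illustration.

```python
# Textbook state-space DP for single-item capacitated lot sizing with
# non-negative demands and no backlogging. This brute-force recursion is
# only a reference model; the paper's piecewise-linear method is far faster.

def lot_size(d, cap, setup, unit, hold):
    """Minimum cost meeting demands d with per-period capacity cap."""
    max_inv = sum(d)
    INF = float("inf")
    # f[i] = min cost so far, ending the current period with inventory i
    f = [INF] * (max_inv + 1)
    f[0] = 0.0
    for dt in d:
        g = [INF] * (max_inv + 1)
        for i, ci in enumerate(f):
            if ci == INF:
                continue
            for x in range(cap + 1):          # production quantity this period
                j = i + x - dt                # ending inventory (no backlogging)
                if 0 <= j <= max_inv:
                    c = ci + (setup if x > 0 else 0.0) + unit * x + hold * j
                    if c < g[j]:
                        g[j] = c
        f = g
    return f[0]                               # finish with empty inventory

# hypothetical instance: 4 periods, capacity 5, setup 10, unit 1, holding 0.5
cost = lot_size(d=[3, 2, 4, 1], cap=5, setup=10.0, unit=1.0, hold=0.5)
```

On this instance the optimum batches production into periods 1 and 3 (two setups, total cost 31.5), trading a small holding cost against the avoided setups.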

69 citations


Journal ArticleDOI
TL;DR: In this article, a simple problem of combined singular stochastic control and optimal stopping is formulated and solved, and it is shown that the optimal strategies can take qualitatively different forms, depending on parameter values.
Abstract: In this paper a simple problem of combined singular stochastic control and optimal stopping is formulated and solved. We find that the optimal strategies can take qualitatively different forms, depending on parameter values. We also study a variant on the problem in which the value function is inherently nonconvex. The proofs employ the generalised Ito formula applicable for differences of convex functions.

64 citations


Journal ArticleDOI
TL;DR: In this article, the authors considered an infinite horizon investment-consumption model in which a single agent consumes and distributes his wealth between two assets, a bond and a stock, and the problem of maximization of the total utility from consumption was treated, when state and control (consumption, rates of trading) constraints are present.
Abstract: This paper considers an infinite horizon investment-consumption model in which a single agent consumes and distributes his wealth between two assets, a bond and a stock. The problem of maximization of the total utility from consumption is treated, when state (amount allocated in assets) and control (consumption, rates of trading) constraints are present. The value function is characterized as the unique viscosity solution of the Hamilton-Jacobi-Bellman equation which, actually, is a Variational Inequality with gradient constraints. Numerical schemes are then constructed in order to compute the value function and the location of the free boundaries of the so-called transaction regions. These schemes are a combination of implicit and explicit schemes; their convergence is obtained from the uniqueness of viscosity solutions to the HJB equation.

64 citations


ReportDOI
04 Apr 1994
TL;DR: In this article, the authors presented a number of basic results in the theory of viscosity solutions of fully nonlinear differential equations of first and second order in finite and infinite dimensions.
Abstract: The eight publications produced by the project established a number of basic results in the theory of viscosity solutions of fully nonlinear differential equations of first and second order in finite and infinite dimensions. These equations arise in the dynamic programming theory of control and differential games (the finite dimensional theory for ODE and the infinite dimensional theory for PDE dynamics). Being fully nonlinear, the equations do not typically admit regular or classical solutions, and the appropriate notion is that of viscosity solutions. Two major advances in the first order infinite dimensional case consisted of determining the precise notion appropriate to a class of infinite dimensional problems with unbounded terms arising from the PDE dynamics, and the examination of a limit case in which the value function is not a solution, but the maximal subsolution. Significant contributions to the second order theory include a new exposition of the finite dimensional theory based on results from previous funding, an infinite dimensional generalization of the foundational result used in this exposition, and the extension of the theory to second order equations in infinite dimensions with unbounded first order terms.

55 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present two methods for approximating the optimal groundwater pumping policy for several interrelated aquifers in a stochastic setting that also involves conjunctive use of surface water.
Abstract: This paper presents two methods for approximating the optimal groundwater pumping policy for several interrelated aquifers in a stochastic setting that also involves conjunctive use of surface water. The first method employs a policy iteration dynamic programming (DP) algorithm where the value function is estimated by Monte Carlo simulation combined with curve-fitting techniques. The second method uses a Taylor series approximation to the functional equation of DP which reduces the problem, for a given observed state, to solving a system of equations equal in number to the aquifers. The methods are compared using a four-state variable, stochastic dynamic programming model of Madera County, California. The two methods yield nearly identical estimates of the optimal pumping policy, as well as the steady state pumping depth, suggesting that either method can be used in similar applications.

55 citations


BookDOI
01 Jan 1994
TL;DR: In this volume, a theory of differential games and applications in worst-case controller design are presented, together with parts on zero-sum pursuit-evasion games and numerical schemes, mathematical programming techniques, stochastic games, and applications.
Abstract: I. Zero-sum differential games: Theory and applications in worst-case controller design.- A Theory of Differential Games.- H∞-Optimal Control of Singularly Perturbed Systems with Sampled-State Measurements.- New Results on Nonlinear H∞ Control Via Measurement Feedback.- Reentry Trajectory Optimization under Atmospheric Uncertainty as a Differential Game.- II. Zero-sum differential games: Pursuit-evasion games and numerical schemes.- Fully Discrete Schemes for the Value Function of Pursuit-Evasion Games.- Zero Sum Differential Games with Stopping Times: Some Results about its Numerical Resolution.- Singular Paths in Differential Games with Simple Motion.- The Circular Wall Pursuit.- III. Mathematical programming techniques.- Decomposition of Multi-Player Linear Programs.- Convergent Stepsizes for Constrained Min-Max Algorithms.- Algorithms for the Solution of a Large-Scale Single-Controller Stochastic Game.- IV. Stochastic games: Differential, sequential and Markov Games.- Stochastic Games with Average Cost Constraints.- Stationary Equilibria for Nonzero-Sum Average Payoff Ergodic Stochastic Games and General State Space.- Overtaking Equilibria for Switching Regulator and Tracking Games.- Monotonicity of Optimal Policies in a Zero Sum Game: A Flow Control Model.- V. Applications.- Capital Accumulation Subject to Pollution Control: A Differential Game with a Feedback Nash Equilibrium.- Coastal States and Distant Water Fleets Under Extended Jurisdiction: The Search for Optimal Incentive Schemes.- Stabilizing Management and Structural Development of Open-Access Fisheries.- The Non-Uniqueness of Markovian Strategy Equilibrium: The Case of Continuous Time Models for Non-Renewable Resources.- An Evolutionary Game Theory for Differential Equation Models with Reference to Ecosystem Management.- On Barter Contracts in Electricity Exchange.- Preventing Minority Disenfranchisement Through Dynamic Bayesian Reapportionment of Legislative Voting Power.- Learning by Doing and Technology Sharing in Asymmetric Duopolies.

Book ChapterDOI
01 Jan 1994
TL;DR: In this paper, the authors consider the classical pursuit evasion problem and an approximation scheme based on dynamic programming and prove the convergence of the scheme to the value function of the game by using some recent results and methods of the theory of viscosity solutions to the Isaacs equations.
Abstract: We consider the classical pursuit-evasion problem and an approximation scheme based on Dynamic Programming. We prove the convergence of the scheme to the value function of the game by using some recent results and methods of the theory of viscosity solutions to the Isaacs equations. The most restrictive assumption is the continuity of the value function, but we can eliminate it when dealing with control problems with a single player. We test the algorithm on two simple examples with explicit solution.

Journal ArticleDOI
TL;DR: In this paper, the authors trace Caratheodory's approach to the calculus of variations and show that famous results in optimal control theory, including the maximum principle and the Bellman equation, are consequences of these earlier results.
Abstract: One of the most important and deep results in optimal control theory is the maximum principle attributed to Hestenes (1950) and in particular to Boltyanskii, Gamkrelidze, and Pontryagin (1956). Another prominent result is known as the Bellman equation, which is associated with Isaacs' and Bellman's work (later than 1951). However, precursors of both the maximum principle and the Bellman equation can already be found in Caratheodory's book of 1935 (Ref. 1a), the first appearing even earlier, in his work of 1926 (Ref. 2). This is not a widely acknowledged fact. The present tutorial paper traces Caratheodory's approach to the calculus of variations, once called the "royal road in the calculus of variations," and shows that famous results in optimal control theory, including the maximum principle and the Bellman equation, are consequences of Caratheodory's earlier results.

Journal ArticleDOI
TL;DR: In this article, the authors considered a two-machine flow shop subject to breakdown and repair of machines and subject to non-negativity constraints on work-in-process and showed that the value function of the problem is locally Lipschitz and is a viscosity solution to the dynamic programming equation together with certain boundary conditions.

Journal ArticleDOI
TL;DR: This article studies local sensitivity analysis of nonlinear programming problems in Banach spaces with arbitrary sets of solutions and no a priori regularity properties by developing a corresponding theory for unconstrained optimization and applying this theory to general nonlinear programs by way of certain reduction principles.
Abstract: This article studies local sensitivity analysis of nonlinear programming problems in Banach spaces with arbitrary sets of solutions and no a priori regularity properties. This is done by first developing a corresponding theory for unconstrained optimization involving simple composite functions and then applying this theory to general nonlinear programs by way of certain reduction principles.

Journal ArticleDOI
TL;DR: The concept of a dynamic job shop is introduced by interpreting the system as a directed graph, and the structure of the system dynamics is characterized for its use in the asymptotic analysis.
Abstract: This paper presents an asymptotic analysis of hierarchical production planning in a general manufacturing system consisting of a network of unreliable machines producing a variety of products. The concept of a dynamic job shop is introduced by interpreting the system as a directed graph, and the structure of the system dynamics is characterized for its use in the asymptotic analysis. The optimal control problem for the system is a state-constrained problem, since the number of parts in any buffer between any two machines must remain nonnegative. A limiting problem is introduced in which the stochastic machine capacities are replaced by corresponding equilibrium mean capacities, as the rate of change in machine states approaches infinity. The value function of the original problem is shown to converge to that of the limiting problem, and the convergence rate is obtained. Furthermore, near-optimal controls for the original problem are constructed from near-optimal controls of the limiting problem, and an error estimate is obtained on the near optimality of the constructed controls.

Journal ArticleDOI
TL;DR: In this paper, an asymptotic analysis of hierarchical manufacturing systems with stochastic demand and machines subject to breakdown and repair as the rate of change in machine states approaches infinity is presented.

Journal ArticleDOI
TL;DR: In this paper, the authors proved that the value function, which must be defined on an augmented state space to take care of the non-Markovian feature of the running maximum, is the unique viscosity solution of the associated Bellman equation, which turns out to be, in the second case, a variational inequality with an oblique derivative boundary condition.
Abstract: Stochastic control problems are considered, where the cost to be minimized is either a running maximum of the state variable or, more generally, a running maximum of a function of the state variable and the control. In both cases it is proved that the value function, which must be defined on an augmented state space to take care of the non-Markovian feature of the running maximum, is the unique viscosity solution of the associated Bellman equation, which turns out to be, in the second case, a variational inequality with an oblique derivative boundary condition. Most of this work consists of proving the convergence of $L^p$ approximations, and this is done by purely partial differential equation (PDE) methods.

Journal ArticleDOI
TL;DR: It is shown that if preferences are defined via a collection of attributes, then, under common conditions, the principle of optimality is valid if and only if the preferences can be represented by a linear function over the attributes.
Abstract: Given an acyclic network and a preference-order relation on paths, when and how can Bellman's principle of optimality be combined with interactive programming to efficiently locate an optimal path? We show that if preferences are defined via a collection of attributes, then, under common conditions, the principle of optimality is valid if and only if the preferences can be represented by a linear (value) function over the attributes. Consequently, an interactive programming method is suggested which assesses the value function while using the principle of optimality to efficiently search for an optimal path.
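The paper's point is that once preferences admit a linear value function over the attributes, multi-attribute path search collapses to ordinary shortest-path DP on the scalarized edge costs, so Bellman's principle applies. The toy DAG, attribute vectors, and trade-off weights below are invented for illustration.

```python
from math import inf

# Shortest path on an acyclic network where each edge carries an attribute
# vector and preferences are a linear value function over attributes, so
# the principle of optimality holds and plain DP in topological order works.

def best_path(edges, order, weights, s, t):
    """edges: {u: [(v, attrs), ...]}; order: topological order of nodes."""
    score = lambda attrs: sum(w * a for w, a in zip(weights, attrs))
    dist = {u: inf for u in order}
    dist[s] = 0.0
    pred = {}
    for u in order:                         # Bellman recursion along the order
        if dist[u] == inf:
            continue
        for v, attrs in edges.get(u, []):
            if dist[u] + score(attrs) < dist[v]:
                dist[v] = dist[u] + score(attrs)
                pred[v] = u
    path, node = [t], t                     # reconstruct the optimal path
    while node != s:
        node = pred[node]
        path.append(node)
    return dist[t], path[::-1]

# hypothetical attributes per edge: (travel time, toll); linear weights below
edges = {"s": [("a", (2, 1)), ("b", (1, 4))],
         "a": [("t", (2, 1))],
         "b": [("t", (1, 0))]}
cost, path = best_path(edges, ["s", "a", "b", "t"], weights=(1.0, 0.5), s="s", t="t")
```

Reassessing the weights (as the interactive method does) and rerunning the same DP is all that is needed to respond to a decision maker's revised trade-offs.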

Journal ArticleDOI
F. R. Chang1
TL;DR: In this paper, the dynamics of the one-sector optimal growth model with recursive utility was analyzed through the use of a phase diagram, where the steady state uniquely exists and is a saddle point.
Abstract: The dynamics of the one-sector optimal growth model with recursive utility is analyzed through the use of a phase diagram. The steady state uniquely exists and is a saddle point. An increase in recursivity lowers both the steady-state capital and steady-state consumption. The model differs from the constant discount rate model in that a reduction in the population growth rate or a Hicks-neutral technical progress increases the steady-state consumption but not necessarily the steady-state capital.

Journal ArticleDOI
TL;DR: In this paper, a second-order generalized derivative based on Brownian motion is introduced, and an Ito-type formula is derived for functions $f(t,x)$ which are continuously differentiable in $x$ with Lipschitz derivative and are Lipschitz continuous in $t$.
Abstract: A second-order generalized derivative based on Brownian motion is introduced. Using this derivative, an Ito-type formula is derived for functions $f(t,x)$, which are continuously differentiable in $x$ with Lipschitz derivative and are Lipschitz continuous in $t$. It is then shown that the value function of a stochastic control problem is a "generalized" solution of a second-order Hamilton-Jacobi equation. Such solutions are analogous to the Clarke generalized solutions of first-order Hamilton-Jacobi equations. Finally, it is shown that any "generalized" solution is a viscosity subsolution and a viscosity solution is a "generalized" solution.


Journal ArticleDOI
TL;DR: In this paper, the authors consider discrete-time optimal growth models in the reduced form and derive two new optimality conditions for models of this class, namely Hölder continuity and Lipschitz continuity.

Journal ArticleDOI
Yaw Nyarko1
TL;DR: In this article, the author studies the question of the convexity of the value function and Blackwell (1951)'s Theorem, relating this to the uniqueness of optimal policies: strict convexity and a strict inequality in Blackwell's Theorem hold if and only if different priors may lead to different optimal actions.
Abstract: I study the question of the convexity of the value function and Blackwell (1951)'s Theorem and relate this to the uniqueness of optimal policies. The main results conclude that strict convexity and a strict inequality in Blackwell's Theorem will hold if and only if from different priors different optimal actions may be chosen.

Book ChapterDOI
01 Jan 1994
TL;DR: This work discretizes the associated Isaacs equation and obtains an approximate solution that converges to the value function of the game when the parameter of discretization tends to zero and presents an accelerated algorithm to find the discrete solution.
Abstract: In this work we consider a zero-sum differential game problem with stopping times. We discretize the associated Isaacs equation and obtain an approximate solution that converges to the value function of the game as the discretization parameter tends to zero. We also give an estimate of the discretization error. The discrete solution of the problem is the fixed point of a contractive operator, and we present an accelerated algorithm to find it. We prove that this algorithm converges to the discrete solution in a finite number of steps.

Journal ArticleDOI
TL;DR: Stochastic control has been used in a wide variety of fields spanning cancer research and chemotherapy, economics and finance, engineering, management and many other areas of scientific and social investigation as discussed by the authors.

Journal ArticleDOI
TL;DR: In this paper, the first-order behavior of the optimal value function associated to a convex parametric problem of calculus of variations is studied, and the concepts of approximate Euler-Lagrange inclusion and approximate transversality condition are key ingredients in the writing of sensitivity results.
Abstract: We study the first-order behaviour of the optimal value function associated to a convex parametric problem of calculus of variations. An important feature of this paper is that we do not assume the existence of optimal trajectories for the unperturbed problem. The concepts of approximate Euler-Lagrange inclusion and approximate transversality condition are key ingredients in the writing of our sensitivity results.

Journal ArticleDOI
TL;DR: In this paper, the authors describe some optimisation circuits which incorporate the Bellman-Ford algorithm for solving closed semi-ring problems, with particular reference to the minimum spanning tree problem.
Abstract: The Bellman-Ford algorithm is well known for providing a dynamic programming solution for the shortest path problem. The authors describe some novel optimisation circuits which incorporate the Bellman-Ford algorithm for solving closed semi-ring problems, with particular reference to the minimum spanning tree problem.
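The recursion these circuits implement in hardware is the standard Bellman-Ford relaxation; the software form below, with made-up edge data, shows the dynamic programming structure: relax every edge up to |V| − 1 times, then scan once more to detect negative cycles.

```python
# Standard Bellman-Ford: dist[v] after k rounds is the shortest path to v
# using at most k edges; a further improving relaxation after |V| - 1
# rounds witnesses a negative cycle.

def bellman_ford(n, edges, src):
    """edges: list of (u, v, w); returns (dist, has_negative_cycle)."""
    INF = float("inf")
    dist = [INF] * n
    dist[src] = 0.0
    for _ in range(n - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] != INF and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break                 # already at the Bellman fixed point
    neg = any(dist[u] != INF and dist[u] + w < dist[v] for u, v, w in edges)
    return dist, neg

# hypothetical 4-node example
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0), (2, 3, 5.0)]
dist, neg = bellman_ford(4, edges, 0)
```

Replacing (min, +) by another closed semi-ring's operations is the generalization the authors exploit, e.g. for minimum spanning tree computations.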

Book ChapterDOI
01 Jan 1994
TL;DR: In the previous chapter the authors have described a general approach to optimal control problems by reducing them via the maximum principle to a two-point boundary value problem associated with the Euler-Lagrange differential system.
Abstract: In the previous chapter we have described a general approach of optimal control problems by reducing them via maximum principle to a two point boundary value problem associated with the Euler-Lagrange differential system. This is often referred to in the literature as the trajectory optimization problem and the optimal control obtained in this way is referred to as an open loop optimal control. The dynamic programming method in control theory is concerned with the concept of feedback which allows to determine the control inputs of the system on the basis of the observations of present state. The use of feedback control is in particular important when the dynamic of system is only partially known due to presence of uncertainty and external disturbances. In this control scheme the optimal value function, i.e., the minimum value of the pay-off considered as a function of initial data, has a central role. This function is a generalized solution to a first order partial differential equation of Hamilton-Jacobi type called the dynamic programming equation.