
Showing papers on "Bellman equation published in 1998"


Journal ArticleDOI
TL;DR: This work introduces a mathematical model of hybrid systems as interacting collections of dynamical systems, evolving on continuous-variable state spaces and subject to continuous controls and discrete transitions, and develops a theory for synthesizing hybrid controllers for hybrid plants in an optimal control framework.
Abstract: We propose a very general framework that systematizes the notion of a hybrid system, combining differential equations and automata, governed by a hybrid controller that issues continuous-variable commands and makes logical decisions. We first identify the phenomena that arise in real-world hybrid systems. Then, we introduce a mathematical model of hybrid systems as interacting collections of dynamical systems, evolving on continuous-variable state spaces and subject to continuous controls and discrete transitions. The model captures the identified phenomena, subsumes previous models, yet retains enough structure to pose and solve meaningful control problems. We develop a theory for synthesizing hybrid controllers for hybrid plants in an optimal control framework. In particular, we demonstrate the existence of optimal (relaxed) and near-optimal (precise) controls and derive "generalized quasi-variational inequalities" that the associated value function satisfies. We summarize algorithms for solving these inequalities based on a generalized Bellman equation, impulse control, and linear programming.
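To fix ideas, a schematic quasi-variational inequality of impulse-control type is sketched below; the symbols (the controlled generator \( \mathcal{L}^u \), running cost \( \ell \), jump map \( g \), switching cost \( c \)) are generic placeholders rather than the paper's notation, and the hybrid-systems inequalities derived in the paper are more general.

\[
\max\Big\{ \sup_{u\in U}\big[-\mathcal{L}^{u}V(x)-\ell(x,u)\big],\; V(x)-\mathcal{M}V(x) \Big\}=0,
\qquad
\mathcal{M}V(x)=\inf_{\xi}\big[V(g(x,\xi))+c(x,\xi)\big].
\]

The first term is the usual Hamilton-Jacobi-Bellman condition governing the continuous evolution; the second says the value can never be improved by an immediate discrete transition, and where it holds with equality a switch is optimal.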

1,363 citations


Journal ArticleDOI
TL;DR: The emphasis is on methods based on upper and lower estimates of the objective function of the perturbed problems; these methods allow one to compute expansions of the optimal value function and approximate optimal solutions in situations where the set of Lagrange multipliers is not a singleton, may be unbounded, or is even empty.
Abstract: This paper presents an overview of some recent, and significant, progress in the theory of optimization problems with perturbations. We put the emphasis on methods based on upper and lower estimates of the objective function of the perturbed problems. These methods allow one to compute expansions of the optimal value function and approximate optimal solutions in situations where the set of Lagrange multipliers is not a singleton, may be unbounded, or is even empty. We give rather complete results for nonlinear programming problems and describe some extensions of the method to more general problems. We illustrate the results by computing the equilibrium position of a chain that is almost vertical or horizontal.
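The upper-estimate mechanism behind these expansions can be stated in one line (generic notation, not taken from the paper): if \( v(u)=\min\{f(x,u): x\in\Phi(u)\} \) and \( \bar x \) is optimal at \( u_0 \), then any feasible path \( \hat x(t)=\bar x+td+o(t) \) that stays in \( \Phi(u_0+t\delta) \) gives

\[
v(u_0+t\delta)\;\le\; f\big(\hat x(t),\,u_0+t\delta\big)\;=\;v(u_0)+t\,\big[D_x f(\bar x,u_0)d+D_u f(\bar x,u_0)\delta\big]+o(t),
\]

so an upper expansion of the optimal value follows from expanding the objective along a well-chosen feasible direction, while lower estimates come from dual (Lagrangian-type) bounds; matching the two yields the expansion even when the multiplier set is unbounded or empty.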

340 citations


Journal ArticleDOI
TL;DR: Monotonicity results for optimal policies of various queueing and resource sharing models are studied by concentrating on the events and the form of the value function instead of on the value function itself.
Abstract: In this paper we study monotonicity results for optimal policies of various queueing and resource sharing models. The standard approach is to propagate, for each specific model, certain properties of the dynamic programming value function. We propose a unified treatment of these models by concentrating on the events and the form of the value function instead of on the value function itself. This is illustrated with the systematic treatment of one and two-dimensional models.
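As an illustration of the event-based viewpoint (schematic notation, not the paper's), an admission-control event can be written as the operator

\[
T_{\mathrm{adm}}V(x)\;=\;\min\big\{c_{a}+V(x+e_{1}),\; c_{r}+V(x)\big\},
\]

and the approach consists of verifying once and for all that such event operators preserve structural properties of \(V\) (convexity, supermodularity, and the like); monotonicity of the minimizing action, that is, a threshold-type optimal policy, then follows for every model built from these events.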

169 citations


Journal ArticleDOI
TL;DR: A bound on the estimation error as a function of the disturbance energy is obtained, and the corresponding dynamic programming equation is a first-order PDE.

156 citations


Journal ArticleDOI
TL;DR: The authors develop a discretized version of the dynamic programming algorithm and study its convergence and stability properties, showing that the computed value function converges quadratically, and the computed policy function linearly, to their true counterparts as the mesh size of the discretization converges to zero.
Abstract: In this paper we develop a discretized version of the dynamic programming algorithm and study its convergence and stability properties. We show that the computed value function converges quadratically to the true value function and that the computed policy function converges linearly, as the mesh size of the discretization converges to zero; further, the algorithm is stable. We also discuss several aspects of the implementation of our procedures as applied to some commonly studied growth models.
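A minimal sketch of a discretized algorithm of this kind, for a standard one-sector growth model with log utility (toy parameters, not the authors' code), is the following grid-based value iteration; the mesh size of the grid plays the role of the discretization parameter in the convergence results.

```python
import numpy as np

# Minimal sketch of discretized value-function iteration for the growth model
#   V(k) = max_{k'} { log(f(k) - k') + beta * V(k') },  f(k) = k**alpha,
# with both the state k and the choice k' restricted to the same grid.
alpha, beta = 0.3, 0.95
grid = np.linspace(0.05, 0.5, 200)               # capital grid (mesh size ~ accuracy)
C = grid[:, None] ** alpha - grid[None, :]       # consumption for each (k, k') pair
C[C <= 0] = np.nan                               # mark infeasible choices
U = np.log(C)                                    # one-period utility

V = np.zeros(len(grid))
for it in range(2000):
    Q = U + beta * V[None, :]                    # value of each (k, k') choice
    V_new = np.nanmax(Q, axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:         # sup-norm stopping rule
        break
    V = V_new
policy = grid[np.nanargmax(U + beta * V[None, :], axis=1)]   # k'(k) on the grid
```

Refining `grid` should improve the computed value function roughly quadratically and the computed policy roughly linearly, in line with the rates stated in the abstract.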

116 citations


Dissertation
01 Jan 1998
TL;DR: This thesis provides an analysis (a proof of convergence, together with bounds on approximation error) of temporal-difference learning in the context of autonomous (uncontrolled) systems, as applied to the approximation of infinite horizon discounted rewards and of average and differential rewards.
Abstract: In principle, a wide variety of sequential decision problems--ranging from dynamic resource allocation in telecommunication networks to financial risk management--can be formulated in terms of stochastic control and solved by the algorithms of dynamic programming. Such algorithms compute and store a value function, which evaluates expected future reward as a function of current state. Unfortunately, exact computation of the value function typically requires time and storage that grow proportionately with the number of states, and consequently, the enormous state spaces that arise in practical applications render the algorithms intractable. In this thesis, we study tractable methods that approximate the value function. Our work builds on research in an area of artificial intelligence known as reinforcement learning. A point of focus of this thesis is temporal-difference learning--a stochastic algorithm inspired to some extent by phenomena observed in animal behavior. Given a selection of basis functions, the algorithm updates weights during simulation of the system such that the weighted combination of basis functions ultimately approximates a value function. We provide an analysis (a proof of convergence, together with bounds on approximation error) of temporal-difference learning in the context of autonomous (uncontrolled) systems as applied to the approximation of (1) infinite horizon discounted rewards and (2) average and differential rewards. As a special case of temporal-difference learning in a context involving control, we propose variants of the algorithm that generate approximate solutions to optimal stopping problems. We analyze algorithms designed for several problem classes: (1) optimal stopping of a stationary mixing process with an infinite horizon and discounted rewards; (2) optimal stopping of an independent increments process with an infinite horizon and discounted rewards; (3) optimal stopping with a finite horizon and discounted rewards; (4) a zero-sum two-player stopping game with an infinite horizon and discounted rewards. We also present a computational case study involving a complex optimal stopping problem that is representative of those arising in the financial derivatives industry. In addition to algorithms for tuning basis function weights, we study an approach to basis function generation. In particular, we explore the use of "scenarios" that are representative of the range of possible events in a system. Each scenario is used to construct a basis function that maps states to future rewards contingent on the future realization of the scenario. We derive, in the context of autonomous systems, a bound on the number of "representative scenarios" that suffices for uniformly accurate approximation of the value function. The bound exhibits a dependence on a measure of "complexity" of the system that can often grow at a rate much slower than the state space size. (Copies available exclusively from MIT Libraries, Rm 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
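The core update analyzed in the thesis can be sketched as follows for an autonomous finite Markov chain with discounted rewards; the chain, rewards, basis functions and step size below are toy choices made only for illustration.

```python
import numpy as np

# Minimal sketch of TD(0) with a linear function approximator on an autonomous
# (uncontrolled) Markov chain with discounted rewards.
rng = np.random.default_rng(0)
n_states, gamma, step = 10, 0.9, 0.01
P = rng.dirichlet(np.ones(n_states), size=n_states)   # random transition matrix
r = rng.standard_normal(n_states)                      # per-state rewards

def phi(s):
    # simple polynomial features of the (normalized) state index
    x = s / (n_states - 1)
    return np.array([1.0, x, x * x])

w = np.zeros(3)                                        # basis-function weights
s = 0
for t in range(100_000):
    s_next = rng.choice(n_states, p=P[s])
    # TD(0) update: move w along the temporal-difference error times the features
    delta = r[s] + gamma * phi(s_next) @ w - phi(s) @ w
    w += step * delta * phi(s)
    s = s_next

# Exact value function for comparison: V = (I - gamma P)^{-1} r
V_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)
V_approx = np.array([phi(s) @ w for s in range(n_states)])
```

In the discounted autonomous setting studied in the thesis, updates of this form converge, and the limiting weights come with a bound on the error relative to the best approximation available within the span of the basis functions.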

99 citations


Journal ArticleDOI
TL;DR: In this paper, a strong comparison result for the Bellman equation arising in stochastic exit time control problems is established, together with its applications.
Abstract: (1998). A strong comparison result for the Bellman equation arising in stochastic exit time control problems and its applications. Communications in Partial Differential Equations: Vol. 23, No. 11-12, pp. 1995-2033.

85 citations


Journal ArticleDOI
TL;DR: In this paper, the ergodic problem for first-order Hamilton-Jacobi-Bellman equations (HJBs) is studied from the viewpoint of controllability of the underlying controlled deterministic systems.
Abstract: We study the ergodic problem for first-order Hamilton-Jacobi-Bellman equations (HJBs) from the viewpoint of controllability of the underlying controlled deterministic systems. We shall give sufficient conditions for ergodicity by means of controllability estimates. Next, we shall give some results on the Abelian-Tauberian problem for the solutions of HJBs. Our solutions of HJBs satisfy the equations in the sense of viscosity solutions.

82 citations


Journal ArticleDOI
TL;DR: In this paper, the authors prove that the value function of an optimal stopping problem is the unique viscosity solution of the associated variational inequalities, which can be used to solve optimal stopping problems where the high contact (smooth fit) principle does not necessarily hold.
Abstract: We prove that the value function of an optimal stopping problem is the unique viscosity solution of the associated variational inequalities. We illustrate by an example how this can be used to solve optimal stopping problems where the high contact (smooth fit) principle does not necessarily hold.
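For an infinite-horizon discounted stopping problem with value \(V(x)=\sup_{\tau}\mathbb{E}^{x}\!\left[e^{-\rho\tau}g(X_{\tau})\right]\) and generator \(\mathcal{L}\), the associated variational inequality takes the standard form (generic notation, not necessarily the paper's)

\[
\min\big\{\rho V(x)-\mathcal{L}V(x),\; V(x)-g(x)\big\}\;=\;0,
\]

interpreted in the viscosity sense: the first term vanishes on the continuation region, \(V=g\) on the stopping region, and no smooth-fit condition across the boundary is imposed a priori, which is exactly why the viscosity characterization remains useful when smooth fit fails.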

69 citations


Journal ArticleDOI
TL;DR: In this paper, the authors give a necessary and sufficient condition for the existence of a control that keeps the corresponding trajectory of the related stochastic control system within a prescribed closed subset of the state space.
Abstract: In this Note, we give a necessary and sufficient condition for the existence of control that keeps the corresponding trajectory of the related stochastic control system within a prescribed closed subset of the state space. The problem of existence of stochastic control under a state-constraint is also called the viability property of the underlying control system. Our result is: the square of the distance function of this constraint is a viscosity supersolution of a Hamilton-Jacobi-Bellman equation if and only if the system enjoys the viability property.

69 citations


Journal ArticleDOI
TL;DR: In this article, a numerical algorithm for the computation of the optimal control for the linear quadratic regulator problem with a positivity constraint on the admissible control set is proposed, and necessary and sufficient conditions for optimality are presented in terms of inner products, projections on closed convex sets, Pontryagin's maximum principle and dynamic programming.
Abstract: In this paper, the Linear Quadratic Regulator Problem with a positivity constraint on the admissible control set is addressed. Necessary and sufficient conditions for optimality are presented in terms of inner products, projections on closed convex sets, Pontryagin's maximum principle and dynamic programming. The main results are concerned with smoothness of the optimal control and the value function. The maximum principle will be extended to the infinite horizon case. Based on these analytical methods, we propose a numerical algorithm for the computation of the optimal controls for the finite and infinite horizon problem. The numerical methods will be justified by convergence properties between the finite and infinite horizon case on one side and discretized optimal controls and the true optimal control on the other.
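A minimal numerical sketch of the constrained problem (not the authors' algorithm): discretize the finite-horizon LQR in time and minimize the quadratic cost over the control sequence subject to the nonnegativity bounds. The system matrices and horizon below are toy choices.

```python
import numpy as np
from scipy.optimize import minimize

# Finite-horizon LQR with a positivity constraint on the control, discretized in time
# and solved as a bound-constrained minimization over the control sequence.
A, B = np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[0.1]])
x0, N = np.array([1.0, -2.0]), 50

def cost(u_seq):
    u_seq = u_seq.reshape(N, 1)
    x, J = x0.copy(), 0.0
    for k in range(N):
        J += x @ Q @ x + u_seq[k] @ R @ u_seq[k]
        x = A @ x + B @ u_seq[k]
    return J + x @ Q @ x                     # terminal cost

res = minimize(cost, np.zeros(N), method="L-BFGS-B",
               bounds=[(0.0, None)] * N)     # positivity constraint u_k >= 0
u_opt = res.x                                # numerically optimal nonnegative controls
```

A bound-constrained solver is used here only because it handles the positivity constraint directly; the paper characterizes the optimum via projections on convex sets, the maximum principle and dynamic programming, and treats the passage to the infinite horizon separately.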

Journal ArticleDOI
TL;DR: This paper establishes, in particular, a calculation approach for the value function of the CMDP based on finite state approximation, and presents another type of LP that allows the computation of optimal mixed stationary-deterministic policies.
Abstract: The aim of this paper is to investigate the Lagrangian approach and a related Linear Programming (LP) that appear in constrained Markov decision processes (CMDPs) with a countable state space and total expected cost criteria (of which the expected discounted cost is a special case). We consider transient MDPs and MDPs with uniform Lyapunov functions, and obtain for these an LP which is the dual of another one that has been shown to provide the optimal values and stationary policies [3, 4]. We show that there is no duality gap between these LPs under appropriate conditions. In obtaining the Linear Program for the general transient case, we establish, in particular, a calculation approach for the value function of the CMDP based on finite state approximation. Unlike previous approaches for state approximations for CMDPs (most of which were derived for the contracting framework), we do not need here any Slater type condition. We finally present another type of LP that allows the computation of optimal mixed stationary-deterministic policies.
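As a finite-state, discounted illustration of the kind of linear program involved (a toy set-up, since the paper works with countable state spaces and total expected cost), the occupation-measure LP for a constrained MDP can be written down and solved directly:

```python
import numpy as np
from scipy.optimize import linprog

# Occupation-measure LP for a small discounted constrained MDP:
#   min  sum_{x,a} rho(x,a) c(x,a)
#   s.t. sum_a rho(y,a) - gamma * sum_{x,a} P(y|x,a) rho(x,a) = alpha(y)   for all y
#        sum_{x,a} rho(x,a) d(x,a) <= D,    rho >= 0.
rng = np.random.default_rng(1)
nS, nA, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # P[x, a, y]
c = rng.random((nS, nA))                          # cost to be minimized
d = np.zeros((nS, nA)); d[:, 1] = 1.0             # action 1 consumes one unit of budget
D = 3.0                                           # budget over the discounted horizon
alpha = np.full(nS, 1.0 / nS)                     # initial distribution

# Flow-balance equalities A_eq @ rho = alpha, with rho[x, a] flattened in C order.
A_eq = np.zeros((nS, nS * nA))
for y in range(nS):
    for x in range(nS):
        for a in range(nA):
            A_eq[y, x * nA + a] = (1.0 if x == y else 0.0) - gamma * P[x, a, y]

res = linprog(c.ravel(), A_ub=d.ravel()[None, :], b_ub=[D],
              A_eq=A_eq, b_eq=alpha, bounds=(0, None))
rho = res.x.reshape(nS, nA)
policy = rho / rho.sum(axis=1, keepdims=True)     # optimal (possibly randomized) policy
```

The normalized occupation measure recovers a stationary, possibly randomized policy, which is consistent with the mixed stationary-deterministic policies discussed in the paper's final LP.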

Posted Content
TL;DR: The main result is to show that in a multiagent economy, the problem of determining efficient allocations can be characterized in terms of a single value function (that of a social planner) rather than multiple functions, as has been proposed thus far.
Abstract: In this article, our objective is to determine efficient allocations in economies with multiple agents having recursive utility functions. Our main result is to show that in a multiagent economy, the problem of determining efficient allocations can be characterized in terms of a single value function (that of a social planner), rather than multiple functions (one for each investor), as has been proposed thus far (Duffie, Geoffard and Skiadas (1994)). We then show how the single value function can be identified using the familiar technique of stochastic dynamic programming. We achieve these goals by first extending to a stochastic environment Geoffard's (1996) concept of variational utility and his result that variational utility is equivalent to recursive utility, and then using these results to characterize allocations in a multiagent setting.

Journal ArticleDOI
TL;DR: In this article, the value function for the problem is shown to be the fixed point of an appropriately defined operator, and it is shown how this class of problems can be solved by a simple variation of standard dynamic programming techniques.

Proceedings ArticleDOI
16 Dec 1998
TL;DR: It is argued that a natural choice for the initial value function is the value function for the associated deterministic control problem based upon a fluid model, or the approximate solution to Poisson’s equation obtained from the LP of Kumar and Meyn.
Abstract: This paper considers in parallel the scheduling problem for multiclass queueing networks, and optimization of Markov decision processes. It is shown that the value iteration algorithm may perform poorly when the algorithm is not initialized properly. If the algorithm is initialized with a stochastic Lyapunov function, then convergence is guaranteed and each intermediate policy is stabilizing. For the network scheduling problem it is argued that a natural choice for the initial value function is the value function for the associated deterministic control problem based upon a fluid model, or the approximate solution to Poisson's equation obtained from the LP of Kumar and Meyn (1996). Numerical studies show that either choice may lead to fast convergence to an optimal policy.
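A minimal sketch of the point being made, on a toy admission-control queue rather than the networks studied in the paper: value iteration accepts an arbitrary initial value function, and the paper's recommendation is to supply the fluid value function or the LP approximation to Poisson's equation; the quadratic initialization below merely stands in for such a Lyapunov-like choice.

```python
import numpy as np

# Value iteration with a user-supplied initialization, on a single-server queue
# with admission control, truncated at buffer size N and uniformized.
N, lam, mu, beta = 50, 0.4, 0.5, 0.99
holding, reject_penalty = 1.0, 5.0

def value_iteration(V0, tol=1e-6, max_iter=50_000):
    V = V0.copy()
    for it in range(max_iter):
        V_new = np.empty_like(V)
        for x in range(N + 1):
            up, down = min(x + 1, N), max(x - 1, 0)
            # action "admit": arrivals enter; action "reject": pay a penalty instead
            q_admit = holding * x + beta * (lam * V[up] + mu * V[down]
                                            + (1 - lam - mu) * V[x])
            q_reject = holding * x + reject_penalty + beta * (lam * V[x] + mu * V[down]
                                                              + (1 - lam - mu) * V[x])
            V_new[x] = min(q_admit, q_reject)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, it
        V = V_new
    return V, max_iter

x_grid = np.arange(N + 1, dtype=float)
_, iters_zero = value_iteration(np.zeros(N + 1))        # naive initialization
_, iters_quad = value_iteration(0.5 * x_grid ** 2)      # Lyapunov-like initialization
print(iters_zero, iters_quad)                           # compare iteration counts
```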

Journal ArticleDOI
TL;DR: In this paper, the Bellman equation of the risk-sensitive control problem with full observation is considered; it appears as an example of a quasi-linear parabolic equation in the whole space, and fairly general growth assumptions with respect to the space variable x are permitted.
Abstract: The Bellman equation of the risk-sensitive control problem with full observation is considered. It appears as an example of a quasi-linear parabolic equation in the whole space, and fairly general growth assumptions with respect to the space variable x are permitted. The stochastic control problem is then solved, making use of the analytic results. The case of large deviation with small noises is then treated, and the limit corresponds to a differential game.
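One common schematic form of such an equation, after a logarithmic transformation and with a risk-sensitivity parameter \(\theta>0\) (generic notation and sign conventions, not necessarily the paper's), is

\[
\partial_{t}v+\tfrac{1}{2}\,\mathrm{tr}\big(a(x)D^{2}v\big)+\inf_{u}\big\{b(x,u)\!\cdot\! Dv+\ell(x,u)\big\}+\tfrac{\theta}{2}\,\big|\sigma(x)^{\top}Dv\big|^{2}=0,
\]

which is quasi-linear: linear in \(D^{2}v\) but quadratic in \(Dv\). In the small-noise limit the quadratic gradient term acts as a maximization over disturbances, which is the differential-game limit mentioned in the abstract.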

Journal ArticleDOI
TL;DR: It is shown that the control Lyapunov function guaranteeing closed-loop stability is a solution to the steady-state Bellman equation for the controlled system and thus guarantees both optimality and stability.
Abstract: In this paper we develop an optimality-based framework for designing controllers for discrete-time non-linear cascade systems. Specifically, using a non-linear—non-quadratic optimal control framework we develop a family of globally stabilizing backstepping-type controllers parameterized by the cost functional that is minimized. Furthermore, it is shown that the control Lyapunov function guaranteeing closed-loop stability is a solution to the steady-state Bellman equation for the controlled system and thus guarantees both optimality and stability.
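In generic discrete-time notation (not necessarily the paper's), the link is the steady-state Bellman equation

\[
V(x)\;=\;\min_{u}\big[L(x,u)+V(F(x,u))\big],\qquad \phi(x)\in\arg\min_{u}\big[L(x,u)+V(F(x,u))\big],
\]

so that along the closed loop \(V(F(x,\phi(x)))-V(x)=-L(x,\phi(x))\le 0\) whenever the cost \(L\) is nonnegative: the same function serves as the optimal cost-to-go and as a control Lyapunov function certifying stability.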

Journal ArticleDOI
TL;DR: In this article, a piecewise deterministic Markov model for the control of dividend pay-out and reinsurance is introduced, where only the jumps but not the deterministic flow can be controlled.
Abstract: Dynamic programming for piecewise deterministic Markov processes is studied where only the jumps but not the deterministic flow can be controlled. Then one can dispense with relaxed controls. There exists an optimal stationary policy of feedback form. Further, a piecewise deterministic Markov model for the control of dividend pay-out and reinsurance is introduced. This model can be transformed to a model with uncontrolled flow. It is shown that a classical solution to the Bellman equation exists and that a non-relaxed optimal policy of feedback form can be obtained via the Bellman equation. Lipschitz continuity of the one-dimensional vector field defining the controlled flow will be replaced by strict positivity.

Journal ArticleDOI
TL;DR: It is shown that there exists a unique viscosity solution in the class of solutions meeting a certain growth condition, and a representation in terms of available storage is obtained.
Abstract: The dynamic programming equation (DPE) corresponding to nonlinear H∞ control is considered. When the cost grows quadratically in the state, it is well known that there may be an infinite number of viscosity solutions to the DPE. In fact, there may be more than one classical solution when a classical solution exists. For the case of fixed feedback control, it is shown that there exists a unique viscosity solution in the class of solutions meeting a certain growth condition, and a representation in terms of available storage is obtained. For the active control case, where the H∞ problem is represented by a differential game, a similar representation result is obtained under the assumption of existence of a suboptimal feedback control.
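In schematic form for the fixed-feedback case (generic notation, not the paper's), the dynamic programming equation and the available-storage functional read

\[
\sup_{w}\big\{\nabla V(x)\!\cdot\! f(x,w)+|h(x)|^{2}-\gamma^{2}|w|^{2}\big\}=0,
\qquad
V_{a}(x)=\sup_{w(\cdot),\,T\ge 0}\int_{0}^{T}\!\big(|z(t)|^{2}-\gamma^{2}|w(t)|^{2}\big)\,dt,
\]

and the representation result expresses the unique viscosity solution within the stated growth class in terms of the available storage \(V_{a}\).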

Journal ArticleDOI
TL;DR: In this article, the authors consider nonlinear optimal control problems with state constraints and nonnegative cost in infinite dimensions, where the constraint is a closed set possibly with empty interior for a class of systems with a maximal monotone operator and satisfying certain stability properties.
Abstract: We consider nonlinear optimal control problems with state constraints and nonnegative cost in infinite dimensions. The constraint is a closed set, possibly with empty interior, and the systems belong to a class governed by a maximal monotone operator and satisfying certain stability properties of the set of trajectories that allow the value function to be lower semicontinuous. We prove that the value function is a viscosity solution of the Bellman equation and is in fact the minimal nonnegative supersolution.

Journal ArticleDOI
TL;DR: In this paper, the Clarke results on generalized gradients were used to prove that the value function has left and right derivatives with respect to the initial capital stock, without requiring supermodularity assumptions.
Abstract: We consider an optimal growth (multi-sector) model with nonconvex technology. Using the Clarke results on generalized gradients, we prove that the value function has left and right derivatives with respect to the initial capital stock, without requiring supermodularity assumptions.

Book ChapterDOI
01 Jan 1998
TL;DR: In the development of learning systems and neural networks, the issue of complexity occurs at many levels of analysis.
Abstract: In the development of learning systems and neural networks, the issue of complexity occurs at many levels of analysis.

Journal ArticleDOI
TL;DR: It is shown that the Lagrangean function corresponding to any pair of primal and dual optimal solutions forms a linear support to the optimal value function, thus extending the shadow price interpretation of an optimal dual solution to the infinite dimensional case.
Abstract: We consider the class of linear programs that can be formulated with infinitely many variables and constraints but where each constraint has only finitely many variables. This class includes virtually all infinite horizon planning problems modeled as infinite stage linear programs. Examples include infinite horizon production planning under time-varying demands and equipment replacement under technological change. We provide, under a regularity condition, conditions that are both necessary and sufficient for strong duality to hold. Moreover we show that, under these conditions, the Lagrangean function corresponding to any pair of primal and dual optimal solutions forms a linear support to the optimal value function, thus extending the shadow price interpretation of an optimal dual solution to the infinite dimensional case. We illustrate the theory through an application to production planning under time-varying demands and costs where strong duality is established.
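The finite-dimensional version of the supporting-Lagrangean statement is the familiar shadow-price inequality: for \(v(b)=\min\{c^{\top}x: Ax\ge b,\ x\ge 0\}\) and any optimal dual solution \(y^{*}\) at \(b\),

\[
v(b')\;\ge\; v(b)+(y^{*})^{\top}(b'-b)\qquad\text{for all }b',
\]

i.e. the Lagrangean built from a primal-dual optimal pair is a linear support of the convex optimal value function; the paper identifies conditions under which exactly this picture survives in the infinite-variable, infinite-constraint setting.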

Posted Content
TL;DR: McGrattan describes the weighted residual method and the finite element method: both can be used to approximate value functions when it is impossible to derive the analytical solution.
Abstract: This code supports the text in Ellen McGrattan, Application of Weighted Residual Methods to Dynamic Economic Models, in Ramon Marimon and Andrew Scott (eds), Computational Methods for the Study of Dynamic Economies, Chapter 6, Oxford University Press. This chapter describes the weighted residual method and the finite element method: Both can be used to approximate value functions, when it is impossible to derive the analytical solution. (This can happen when the Bellman equation is complicated).
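A minimal collocation sketch of the weighted residual idea for a Bellman equation (toy growth-model specification, not the chapter's code): choose a polynomial basis and pick the coefficients so that the Bellman residual vanishes at a set of collocation nodes.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb
from scipy.optimize import minimize_scalar

# Collocation for  V(k) = max_{k'} { log(k**alpha - k') + beta * V(k') }:
# approximate V by a Chebyshev polynomial and drive the Bellman residual to zero
# at Chebyshev collocation nodes.
alpha, beta = 0.3, 0.95
kmin, kmax, deg = 0.05, 0.5, 8
nodes = 0.5 * (kmin + kmax) + 0.5 * (kmax - kmin) * cheb.chebpts1(deg + 1)

def to_unit(k):                       # map [kmin, kmax] -> [-1, 1] for the basis
    return 2.0 * (k - kmin) / (kmax - kmin) - 1.0

coeffs = np.zeros(deg + 1)            # initial guess V ~ 0
for sweep in range(500):
    rhs = np.empty_like(nodes)
    for i, k in enumerate(nodes):     # apply the Bellman operator at each node
        def neg_value(kp):
            return -(np.log(k ** alpha - kp) + beta * cheb.chebval(to_unit(kp), coeffs))
        res = minimize_scalar(neg_value, bounds=(kmin, min(kmax, k ** alpha) - 1e-6),
                              method="bounded")
        rhs[i] = -res.fun
    new_coeffs = cheb.chebfit(to_unit(nodes), rhs, deg)   # zero residual at the nodes
    if np.max(np.abs(new_coeffs - coeffs)) < 1e-8:
        break
    coeffs = new_coeffs
```

Collocation is one choice of weighting; Galerkin or finite element (piecewise linear) weightings fit in the same loop by replacing the interpolation step with the corresponding projection.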

Proceedings ArticleDOI
16 Dec 1998
TL;DR: In this paper, the H/sub /spl infin// problem for nonlinear systems is considered and the corresponding dynamic programming equation is a fully nonlinear, first-order, partial differential equation.
Abstract: The H/sub /spl infin// problem for nonlinear systems is considered. The corresponding dynamic programming equation is a fully nonlinear, first-order, partial differential equation. Interestingly, if one switches from the normal definition of addition and multiplication to the max-plus algebra (which is no more complex), the solution operator becomes a linear operator. The solution can be expanded using a max-plus basis. The coefficients in this expansion satisfy a max-plus eigenvector equation for a matrix associated with this solution operator-thus transforming the nonlinear problem into a linear one. In fact there is a parameterized family of matrices for which this holds. Expressions and approximations for the coefficients in these matrices are given.
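A minimal sketch of the max-plus arithmetic behind this linearity (the matrix below is a toy stand-in, not one of the paper's solution-operator matrices): addition becomes max, multiplication becomes +, and the expansion coefficients solve a max-plus eigenvector problem, approached here by power iteration.

```python
import numpy as np

NEG_INF = -np.inf   # max-plus "zero" element

def mp_matvec(B, v):
    # max-plus matrix-vector product: (B (x) v)_i = max_j (B[i, j] + v[j])
    return np.max(B + v[None, :], axis=1)

def mp_eigen(B, iters=200):
    # power iteration in the max-plus sense; for an irreducible matrix whose
    # critical graph has cyclicity one, the estimate below settles at the eigenvalue
    v = np.zeros(B.shape[0])
    lam = 0.0
    for _ in range(iters):
        w = mp_matvec(B, v)
        lam = np.max(w - v)          # running estimate of the max-plus eigenvalue
        v = w - np.max(w)            # normalize so that max(v) = 0
    return lam, v

# toy matrix standing in for a matrix associated with the solution operator
B = np.array([[0.0, -1.0, NEG_INF],
              [-2.0, 0.5, -0.3],
              [NEG_INF, -1.5, 0.2]])
lam, v = mp_eigen(B)                 # v plays the role of the expansion coefficients
```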

Book ChapterDOI
01 Jan 1998
TL;DR: This paper studies parametric convex lexicographic optimization problems with two objectives using basic tools of convex analysis and point-to-set topology, finding conditions for continuity of the optimal value function, giving characterizations of global and local optima, and formulating a Lagrangian duality theory.
Abstract: We study parametric convex lexicographic optimization problems with two objectives. Using basic tools of convex analysis and point-to-set topology, we find conditions for continuity of the optimal value function, give characterizations of global and local optima, and formulate a Lagrangian duality theory. These results are readily applicable to bilevel convex programs.
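In generic notation (not necessarily the paper's), the two-objective lexicographic problem with parameter \(u\) is

\[
\min_{x}\; f_{2}(x,u)\quad\text{subject to}\quad x\in\arg\min_{y\in C(u)} f_{1}(y,u),
\]

and it is the optimal value of this outer problem, as a function of \(u\), whose continuity is at issue; the connection to bilevel convex programs arises because the inner argmin plays the role of the lower-level problem.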

01 Jan 1998
TL;DR: In this article, the authors considered the H∞ problem for nonlinear systems and showed that if one switches from the normal definition of addition and multiplication to the max-plus algebra (which is no more complex), the solution operator becomes a linear operator.
Abstract: The H∞ problem for nonlinear systems is considered. The corresponding dynamic programming equation is a fully nonlinear, first-order, partial differential equation. Interestingly, if one switches from the normal definition of addition and multiplication to the max-plus algebra (which is no more complex), the solution operator becomes a linear operator. The solution can be expanded using a max-plus basis. The coefficients in this expansion satisfy a max-plus eigenvector equation for a matrix associated with this solution operator, thus transforming the nonlinear problem into a linear one. In fact there is a parameterized family of matrices for which this holds. Expressions and approximations for the coefficients in these matrices are given.

Book ChapterDOI
01 Jan 1998
TL;DR: In this article, the authors consider a Mayer optimal control problem for a system governed by a semilinear evolution equation of parabolic type.
Abstract: We consider a Mayer optimal control problem for a system governed by a semilinear evolution equation of parabolic type.

Journal ArticleDOI
TL;DR: The problem of identifying the input of a system governed by a "semi-linear" evolution equation of parabolic type, based on the results of observations subject to undefined disturbances, is investigated in this article.

Journal ArticleDOI
TL;DR: In this article, the authors studied the problem of reaching a closed target with trajectories of the system, where a controllability condition around the target allows us to construct a path that steers each point nearby into it in finite time and using a finite amount of energy.
Abstract: Given a control system (formulated as a nonconvex and unbounded differential inclusion) we study the problem of reaching a closed target with trajectories of the system. A controllability condition around the target allows us to construct a path that steers each point nearby into it in finite time and using a finite amount of energy. In applications to minimization problems, limits of such trajectories could be discontinuous. We extend the inclusion so that all the trajectories of the extension can be approached by (graphs of) solutions of the original system. In the extended setting the value function of an exit time problem with Lagrangian affine in the unbounded control can be shown to coincide with the value function of the original problem, to be continuous and to be the unique (viscosity) solution of a Hamilton-Jacobi equation with suitable boundary conditions.