
Showing papers on "Bellman equation published in 1990"


Journal ArticleDOI
01 Mar 1990
TL;DR: By representing value function separability in the structure of the graph of the influence diagram, formulation is simplified and operations on the model can take advantage of the separability; this allows simple exploitation of separability in the value function of a decision problem.
Abstract: The concept of a super value node is developed to extend the theory of influence diagrams to allow dynamic programming to be performed within this graphical modeling framework. The operations necessary to exploit the presence of these nodes and efficiently analyze the models are developed. The key result is that by representing value function separability in the structure of the graph of the influence diagram, formulation is simplified and operations on the model can take advantage of the separability. From the decision analysis perspective, this allows simple exploitation of separability in the value function of a decision problem. This allows algorithms to be designed to solve influence diagrams that automatically recognize the opportunity for applying dynamic programming. From the decision processes perspective, influence diagrams with super value nodes allow efficient formulation and solution of nonstandard decision process structures. They also allow the exploitation of conditional independence between state variables.

320 citations


Book ChapterDOI
01 Jan 1990
TL;DR: In this article, the authors present theory, applications, and computational methods for Markov decision processes (MDPs): they provide an optimality equation that characterizes the supremal value of the objective function, characterize the form of an optimal policy, and develop efficient computational procedures for finding policies that are optimal or close to optimal.
Abstract: Publisher Summary This chapter presents theory, applications, and computational methods for Markov decision processes (MDPs). MDPs are a class of stochastic sequential decision processes in which the cost and transition functions depend only on the current state of the system and the current action. These models have been applied in a wide range of subject areas, most notably in queueing and inventory control. A sequential decision process is a model for a dynamic system under the control of a decision maker. Sequential decision processes are classified according to the times (epochs) at which decisions are made, the length of the decision-making horizon, the mathematical properties of the state and action spaces, and the optimality criteria. The focus of this chapter is problems in which decisions are made periodically at discrete time points. The state and action sets are either finite, countable, compact, or Borel; their characteristics determine the form of the reward and transition probability functions. The optimality criteria considered in the chapter include finite and infinite horizon expected total reward, infinite horizon expected total discounted reward, and average expected reward. The main objectives in analyzing sequential decision processes in general and MDPs in particular include (1) providing an optimality equation that characterizes the supremal value of the objective function, (2) characterizing the form of an optimal policy if it exists, and (3) developing efficient computational procedures for finding policies that are optimal or close to optimal. The optimality or Bellman equation is the basic entity in MDP theory, and almost all existence, characterization, and computational results are based on its analysis.
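To make the optimality (Bellman) equation concrete: for a finite discounted MDP it reads V(s) = max_a [r(s, a) + γ Σ_s' p(s'|s, a) V(s')], and successive approximation converges by contraction. The Python sketch below, with invented transition and reward numbers, illustrates that textbook value-iteration scheme; it is not code from the chapter.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Solve V(s) = max_a [ R(s, a) + gamma * sum_s' P(s'|s, a) V(s') ].

    P: (A, S, S) transition probabilities; R: (S, A) expected rewards.
    Returns the optimal value function and a greedy policy.
    """
    V = np.zeros(P.shape[1])
    while True:
        # Q[s, a]: one-step lookahead value of taking action a in state s
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# Tiny two-state, two-action example (numbers are illustrative only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # action 0: P[0, s, s']
              [[0.5, 0.5], [0.0, 1.0]]])   # action 1: P[1, s, s']
R = np.array([[1.0, 0.0],                  # R[s, a]
              [2.0, 0.5]])
V, policy = value_iteration(P, R)
print("V* =", V.round(3), "policy =", policy)
```

Policy iteration and linear programming, also surveyed in the chapter, solve the same fixed-point equation.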

132 citations


01 Jan 1990
TL;DR: In this article, a complete theory of optimal control of piecewise deterministic Markov processes under weak assumptions is presented; it consists of a description of the processes, a nonsmooth stochastic maximum principle as a necessary optimality condition, a generalized Bellman-Hamilton-Jacobi necessary and sufficient optimality condition involving the Clarke generalized gradient, existence results, and regularity properties of the value function.
Abstract: This thesis describes a complete theory of optimal control of piecewise deterministic Markov processes under weak assumptions. The theory consists of a description of the processes, a nonsmooth stochastic maximum principle as a necessary optimality condition, a generalized Bellman-Hamilton-Jacobi necessary and sufficient optimality condition involving the Clarke generalized gradient, existence results and regularity properties of the value function. The impulse control problem is transformed to an equivalent optimal dynamic control problem. Cost functions are subject only to growth conditions.

89 citations


Journal ArticleDOI
TL;DR: This article studies the accuracy of two versions of Kydland and Prescott's (1980, 1982) procedure for approximating optimal decision rules in problems in which the objective fails to be quadratic and the constraints fail to be linear.
Abstract: This article studies the accuracy of two versions of Kydland and Prescott's (1980, 1982) procedure for approximating optimal decision rules in problems in which the objective fails to be quadratic and the constraints fail to be linear. The analysis is carried out using a version of the Brock–Mirman (1972) model of optimal economic growth. Although the model is not linear quadratic, its solution can, nevertheless, be computed with arbitrary accuracy using a variant of existing value-function iteration procedures. I find that the Kydland–Prescott approximate decision rules are very similar to those implied by value-function iteration.
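Value-function iteration of the kind used as the accuracy benchmark here is easy to reproduce in the one parameterization of the Brock–Mirman model that has a closed-form solution (log utility, Cobb–Douglas production, full depreciation), so the numerical policy can be checked against the exact one. The sketch below uses illustrative parameter values, not the article's calibration.

```python
import numpy as np

# Deterministic Brock-Mirman model in the special case with a known
# closed form (log utility, Cobb-Douglas output k**alpha, full
# depreciation): the exact policy is k' = alpha*beta*k**alpha.
alpha, beta = 0.36, 0.96
k = np.linspace(0.05, 0.5, 400)                  # capital grid
V = np.zeros_like(k)                             # initial guess

# consumption for every (k, k') pair; infeasible choices get -inf utility
c = k[:, None] ** alpha - k[None, :]
util = np.where(c > 0, np.log(np.maximum(c, 1e-12)), -np.inf)

for _ in range(2000):                            # value-function iteration
    V_new = np.max(util + beta * V[None, :], axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = k[np.argmax(util + beta * V[None, :], axis=1)]
print("max policy error:", np.max(np.abs(policy - alpha * beta * k ** alpha)))
```

The reported error is on the order of the grid spacing, which is the sense in which grid-based iteration achieves "arbitrary accuracy" as the grid is refined.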

75 citations


Journal ArticleDOI
TL;DR: In this article, the problem of routing control in an open queueing network under conditions of heavy traffic and finite (scaled) buffers is dealt with, where an extension of the reflection mapping needs to be obtained.
Abstract: The problem of routing control in an open queueing network under conditions of heavy traffic and finite (scaled) buffers is dealt with. The operating statistics can be state dependent. The sequence of scaled controlled state processes converges to a singularly controlled reflected diffusion (with the associated costs), under broad conditions. Due to the nature of the controls, a “scaling” method is introduced to obtain the convergence, since the actual sequence of processes does not necessarily converge in the Skorokhod topology. Owing to finite buffers, an extension of the reflection mapping needs to be obtained. The optimal value functions for the physical processes converge to the optimal value function of the limit process, under broad conditions. Approximations to the optimal control for the limit process are obtained, as well as properties of the sequence of physical processes. The optimal or controlled (but not necessarily optimal) limit process can be used to approximate a large variety of functio...

70 citations


Journal ArticleDOI
TL;DR: In this article, the concept of biconvergence, a weak and intuitive topological assumption on the utility function and the production function together, was introduced, and it was shown that the true value function exists, is the unique admissible solution to Bellman's equation, and may be calculated numerically as the limit of successive approximations.
Abstract: This paper introduces the concept of biconvergence, which is a weak and intuitive topological assumption on the utility function and the production function together. Concerning recursive utility, we show that, given biconvergence, the utility function is the unique admissible solution to Koopmans' equation. Concerning dynamic programming, we show that, given biconvergence, the true value function exists, it is the unique admissible solution to Bellman's equation, and it may be calculated numerically as the limit of successive approximations. Finally, we develop a sufficient condition for biconvergence which substantially weakens the Lipschitz condition used by contraction-mapping techniques.
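As a toy instance of "the limit of successive approximations": the sketch below evaluates a recursive utility with the simple additive aggregator W(c, u) = log c + βu, an assumption chosen purely for illustration (for this W the classical contraction argument already applies; the paper's point is that biconvergence is much weaker). Longer truncations of the consumption path converge to the true utility.

```python
import math

def recursive_utility(head, tail_c, beta=0.95, horizons=(10, 50, 200)):
    """Evaluate Koopmans' recursion U_t = W(c_t, U_{t+1}) with the
    additive aggregator W(c, u) = log(c) + beta*u by backward recursion
    from a zero guess; longer truncations converge to the true utility."""
    for T in horizons:
        cs = list(head) + [tail_c] * (T - len(head))
        U = 0.0                       # crude guess for the tail utility
        for c in reversed(cs):
            U = math.log(c) + beta * U
        print(f"T = {T:3d}: U = {U:.8f}")

recursive_utility(head=[1.0, 2.0, 1.5], tail_c=1.2)
```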

63 citations


Journal ArticleDOI
Xun Yu Zhou
TL;DR: In this article, the relationship between the maximum principle and the Hamilton-Jacobi-Bellman equation was investigated in the case of deterministic, finite-dimensional systems, by employing the notions of superdifferential and subdifferential introduced by Crandall and Lions.
Abstract: Two major tools for studying optimally controlled systems are Pontryagin's maximum principle and Bellman's dynamic programming, which involve the adjoint function, the Hamiltonian function, and the value function. The relationships among these functions are investigated in this work, in the case of deterministic, finite-dimensional systems, by employing the notions of superdifferential and subdifferential introduced by Crandall and Lions. Our results are essentially non-smooth versions of the classical ones. The connection between the maximum principle and the Hamilton-Jacobi-Bellman equation (in the viscosity sense) is thereby explained by virtue of the above relationship.

62 citations


Journal ArticleDOI
TL;DR: A pseudopolynomial approximation algorithm for bicriteria linear programming using the lower and upper approximation of the optimal value function is given, and numerical results for the bicriteria minimum cost flow problem on NETGEN-generated examples are presented.
Abstract: A subset S ⊂ X of feasible solutions of a multicriteria optimization problem is called ε-optimal with respect to a vector-valued function f: X → Y ⊆ ℝ^K if for all x ∈ X there is a solution z_x ∈ S such that f_k(z_x) ≤ (1+ε) f_k(x) for all k = 1, ..., K. For a given accuracy ε > 0, a pseudopolynomial approximation algorithm for bicriteria linear programming using the lower and upper approximation of the optimal value function is given. Numerical results for the bicriteria minimum cost flow problem on NETGEN-generated examples are presented.
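The ε-optimality definition is easy to state operationally. The sketch below is only a naive greedy filter over an explicitly listed finite set of criterion vectors; the paper's contribution is a pseudopolynomial algorithm that works on the value function of a bicriteria LP rather than on an enumerated solution set.

```python
def eps_optimal_subset(points, eps):
    """Return indices S such that every feasible point x is 'covered':
    some z in S has f_k(z) <= (1 + eps) * f_k(x) for k = 1, 2
    (both criteria to be minimized)."""
    S = []
    for i, (f1, f2) in enumerate(points):
        covered = any(points[j][0] <= (1 + eps) * f1 and
                      points[j][1] <= (1 + eps) * f2 for j in S)
        if not covered:
            S.append(i)
    return S

# Illustrative trade-off data between two costs.
pts = [(1.0, 9.0), (1.05, 8.9), (2.0, 5.0), (2.1, 4.9), (6.0, 1.0)]
print(eps_optimal_subset(pts, eps=0.1))   # -> [0, 2, 4]
```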

51 citations


Journal ArticleDOI
TL;DR: In this article, it is shown that if the solution of the variational problem is smooth enough, the qualitative effects of parameter perturbations on the entire optimal arcs can be represented by a generalized Slutsky-type matrix, which holds in integral form and is symmetric negative semidefinite.
Abstract: This article considers an autonomous variational calculus problem with a fixed vector of initial stocks, fixed initial and terminal time values, a free vector of terminal stocks, and a time-independent vector of parameters. It is shown that if the solution of the variational problem is smooth enough, the qualitative effects of parameter perturbations on the entire optimal arcs can be represented by a generalized Slutsky-type matrix, which holds in integral form and is symmetric negative semidefinite. Sufficient conditions for the optimal value function to be convex in the parameters are also given.

47 citations


Journal ArticleDOI
TL;DR: In this article, it is shown that the value function is a solution to the Hamilton-Jacobi equation in an extended sense defined in terms of lower Dini directional derivatives, and that solutions of the related inequality furnish verification functions.
Abstract: Hamilton–Jacobi theory provides necessary and sufficient conditions on minimizing arcs in terms of solutions to the Hamilton–Jacobi equation or inequality. The hypotheses under which such results have previously been obtained typically require the data to be continuous in its time-dependence. The present paper lifts this restriction. The basic hypotheses are Carathéodory-type with measurable time and Lipschitz state dependence, and they incorporate the growth condition of Valadier’s existence theory. It is shown that the value function is a solution to the Hamilton–Jacobi equation in an extended sense defined in terms of lower Dini directional derivatives, and that solutions of the related inequality furnish verification functions. Moreover, a characterization of the value function is provided as the pointwise maximum of the family of all verification functions. The methods developed to take account of the measurable time-dependence are based on a “uniform” Lebesgue point theorem for integrably bounded se...

39 citations


Journal ArticleDOI
Xun Yu Zhou
TL;DR: In this article, a nonsmooth version of the classical result is established by employing the notions of super- and subdifferential introduced by Crandall and Lions, so the illusory assumption that V is differentiable is dispensed with.
Abstract: There are usually two ways to study optimal stochastic control problems: Pontryagin's maximum principle and Bellman's dynamic programming, involving an adjoint process ψ and the value function V, respectively. The classical result on the connection between the maximum principle and dynamic programming is known as ψ(t) = V_x(t, x̂(t)), where x̂(·) is the optimal path. In this paper we establish a nonsmooth version of the classical result by employing the notions of super- and subdifferential introduced by Crandall and Lions. Thus the illusory assumption that V is differentiable is dispensed with.
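For reference, the classical relation that this paper (and the deterministic companion paper above) generalizes can be written compactly; the nonsmooth line below is indicative only, since sign conventions and the choice of super- versus subdifferential depend on whether the problem is posed as minimization or maximization.

```latex
% Classical (smooth) relation, as stated in the abstract: along an
% optimal path \hat x(\cdot), the adjoint equals the state-gradient of
% the value function,
\[
  \psi(t) \;=\; V_x\bigl(t, \hat x(t)\bigr), \qquad t \in [0, T].
\]
% Nonsmooth version (indicative form only): the equality becomes a
% membership,
\[
  \psi(t) \;\in\; D_x^{+} V\bigl(t, \hat x(t)\bigr)
  \quad\text{or}\quad
  \psi(t) \;\in\; D_x^{-} V\bigl(t, \hat x(t)\bigr),
\]
% where D_x^{\pm} denote the Crandall--Lions super/subdifferentials in x,
% so no differentiability of V needs to be assumed.
```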

Book ChapterDOI
TL;DR: An estimate of the rate of convergence of the approximation scheme for the nonlinear minimum time problem presented in [2] is proved, provided the system has time-optimal controls with bounded variation.
Abstract: In this paper we prove an estimate of the rate of convergence of the approximation scheme for the nonlinear minimum time problem presented in [2]. The estimate holds provided the system has time-optimal controls with bounded variation. The estimate is of order ν with respect to the discretization step in time if the minimal time function is Hölder continuous with exponent ν. The proof combines the convergence result obtained in [2] by PDE methods with direct control-theoretic arguments.

Journal ArticleDOI
TL;DR: In this paper, the authors study the Hamilton-Jacobi equation of a system governed by either semilinear or monotone dynamics, replacing the unbounded terms of the equation by their Yosida approximations.

Journal ArticleDOI
G. Barles
TL;DR: In this paper, it was shown that the value function of a deterministic unbounded control problem is a viscosity solution and the maximum viscosity subsolution of a family of Bellman equations; in particular, of the one given by the Hamiltonian, generally discontinuous, associated formally with the problem by analogy with the bounded case.
Abstract: We prove that the value function of a deterministic unbounded control problem is a viscosity solution and the maximum viscosity subsolution of a family of Bellman equations; in particular, the one given by the Hamiltonian, generally discontinuous, associated formally with the problem by analogy with the bounded case. In some cases, we show that this equation is equivalent to a first-order Hamilton-Jacobi equation with gradient constraints, for which we give several existence and uniqueness results. Finally, we indicate other applications of these results to first-order Hamilton-Jacobi equations, to some cheap control problems, and to uniqueness results in the nonconvex calculus of variations.

Proceedings ArticleDOI
05 Dec 1990
TL;DR: In this paper, a method is introduced for the analysis of the infinite-time risk-sensitive linear quadratic Gaussian (LQG) control problem, based on the theory of large deviations from the invariant measure.
Abstract: A method is introduced for the analysis of the infinite-time risk-sensitive linear quadratic Gaussian (LQG) control problem. A stationary form of the infinite-time cost functional is derived, and optimality conditions in the form of a Bellman equation are obtained. A simple solution is presented for the state-feedback case. The relationship between risk-sensitive LQG control and LQG and H-infinity control is explained. The approach is based on the theory of large deviations from the invariant measure.
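In standard (Whittle-style) notation, the objects in this abstract look roughly as follows; this is a generic orientation sketch whose conventions and signs vary across references, not the paper's own derivation.

```latex
% Exponential-of-integral (risk-sensitive) cost, recovering ordinary LQG
% in the limit \theta \to 0:
\[
  J_\theta \;=\; \frac{2}{\theta}\,
  \log \mathbb{E}\exp\!\Bigl(\tfrac{\theta}{2}\int_0^T
  \bigl(x^\top Q x + u^\top R u\bigr)\,dt\Bigr).
\]
% For state feedback, the stationary Bellman equation reduces to a
% Riccati equation with an extra \theta-dependent term,
\[
  A^\top P + P A + Q - P\bigl(B R^{-1} B^\top - \theta\,\Sigma\bigr)P = 0,
\]
% where \Sigma is the noise covariance; the \theta\Sigma term is the
% formal bridge between risk-sensitive LQG and H-infinity control.
```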

Journal ArticleDOI
TL;DR: In this paper, the authors give conditions on the behavior of a controlled diffusion process within and on the boundary of a domain that are sufficient for the value function to have two bounded generalized derivatives and to satisfy the Bellman equation.
Abstract: The author gives conditions on the behavior of a controlled diffusion process within and on the boundary of a domain that are sufficient for the value function to have two bounded generalized derivatives and to satisfy the Bellman equation. These conditions are almost necessary even for uncontrolled diffusion processes, and at the same time they encompass, for example, the heat equation in a disc and the Monge–Ampère equation in a convex domain. Bibliography: 24 titles.

Journal ArticleDOI
TL;DR: In this article, the authors generalize the one-agent growth theory with discounting to the case of several agents with recursive preferences, and show that any Pareto optimum can be viewed as a function of a trajectory of a dynamical system.
Abstract: This article generalizes the one-agent growth theory with discounting to the case of several agents with recursive preferences. In a multi-consumption goods world, we show that, under some regularity conditions, any Pareto optimum can be viewed as a function of a trajectory of a dynamical system. The state space can be chosen to be the product of the space of capitals and the unit simplex. We define and study the properties of generalized value functions.

Book ChapterDOI
01 Jan 1990
TL;DR: In this paper, a geometric approach (symmetries) to dynamic economic problems is presented, which integrates the solution procedure with the economics of the problem and can handle many types of problems with equal ease.
Abstract: This paper presents a geometric approach (symmetries) to dynamic economic problems that integrates the solution procedure with the economics of the problem. Techniques for using symmetries are developed in the contexts of portfolio choice, optimal growth, and dynamic equilibria. Information on preferences, budget sets, and technology is combined to explicitly compute the solution. By focusing on the geometry of the underlying economic structure, the symmetry method can handle many types of problems with equal ease. Given an appropriate economic structure, it is immaterial whether the problem is in continuous or discrete time, is deterministic or stochastic with a Brownian, Poisson or other process, uses a finite or infinite time horizon, or even whether the rate of time preference is fixed or variable. These details are unimportant as long as the geometry is unchanged. All cases are treated in a unified manner.

Journal ArticleDOI
TL;DR: In this article, the authors discuss the properties of solutions to Bellman's equation for the overtaking criterion and also prove that the optimal trajectory can be represented by a continuous dynamical system.
Abstract: In this note, we discuss the properties of solutions to Bellman's equation for the overtaking criterion and also prove that the optimal trajectory can be represented by a continuous dynamical system.

Journal ArticleDOI
TL;DR: In this article, the authors apply the Crandall and Lions theory of viscosity solutions for infinite-dimensional Hamilton-Jacobi equations to two problems in distributed control, one governed by differential-difference equations as dynamics, and the other governed by a nonlinear divergence form parabolic equation.
Abstract: We apply the recently developed Crandall and Lions theory of viscosity solutions for infinite-dimensional Hamilton-Jacobi equations to two problems in distributed control. The first problem is governed by differential-difference equations as dynamics, and the second problem is governed by a nonlinear divergence form parabolic equation. We prove a Pontryagin maximum principle in each case by deriving the Bellman equation and using the fact that the value function is a viscosity supersolution.

Journal ArticleDOI
TL;DR: The value function f(x) = inf φ(x, y), where the infimum is over all y ∈ a(x) for some given set-valued map a, is investigated, and an inner approximation for its generalized gradient is provided.
Abstract: We investigate the value function f(x) = inf φ(x, y), where the infimum is taken over all y ∈ a(x) for some given set-valued map a. Under specified conditions we provide an inner approximation for its generalized gradient. In some cases a full description of this generalized gradient is given. The results are of use in the numerical solution of various optimization problems. Illustrative examples are given.

Journal ArticleDOI
TL;DR: A linear programming framework is developed for computing a quadratic approximation to the value function, which constitutes the off-line computation of a previously developed hierarchical FMS scheduling approach.
Abstract: In this paper, we develop a linear programming framework for computing a quadratic approximation to the value function, which constitutes the off-line computation of a hierarchical FMS scheduling approach we developed previously. In contrast to previous work, where relatively crude value functions were used, we develop a quadratic approximation that is fitted a priori. We consider the multiple-part, multiple-machine discounted cost case and illustrate the approach via a simulation example in the context of an industrial setting.
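The generic idea behind LP-based value-function approximation can be sketched on a toy problem: treat the weights of a quadratic architecture as LP variables, impose the Bellman inequalities as linear constraints, and minimize the summed values. Everything below (dynamics, costs, features, the use of SciPy's linprog) is invented for illustration; the paper's FMS model and its off-line computation are considerably more elaborate.

```python
import numpy as np
from scipy.optimize import linprog   # assumes SciPy is available

# Fit V(s) ~ w0 + w1*s + w2*s**2 for a toy discounted control problem by
# linear programming: minimize sum_s V(s) subject to the Bellman
# inequalities V(s) >= r(s, a) + gamma * V(next(s, a)) for all (s, a).
gamma = 0.9
states = np.arange(5)                               # e.g. buffer levels
Phi = np.stack([np.ones(5), states, states ** 2], axis=1)  # features

def nxt(s, a):                       # deterministic toy dynamics
    return max(s - 1, 0) if a == 0 else min(s + 1, 4)

def r(s, a):                         # quadratic holding + action cost
    return -float(s ** 2) - 0.5 * a

# Bellman inequality in <= form: -(Phi[s] - gamma*Phi[s']) @ w <= -r(s, a)
A_ub, b_ub = [], []
for s in states:
    for a in (0, 1):
        A_ub.append(-(Phi[s] - gamma * Phi[nxt(s, a)]))
        b_ub.append(-r(s, a))

# Objective: minimize sum_s Phi[s] @ w (uniform state-relevance weights).
res = linprog(c=Phi.sum(axis=0), A_ub=np.array(A_ub), b_ub=b_ub,
              bounds=[(None, None)] * 3)
print("quadratic value-function weights:", res.x.round(4))
```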

Journal ArticleDOI
TL;DR: In this paper, the authors present a DP (dynamic programming) formulation of the problem and apply the principle of optimality of DP to find the optimal solution by implicit enumeration.
Abstract: Given a set of n jobs, each assigned a due date and all simultaneously available for processing on a single machine, the problem is to find the optimal job processing order that minimizes the sum of absolute deviations of job completion times from their respective due dates. Since this problem has been shown to be NP-complete, we present a DP (dynamic programming) formulation of the problem and apply the principle of optimality of DP to find the optimal solution by implicit enumeration.
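A concrete version of such a DP: because the last job among a scheduled prefix completes exactly at the prefix's total processing time, the state can be just the set of already-sequenced jobs. The sketch below (with invented data) implements this subset recursion; it is one natural reading of the formulation rather than the authors' exact implicit-enumeration scheme.

```python
from functools import lru_cache

def min_total_abs_deviation(p, d):
    """Subset DP: the job sequenced last among a prefix-set S completes
    at T(S) = sum of processing times in S, so its deviation |T(S) - d_j|
    depends only on S and j.  Runs in O(2^n * n) time, exponential in n,
    consistent with the problem's NP-completeness."""
    n = len(p)
    T = [0] * (1 << n)                       # total processing time of S
    for S in range(1, 1 << n):
        j = (S & -S).bit_length() - 1        # lowest set bit
        T[S] = T[S ^ (1 << j)] + p[j]

    @lru_cache(maxsize=None)
    def best(S):
        if S == 0:
            return 0
        return min(best(S ^ (1 << j)) + abs(T[S] - d[j])
                   for j in range(n) if S >> j & 1)

    return best((1 << n) - 1)

p = [4, 2, 5, 3]                 # processing times (illustrative)
d = [4, 6, 14, 9]                # due dates: order 0,1,3,2 fits exactly
print(min_total_abs_deviation(p, d))   # -> 0
```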

Proceedings ArticleDOI
07 May 1990
TL;DR: The noise performance of the DP-MLE is superior to that of MUSIC at low signal-to-noise ratios and it is concluded that the DP approach is an efficient algorithm for computing the MLEs of a number of sources and their directions of arrival.
Abstract: A dynamic programming (DP) algorithm based on Bellman's principle of optimality (1962) for computing the MLE (maximum likelihood estimate) is proposed. By this algorithm, the multidimensional ML maximization problem is transformed into a recursive one-dimensional maximization problem. The global optimum of the MLE is guaranteed and obtained by simply maximizing the recursive likelihood function. The performance of the DP-MLE is compared to that of MUSIC via simulation. Three sources in the far field emitting plane waves into a linear sensor array of five elements uniformly spaced half a wavelength apart are considered. It is shown that the noise performance of the DP-MLE is superior to that of MUSIC. This superiority is pronounced at low signal-to-noise ratios. It is concluded that the DP approach is an efficient algorithm for computing the MLEs of a number of sources and their directions of arrival.

Journal ArticleDOI
M. Sun
TL;DR: In this paper, an optimal stopping-time problem for a diffusion process absorbed at the boundary of a bounded domain in R d (d≥1) is considered, where the cost criterion is of time-average type and may be regarded as a version of the Gittins index.
Abstract: An optimal stopping-time problem for a diffusion process absorbed at the boundary of a bounded domain in ℝ^d (d ≥ 1) is considered. The cost criterion is of time-average type and may be regarded as a version of the Gittins index. We characterize the value function and construct an optimal stopping policy using dynamic programming. Two numerical algorithms are used to solve two test problems in ℝ¹ and ℝ².
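A minimal illustration of the dynamic-programming characterization of stopping problems, on a discounted random walk rather than the paper's absorbed diffusion with a time-average (Gittins-type) criterion; all numbers below are illustrative.

```python
import numpy as np

# Generic stopping sketch: a symmetric random walk on a grid over
# [-1, 1], absorbed at the endpoints; stopping at x pays g(x) = x and
# continuation is discounted by beta.  The DP characterization is
# V = max(g, beta * P V), with stopping optimal exactly where V = g.
N = 50
x = np.linspace(-1.0, 1.0, N + 1)
g = x.copy()                             # illustrative stopping reward
beta = 0.98

V = g.copy()
for _ in range(100_000):
    cont = np.empty_like(V)
    cont[1:-1] = 0.5 * (V[:-2] + V[2:])  # interior: average of neighbors
    cont[0], cont[-1] = V[0], V[-1]      # absorbed at the boundary
    V_new = np.maximum(g, beta * cont)
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new

print("stopping is optimal for x >=", x[V <= g + 1e-9].min().round(3))
```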

Journal ArticleDOI
TL;DR: In this article, a two-player zero-sum linear differential game with a fixed termination time, a convex terminal payoff function, and geometrical constraints on player controls is considered.

Journal ArticleDOI
TL;DR: In this article, a new class of iterative algorithms for solving undiscounted Bellman equations is proposed; these algorithms are proved to be particularly useful in handling “degenerate” equations.
Abstract: A new class of iterative algorithms for solving undiscounted Bellman equations is proposed in this article. Such algorithms are proved to be particularly useful in handling “degenerate” equations. For simplicity of presentation and for motivation of basic ideas, the algorithms are introduced in terms of three control problems: one optimal stopping time problem and two singular stochastic control problems. The convergence and rates of convergence are obtained. These results are based on the introduction of a performance index of approximation, which distinguishes the new algorithms from existing ones for solving regular undiscounted Bellman equations in the control literature.
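For orientation, the best-known iterative scheme for an undiscounted (average-cost) Bellman equation is relative value iteration, sketched below for a small finite MDP with invented data. The article's algorithms target "degenerate" equations (e.g., from stopping and singular control) and differ from this textbook method.

```python
import numpy as np

def relative_value_iteration(P, c, tol=1e-10, max_iter=100_000):
    """Solve the average-cost (undiscounted) Bellman equation
        h(s) + rho = min_a [ c(s, a) + sum_s' P(s'|s, a) h(s') ]
    by relative value iteration: subtract a reference state's value each
    sweep so the iterates stay bounded despite the missing discount."""
    h = np.zeros(P.shape[1])
    for _ in range(max_iter):
        Th = (c + np.einsum("ast,t->sa", P, h)).min(axis=1)
        h_new = Th - Th[0]               # renormalize at reference state 0
        if np.max(np.abs(h_new - h)) < tol:
            return Th[0], h_new          # rho (average cost), bias values
        h = h_new
    raise RuntimeError("no convergence")

P = np.array([[[0.7, 0.3], [0.4, 0.6]],  # P[a, s, s'] (illustrative)
              [[0.1, 0.9], [0.8, 0.2]]])
c = np.array([[2.0, 1.0],                # c[s, a]
              [0.5, 3.0]])
rho, h = relative_value_iteration(P, c)
print("average cost:", rho.round(6), "bias:", h.round(6))
```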

Journal ArticleDOI
TL;DR: In this article, the authors interpret extremals in terms of generalized gradients of the value function V by demonstrating that p(·) can in addition be chosen to satisfy (p(t) · ẋ*(t), −p(t)) ∈ ∂V(t, x*(t)), a.e.

01 Jan 1990
TL;DR: In this article, the authors provide several characterizations of optimal trajectories for the classical Mayer problem in optimal control and derive the upper semicontinuity of the optimal feedback map.
Abstract: We provide several characterizations of optimal trajectories for the classical Mayer problem in optimal control. For this purpose we study the regularity of directional derivatives of the value function: for instance, we show that for smooth control systems the value function V is continuously differentiable along an optimal trajectory z : [t0, t1] → ℝⁿ provided V is differentiable at the initial point (t0, z(t0)). Then we deduce the upper semicontinuity of the optimal feedback map. We also address the problem of optimal design, obtaining sufficient conditions for optimality. Finally, we show that the optimal control problem may be reduced to a viability one.

Journal ArticleDOI
TL;DR: In this article, the authors consider a class of control problems whose (uncertain) mathematical model is given by a linear differential equation with control parameters and perturbation parameters influencing the dynamics of the object under consideration.