
Showing papers on "Bellman equation published in 1995"


Journal ArticleDOI
TL;DR: It is shown that in general the usual constraint qualifications do not hold and that the right constraint qualification is the calmness condition; it is also shown that the linear bilevel programming problem and the minmax problem satisfy the calmness condition automatically.
Abstract: The bilevel programming problem (BLPP) is a sequence of two optimization problems where the constraint region of the upper level problem is determined implicitly by the solution set to the lower level problem. To obtain optimality conditions, we reformulate BLPP as a single level mathematical programming problem (SLPP) which involves the value function of the lower level problem. For this mathematical programming problem, it is shown that in general the usual constraint qualifications do not hold and the right constraint qualification is the calmness condition. It is also shown that the linear bilevel programming problem and the minmax problem satisfy the calmness condition automatically. A sufficient condition for calmness is given for the bilevel programming problem with a quadratic lower level problem and for that with a nondegenerate linear complementarity lower level problem. First order necessary optimality conditions are given using nonsmooth analysis. Second order sufficient optimality conditions are also given...
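A standard sketch of the value-function reformulation described above, in generic notation (the symbols F, f, g, and V below are placeholders, not necessarily the paper's):

```latex
% Bilevel problem: the upper level minimizes F over x, anticipating that
% y solves the lower-level problem parametrized by x.
\min_{x,\,y}\; F(x,y)
\quad\text{s.t.}\quad y \in \operatorname*{arg\,min}_{y'} \{\, f(x,y') : g(x,y') \le 0 \,\}

% Single-level reformulation via the lower-level value function
% V(x) = \min\{ f(x,y) : g(x,y) \le 0 \}:
\min_{x,\,y}\; F(x,y)
\quad\text{s.t.}\quad f(x,y) - V(x) \le 0, \qquad g(x,y) \le 0
```

The constraint f(x,y) - V(x) ≤ 0 is what breaks the usual constraint qualifications: it holds with equality at every feasible point, which is why calmness becomes the natural qualification.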

298 citations


Book
30 Oct 1995
TL;DR: Semi-regenerative decision models as discussed by the authors: a basic decision model with rigorous definitions and assumptions, examples of controlled queues, optimization problems, renewal kernels of the decision model, special classes of strategies, sufficiency of Markov strategies, dynamic programming, discounting in continuous time, the dynamic programming equation and Bellman functions, and the finite-horizon, infinite-horizon discounted-cost, random-horizon, and average-cost problems, with preliminaries on weak topology, limit passages, taboo probabilities, and limit theorems for Markov renewal processes.
Abstract: Semi-Regenerative Decision Models Description of Basic Decision Model Rigorous Definitions and Assumptions Examples of Controlled Queues Optimization Problems Renewal Kernels of the Decision Model Special Classes of Strategies Sufficiency of Markov Strategies Dynamic Programming Discounting in Continuous Time Dynamic Programming Equation Bellman Functions Finite-Horizon Problem Infinite-Horizon Discounted-Cost Problem Random-Horizon Problem Average Cost Criterion Preliminaries: Weak Topology, Limit Passages Preliminaries: Taboo Probabilities, Limit Theorems for Markov Renewal Processes Notation, Recurrence-Communication Assumptions, Examples Existence of Optimal Policies Existence of Optimal Strategies: General Criterion Existence of Optimal Strategies: Sufficient Conditions Optimality Equation Constrained Average-Cost Problem Average-Cost Optimality as Limiting Case of Discounted-Cost Optimality Continuously Controlled Markov Jump Processes Facts About Measurability of Stochastic Processes Marked Point Processes and Random Measures The Predictable σ-Algebra Dual Predictable Projections of Random Measures Definition of Controlled Markov Jump Process An M/M/1 Queue With Controllable Input and Service Rate Dynamic Programming Optimization Problems Structured Optimization Problems for Decision Processes Convex Regularization Submodular and Supermodular Functions Existence of Monotone Solutions for Optimization Problems Processes with Bounded Drift Birth and Death Processes Control of Arrivals The Model Description Finite-Horizon Discounted-Cost Problem Cost Functionals Infinite-Horizon Case with and without Discounting Optimal Dynamic Pricing Policy: Model Results Control of Service Mechanism Description of the System Static Optimization Problem Optimal Policies for the Queueing Process Service System with Two Interacting Servers Analysis of Optimality Equation Optimal Control in Models with Several Classes of Customers Description of Models and Processes Associated Controlled Processes Existence of Optimal Simple Strategies for the Systems with Alternating Priority Existence of Optimal Simple Strategy for the System with Feedback Equations for Stationary Distributions Stationary Characteristics of the Systems with Alternating Priority Stationary Characteristics of the System with Feedback Models with Alternating Priority: Linear Programming Problem Linear Programming Problem in the Model with Feedback Model with Periods of Idleness and Discounted-Cost Criterion Basic Formulas Construction of Optimal Modified Priority Discipline Bibliography Index Each chapter also includes an Introduction, and a Remarks and Exercises section.

177 citations


Journal ArticleDOI
TL;DR: In this paper, the authors apply the compactification method to study the control problem where the state is governed by an Ito stochastic differential equation allowing both classical and singular control.
Abstract: We apply the compactification method to study the control problem where the state is governed by an Ito stochastic differential equation allowing both classical and singular control. The problem is reformulated as a martingale problem on an appropriate canonical space after the relaxed form of the classical control is introduced. Under some mild continuity hypotheses on the data, it is shown by purely probabilistic arguments that an optimal control for the problem exists. The value function is shown to be Borel measurable.

108 citations


Proceedings ArticleDOI
13 Dec 1995
TL;DR: A general, unified framework for hybrid control problems that encompasses several types of hybrid phenomena and several models of hybrid systems is proposed, and an existence result is obtained for optimal controls.
Abstract: The authors previously (1994) proposed a general, unified framework for hybrid control problems that encompasses several types of hybrid phenomena and several models of hybrid systems. An existence result was obtained for optimal controls. The value function associated with this problem satisfies a set of "generalized quasi-variational inequalities" (GQVIs). We give a classification of the types of hybrid systems models covered by our framework and algorithms. We review our general framework and results. Then, we outline three explicit approaches for computing the solutions to the GQVIs that arise in optimal hybrid control. The approaches are generalizations to hybrid systems of shooting methods for boundary value problems, impulse control for piecewise-deterministic processes (PDPs), and value and policy iteration for piecewise-continuous dynamical systems. In the central case, we make clear the strong connection between impulse control for PDPs and optimal hybrid control. This allows us to give exact and approximate ("epsilon-optimal") algorithms for computing the value function associated with such problems and give some theoretical results. Also following previous work, we find that we can compute optimal solutions via linear programming (LP). The resulting LP problems are in general large, but sparse. In each case, the underlying feedback controls can be subsequently computed. Illustrative examples of each algorithm are solved in our framework.
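To make the LP computation concrete, here is a minimal sketch (not the authors' algorithm) for a finite, discounted cost-minimization MDP: the value function is the largest V satisfying V(s) ≤ c(s,a) + γ Σ_{s'} P(s'|s,a) V(s') for every state-action pair, so it can be recovered by maximizing Σ_s V(s) subject to those constraints. The arrays `P` and `c`, the discount `gamma`, and the solver choice are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def lp_value_function(P, c, gamma):
    """P: (S, A, S) transition kernel; c: (S, A) stage costs; gamma in (0, 1).

    Returns the value function of the discounted cost-minimization MDP, computed
    as the maximal V with V(s) <= c(s,a) + gamma * sum_s' P(s'|s,a) V(s')."""
    S, A = c.shape
    obj = -np.ones(S)                   # maximize sum_s V(s) == minimize -sum_s V(s)
    rows, rhs = [], []
    for s in range(S):
        for a in range(A):
            row = -gamma * P[s, a, :]   # -gamma * sum_s' P(s'|s,a) V(s')
            row[s] += 1.0               # + V(s)  ... <= c(s,a)
            rows.append(row)
            rhs.append(c[s, a])
    res = linprog(obj, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(None, None)] * S, method="highs")
    return res.x

# Tiny usage example with a random 3-state, 2-action model:
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(3, 2))   # each row sums to one
c = rng.random((3, 2))
print(lp_value_function(P, c, gamma=0.9))
```

As the abstract notes, the constraint matrix has one row per state-action pair and is sparse, since each row only touches the states reachable in one step.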

103 citations


Journal ArticleDOI
TL;DR: In this article, the dynamic programming principle for a multidimensional singular stochastic control problem is established; assuming Lipschitz continuity of the data, the value function is shown to be continuous and to be the unique viscosity solution of the corresponding Hamilton-Jacobi-Bellman equation.
Abstract: The dynamic programming principle for a multidimensional singular stochastic control problem is established in this paper. When assuming Lipschitz continuity on the data, it is shown that the value function is continuous and is the unique viscosity solution of the corresponding Hamilton--Jacobi--Bellman equation.

89 citations


Journal ArticleDOI
TL;DR: In this article, the authors study a dynamic insurance problem with bilateral asymmetric information and balanced budgets, give sufficient and necessary conditions for the existence of a constrained efficient contract, show that such a contract can be characterized by a Bellman equation, and prove that the long-run distribution of each agent's expected utilities is not degenerate.
Abstract: This paper studies a dynamic insurance problem with bilateral asymmetric information and balanced budgets. There are two infinitely-lived agents in our model, both risk averse, and each has an iid random endowment stream which is unobservable to the other. In each period, each agent must have a non-negative consumption, and together they must consume the entire aggregate endowment. Dynamic incentive compatibility in the Nash sense is defined. We give sufficient and necessary conditions for the existence of a constrained efficient contract. We show that a constrained efficient contract can be characterized by a Bellman equation. We demonstrate that the long-run distribution of expected utilities of each agent is not degenerate. We also develop an algorithm for computing the efficient contract and, in a numerical example, we find that the consumption processes of the agents form stationary Markov chains.

79 citations


Proceedings Article
Ralph Neuneier1
27 Nov 1995
TL;DR: Asset allocation is formalized as a Markovian Decision Problem which can be optimized by applying dynamic programming or reinforcement learning based algorithms; on an artificial exchange rate, the strategy optimized with Q-Learning is shown to be equivalent to a policy computed by dynamic programming.
Abstract: In recent years, the interest of investors has shifted to computerized asset allocation (portfolio management) to exploit the growing dynamics of the capital markets. In this paper, asset allocation is formalized as a Markovian Decision Problem which can be optimized by applying dynamic programming or reinforcement learning based algorithms. Using an artificial exchange rate, the asset allocation strategy optimized with reinforcement learning (Q-Learning) is shown to be equivalent to a policy computed by dynamic programming. The approach is then tested on the task to invest liquid capital in the German stock market. Here, neural networks are used as value function approximators. The resulting asset allocation strategy is superior to a heuristic benchmark policy. This is a further example which demonstrates the applicability of neural network based reinforcement learning to a problem setting with a high dimensional state space.
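For readers new to the method, here is a minimal tabular Q-Learning loop of the kind the paper builds on (the paper itself uses neural networks as value function approximators); the `env` interface, with `reset()` returning a state index and `step(a)` returning `(next_state, reward, done)`, and all hyperparameters are assumptions for illustration.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-Learning: converges to the dynamic-programming solution
    of a finite MDP under standard step-size and exploration conditions."""
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy exploration
            a = rng.integers(n_actions) if rng.random() < epsilon \
                else int(np.argmax(Q[s]))
            s2, r, done = env.step(a)
            # move Q(s, a) toward the sampled one-step Bellman target
            target = r + gamma * np.max(Q[s2]) * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```

The equivalence claimed in the abstract is exactly this: for a finite Markovian Decision Problem, the greedy policy extracted from the converged `Q` matches the policy computed by dynamic programming.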

77 citations


Journal ArticleDOI
TL;DR: A deterministic optimal control problem is obtained that is equivalent to the stochastic production planning problem under consideration, and the optimal feedback control policy is derived in terms of the directional derivatives of the value function.

67 citations


Journal ArticleDOI
TL;DR: It is shown that a unified framework consisting of a sequential diagram, an influence diagram, and a common formulation table for the problem's data, suffices for compact and consistent representation, economical formulation, and efficient solution of (asymmetric) decision problems.
Abstract: In this paper we introduce a new graph, the sequential decision diagram, to aid in modeling, formulation, and solution of sequential decision problems under uncertainty. While as compact as an influence diagram, the sequential diagram captures the asymmetric and sequential aspects of decision problems as effectively as decision trees. We show that a unified framework consisting of a sequential diagram, an influence diagram, and a common formulation table for the problem’s data suffices for compact and consistent representation, economical formulation, and efficient solution of (asymmetric) decision problems. In addition to asymmetry, the framework exploits other sources of computational efficiency, such as conditional independence and value function decomposition, making it also useful in evaluating dynamic-programming problems. The formulation table and recursive algorithm can be readily implemented in computers for solving large-scale problems. Examples are provided to illustrate the methodology in both...

65 citations


Proceedings ArticleDOI
Hideo Nagai1
13 Dec 1995
TL;DR: In this paper, existence of a nonnegative solution to the Bellman equation of risk-sensitive control is shown; the result is applied to prove that no breaking down occurs, and the relationship between the asymptotics of the solution and the large deviation principle is noted.
Abstract: Risk sensitive control problems are considered. Existence of a nonnegative solution to the Bellman equation of risk sensitive control is shown. The result is applied to prove that no breaking down occurs. Asymptotic behaviour of the nonnegative solution is studied in relation to ergodic control problems and the relationship between the asymptotics and the large deviation principle is noted.

63 citations


Journal ArticleDOI
TL;DR: In this article, the authors study the Markov decision process under the maximization of the probability that total discounted rewards exceed a target level, and study the dynamic programming equations of the model.
Abstract: The Markov decision process is studied under the maximization of the probability that total discounted rewards exceed a target level. We focus on and study the dynamic programming equations of the model. We give various properties of the optimal return operator and, for the infinite planning-horizon model, we characterize the optimal value function as a maximal fixed point of the previous operator. Various turnpike results relating the finite and infinite-horizon models are also given.
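One common form of the dynamic programming equation for this target-level criterion (a hedged sketch; the paper's notation may differ): augment the state with the remaining target x, which, after a reward r(s, a) is received, is rescaled by the discount factor β.

```latex
% v(s, x): maximal probability that total discounted reward exceeds the
% target x, starting from state s; beta is the discount factor.
v(s, x) \;=\; \max_{a}\; \sum_{s'} p(s' \mid s, a)\,
          v\!\left(s',\; \frac{x - r(s, a)}{\beta}\right)
```

The optimal return operator referred to in the abstract is the right-hand side viewed as a map on functions of (s, x); the infinite-horizon value function is its maximal fixed point.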

Journal ArticleDOI
TL;DR: In this article, the uniqueness results for viscosity solutions of nonstationary Hamilton-Jacobi-Bellman equations are extended to the robust control limit of risk-sensitive stochastic control problems.
Abstract: This paper extends the uniqueness results for viscosity solutions of nonstationary Hamilton--Jacobi--Bellman equations. The conditions for uniqueness which are obtained can involve a trade-off between the growth of the solution and the growth of the Hamiltonian. In particular, the result is valid for solutions which grow quadratically in the space variable and are associated with Hamiltonians which also grow quadratically. This particular class arises in the robust control limit of risk-sensitive stochastic control problems.

Book ChapterDOI
01 Jan 1995
TL;DR: In this article, an approximation scheme for the value function of general pursuit-evasion games is described and proved to be convergent; the result covers problems with a discontinuous value function and is new even in the case of a single player.
Abstract: We describe an approximation scheme for the value function of general pursuit-evasion games and prove its convergence, in a suitable sense. The result works for problems with discontinuous value function as well, and it is new even for the case of a single player. We use some very recent results on generalized (viscosity) solutions of the Dirichlet boundary value problem associated to the Isaacs equation, and a suitable variant of Fleming’s notion of value. We test the algorithm on some examples of games in the plane.

Journal ArticleDOI
TL;DR: In this paper, the bilevel dynamic optimization problem is reformulated as a single-level optimal control problem that involves the value function of the lower-level problem; a sensitivity analysis of the lower-level problem with respect to the perturbation in the upper-level decision variable is given, and first-order necessary optimality conditions are derived by using nonsmooth analysis.
Abstract: In this paper we study the bilevel dynamic optimization problem, which is a hierarchy of two optimization problems where the constraint region of the upper-level problem is determined implicitly by the solution to the lower-level problem and where the upper-level decision variable is a vector while the lower-level decision variable is an admissible control function. To obtain optimality conditions, we reformulate the bilevel dynamic optimization problem as a single-level optimal control problem that involves the value function of the lower-level problem. A sensitivity analysis of the lower-level problem with respect to the perturbation in the upper-level decision variable is given, and the first-order necessary optimality conditions are derived by using nonsmooth analysis.

Journal ArticleDOI
TL;DR: In this paper, the authors analyse the limiting behaviour of stochastic games with discounted cost functionals as the discount ρ → 0, showing that the modified solutions converge, for subsequences, to the solution of the ergodic Bellman equation and that the average cost converges.
Abstract: Stochastic games with cost functionals $J^{(i)}_{\rho,x}(v) = E \int_0^\infty e^{-\rho t}\, l_i(y, v)\, dt$, $i = 1, 2$, with controls $v = (v_1, v_2)$ and state $y(t)$ with $y(0) = x$ are considered. Each player wants to minimize his (her) cost functional. $E$ denotes the expected value, and the state variables $y$ are coupled with the controls $v$ via a stochastic differential equation with initial value $x$. The corresponding Bellman system, which is used for the calculation of feedback controls $v = v(y)$ and the solvability of the game, leads to a class of diagonal second-order nonlinear elliptic systems, which also occur in other branches of analysis. Their behaviour concerning existence and regularity of solutions is, despite many positive results, not yet well understood, even in the case where the $l_i$ are simple quadratic functions. The objective of this paper is to give new insight into these questions for fixed $\rho > 0$ and, primarily, to analyse the limiting behaviour as the discount $\rho \to 0$. We find that the modified solutions of the stochastic games converge, for subsequences, to the solution of the so-called ergodic Bellman equation and that the average cost converges. A former restriction of the space dimension has been removed. A reasonable class of quadratic integrands may be treated. More specifically, we consider the Bellman systems of equations $-\Delta z + \lambda = H(x, Dz)$, where the space variable $x$ belongs to a periodic cube (for the sake of simplifying the presentation). They are shown to have smooth solutions. If $u_\rho$ is the solution of $-\Delta u_\rho + \rho u_\rho = H(x, Du_\rho)$, then the convergence of $u_\rho - \bar{u}_\rho$ to $z$, as $\rho$ tends to 0, is established. The conditions on $H$ are such that some quadratic growth in $Du$ is allowed.

Journal ArticleDOI
TL;DR: The methodology developed in this paper combines fundamental properties of convex sets in order to decompose the multicriterion control problem into a two-level structure, which involves the solution of a relaxed and static multicriteria problem.

Journal ArticleDOI
TL;DR: An algorithm for generating efficient solutions of a multiobjective mathematical programming problem is defined and analyzed quantitatively; it is based on the principle of optimality in dynamic programming and on the basic notion of stability in convex programming problems with parameters in the constraints.

Journal ArticleDOI
TL;DR: In this paper, an approximate optimal feedback controller in the form of a feed-forward neural network is proposed, which is capable of approximately minimizing an arbitrary performance index for a nonlinear dynamical system for initial conditions arising from a nontrivial bounded subset of the state space.
Abstract: The solutions of most nonlinear optimal control problems are given in the form of open-loop optimal control which is computed from a given fixed initial condition. Optimal feedback control can in principle be obtained by solving the corresponding Hamilton-Jacobi-Bellman dynamic programming equation, though in general this is a difficult task. We propose a practical and effective alternative for constructing an approximate optimal feedback controller in the form of a feedforward neural network, and we give several reasons justifying this choice. The controller is capable of approximately minimizing an arbitrary performance index for a nonlinear dynamical system for initial conditions arising from a nontrivial bounded subset of the state space. A direct training algorithm is proposed and several illustrative examples are given.
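A minimal sketch of the direct-training idea (all details below, including the network shape, Euler integration, and finite-difference gradients, are illustrative assumptions rather than the paper's algorithm): simulate the closed loop under a feedforward-network controller from a batch of sampled initial conditions and descend the accumulated cost.

```python
import numpy as np

def net(x, params):
    W1, b1, W2, b2 = params
    return W2 @ np.tanh(W1 @ x + b1) + b2        # one-hidden-layer controller

def rollout_cost(params, x0s, f, cost, steps, dt):
    """Average cost of closed-loop trajectories from initial states x0s,
    with dynamics xdot = f(x, u) and running cost cost(x, u)."""
    total = 0.0
    for x0 in x0s:
        x = np.array(x0, dtype=float)
        for _ in range(steps):
            u = net(x, params)
            total += cost(x, u) * dt
            x = x + f(x, u) * dt                 # forward-Euler step
    return total / len(x0s)

def train_step(params, x0s, f, cost, steps, dt, lr=1e-2, eps=1e-5):
    """One finite-difference gradient-descent step on the rollout cost."""
    base = rollout_cost(params, x0s, f, cost, steps, dt)
    new_params = []
    for P in params:
        G = np.zeros_like(P)
        it = np.nditer(P, flags=["multi_index"])
        while not it.finished:
            i = it.multi_index
            P[i] += eps                          # perturb one weight in place
            G[i] = (rollout_cost(params, x0s, f, cost, steps, dt) - base) / eps
            P[i] -= eps
            it.iternext()
        new_params.append(P - lr * G)
    return new_params
```

Training over a batch of initial conditions, rather than a single one, is what yields a feedback law valid on a bounded subset of the state space instead of a single open-loop trajectory.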

Proceedings ArticleDOI
13 Dec 1995
TL;DR: Numerical results are presented dealing with the issue of domestic asset allocation, that is the optimal split between cash, long bonds and equities, and the impact of the transaction costs on the risk return characteristics of the optimal policies is analyzed.
Abstract: This paper considers the optimal investment policy for an investor who has available one bank account paying a fixed interest rate r and n risky assets whose prices are correlated log-normal diffusions. We suppose that transactions between the assets incur a cost proportional to the size of the transaction. The problem is to maximize a function of the total net wealth on a finite horizon. Dynamic programming leads to a parabolic variational inequality for the value function which is solved by using a numerical algorithm based on policy iteration and multigrid methods. Numerical results are presented dealing with the issue of domestic asset allocation, that is, the optimal split between cash, long bonds and equities. The impact of the transaction costs on the risk-return characteristics of the optimal policies is analyzed.

Journal ArticleDOI
TL;DR: An approach to hierarchical decision-making in production planning and capacity expansion problems under uncertainty is presented: strategic-level management can base the capacity decision on aggregated information from the shop floor, and operational-level management, given this decision, can derive a production plan for the system without too large a loss in optimality compared to the simultaneous determination of optimal capacity and production decisions.

01 Jan 1995
TL;DR: The solvability of a class of forward-backward stochastic differential equations (SDEs for short) over an arbitrarily prescribed time duration is studied in this paper.
Abstract: The solvability of a class of forward-backward stochastic differential equations (SDEs for short) over an arbitrarily prescribed time duration is studied. The authors design a stochastic relaxed control problem, with both the drift and the diffusion being controlled, so that the solvability problem is converted to a problem of finding the nodal set of the viscosity solution to a certain Hamilton-Jacobi-Bellman equation. This method overcomes the fatal difficulty encountered in the traditional contraction mapping approach to the existence theorem of such SDEs.

Journal ArticleDOI
TL;DR: In this paper, the authors investigate continuity properties of the minimal point multivalued mapping associated with parametric vector optimization problems in topological vector spaces, prove sufficient conditions for several types of continuities of minimal points, and discuss their relationship to existing results as well as to the classical Berge maximum theorem in the case of scalar optimization problems.
Abstract: In the present paper we investigate continuity properties of the minimal point multivalued mapping associated with parametric vector optimization problems in topological vector spaces. This mapping can be viewed as a counterpart of the optimal value function in scalar optimization. We prove sufficient conditions for several types of continuities of minimal points and discuss their relationship to the existing results as well as to the classical Berge maximum theorem in the case of scalar optimization problems.

Journal ArticleDOI
TL;DR: In this paper, a zero-sum game approach is adopted to solve the problem of optimal stopping of the discrete time Markov process by two decision makers in a competitive situation, where the gain function depends on the states chosen by both decision makers.
Abstract: In this paper a problem of optimal stopping of the discrete time Markov process by two decision makers in a competitive situation is considered. The zero-sum game approach is adopted. The gain function depends on the states chosen by both decision makers. The random assignment mechanism is used when both want to accept the realization of the Markov process at the same moment. The construction of the value function and the optimal strategies for the players are given. Examples related to the generalization of the best choice problem are solved.

Posted Content
TL;DR: A discretized version of the dynamic programming algorithm is developed, and it is shown that under the proposed scheme the computed value function converges quadratically to the true value function and the computed policy function converges linearly, as the mesh size of the discretization converges to zero.
Abstract: In this paper we develop a discretized version of the dynamic programming algorithm and derive error bounds for the approximate value and policy functions. We show that under the proposed scheme the computed value function converges quadratically to the true value function and the computed policy function converges linearly, as the mesh size of the discretization converges to zero. Moreover, the constants involved in these orders of convergence can be computed in terms of primitive data of the model. We also discuss several aspects of the implementation of our methods, and present numerical results for some commonly studied macroeconomic models.
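A bare-bones version of discretized value iteration of the kind analyzed in the paper (a sketch only: the feasibility map, return function, and stopping rule below are generic assumptions, and the paper's error bounds are not reproduced here).

```python
import numpy as np

def discretized_value_iteration(n, feasible, u, beta, tol=1e-6):
    """n: number of grid points; feasible(i): successor indices reachable
    from point i; u(i, j): one-period return; beta in (0, 1): discount."""
    V = np.zeros(n)
    policy = np.zeros(n, dtype=int)
    while True:
        V_new = np.empty(n)
        for i in range(n):
            js = list(feasible(i))
            vals = [u(i, j) + beta * V[j] for j in js]
            k = int(np.argmax(vals))
            V_new[i], policy[i] = vals[k], js[k]
        # Contraction bound: ||V_new - V*|| <= beta/(1-beta) * ||V_new - V||,
        # so this stopping rule leaves V_new within tol of the grid fixed point.
        if np.max(np.abs(V_new - V)) <= tol * (1 - beta) / beta:
            return V_new, policy
        V = V_new
```

The paper's contribution is the link between this grid fixed point and the true value function: halving the mesh size roughly quarters the value-function error (quadratic convergence) while the policy error shrinks linearly.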

Journal ArticleDOI
TL;DR: In this article, a feedback law expressed by means of a Trotter product formula approximation of the dynamic programming equation, and which provides approximately optimal controls, is established for the control systems governed by a certain class of variational inequalities of parabolic type.
Abstract: A feedback law expressed by means of a Trotter product formula approximation of the dynamic programming equation, and which provides approximately optimal controls, is established for the control systems governed by a certain class of variational inequalities of parabolic type. To this purpose, two general Lie-Trotter formulas for the dynamic programming equation are proposed, and corresponding convergence results (generalizing previous results of the author) are proved. The framework also includes the control systems described by the parabolic obstacle problem as well as those governed by semilinear parabolic equations.

Proceedings ArticleDOI
21 Jun 1995
TL;DR: The approach of approximating a differential algebraic optimization problem (DAOP) by a nonlinear program (NLP) and subsequently solving it is considered; the minimization of the approximation error by adjusting the collocation points is shown to constrain the input space, thereby increasing the minimum predicted cost.
Abstract: The approach of approximating a differential algebraic optimization problem (DAOP) by a nonlinear program (NLP) and subsequently solving it is considered. In this context, the two distinct objectives to be minimized are: (i) the approximation error and (ii) the predicted cost functional. It is first shown that the minimization of the approximation error by adjusting the collocation points leads to constraining the input space, thereby increasing the minimum predicted cost. This is the main motivation to seek compromise solutions, and hence the overall problem is approached from a multicriteria optimization viewpoint. Various preference structures (lexicographic, Pareto and value function) available in the multicriteria literature provide a unified framework for the analysis of existing techniques and the methods proposed here.

Journal ArticleDOI
TL;DR: In this article, the minimum time problem associated with a nonlinear control system is considered, and the unicity of the lower semicontinuous solution of the corresponding Bellman equation is investigated.
Abstract: The minimum time problem associated with a nonlinear control system is considered, and the unicity of the lower semicontinuous solution of the corresponding Bellman equation is investigated. A main tool in our approach is the Kruzkov transformation, which enables us to work on $\mathbb{R}^n \setminus \{0\}$, where $\{0\}$ is the target set, instead of the unknown reachable set.
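For context, the Kruzkov transformation mentioned above replaces the (possibly infinite) minimum time function $T(x)$ by a bounded function; a standard form is

```latex
v(x) \;=\; 1 - e^{-T(x)}, \qquad 0 \le v(x) \le 1
```

with $v(x) = 1$ where the target is not reachable, so the Bellman equation for $v$ can be studied on all of $\mathbb{R}^n \setminus \{0\}$ without knowing the reachable set in advance.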

Journal ArticleDOI
Willi Semmler1
TL;DR: In this article, a discrete-time dynamic programming algorithm is proposed to track nonlinearities in intertemporal optimization problems. But, unlike using linearization methods for solving inter-temporal models, the proposed algorithm operates globally by iteratively computing the value function and the controls in feedback form on a grid.
Abstract: The paper presents a discrete-time dynamic programming algorithm that is suitable to track nonlinearities in intertemporal optimization problems. In contrast to using linearization methods for solving intertemporal models, the proposed algorithm operates globally by iteratively computing the value function and the controls in feedback form on a grid. A conjecture of how the trajectories might behave is analytically obtained by letting the discount rate approach infinity. The dynamic found serves as a useful device for computing the trajectories for finite discount rates employing the algorithm. Commencing with a large step and grid size, and then pursuing time step and grid refinements allows for the replication of the nonlinear dynamics for various finite discount rates. As the time step and grid size shrink, the discretization errors vanish. The algorithm is applied to three economic examples. Two examples are of deterministic type; the third is stochastic. In the deterministic cases limit cycles are detected.

Proceedings ArticleDOI
13 Dec 1995
TL;DR: In this paper, a robust adaptive control problem for uncertain linear systems is formulated with the estimator equation and its associated Riccati equation as state variables, and it is shown that a saddle point controller is equivalent to a minimax controller by using the Hamilton-Jacobi-Isaacs equation.
Abstract: This paper formulates a robust adaptive control problem for uncertain linear systems. For complete linear systems with a quadratic performance index, a minimax controller is easily obtained. The class of systems under consideration has a bilinear structure. Although it allows a finite dimensional estimator, the problem still remains more difficult than the linear-quadratic problem. For this class of systems, the minimax dynamic programming problem is formulated with the estimator equation and its associated Riccati equation as state variables. It is then shown that a saddle point controller is equivalent to a minimax controller by using the Hamilton-Jacobi-Isaacs equation. Since the saddle point optimal return function satisfies the minimax dynamic programming equation, restrictive assumptions on the uniqueness of the worst case state are not required. The authors finally show that with additional assumptions the problem can be extended to the infinite-time problem.
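For reference, a generic form of the Hamilton-Jacobi-Isaacs equation invoked above (the paper works with an augmented state containing the estimator and its Riccati equation, so its Hamiltonian is more specific):

```latex
% V(t, x): saddle point optimal return function; u: control, w: disturbance
\partial_t V(t,x) \;+\; \min_{u}\,\max_{w}\,
\Big[\, \nabla_x V(t,x)\cdot f(x,u,w) \;+\; L(x,u,w) \,\Big] \;=\; 0,
\qquad V(T,x) = \Phi(x)
```

When the min and max on the left-hand side can be interchanged, the saddle point controller and the minimax controller coincide, which is essentially the equivalence the authors establish for their bilinear class.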

Journal ArticleDOI
TL;DR: In this paper, the authors study a quasi-variational inequality system with unbounded solutions and show that the optimal cost is the unique viscosity solution of the system with state constraints arising from production engineering.
Abstract: We study a quasi-variational inequality system with unbounded solutions. It represents the Bellman equation associated with an optimal switching control problem with state constraints arising from production engineering. We show that the optimal cost is the unique viscosity solution of the system.