Showing papers on "Bellman equation published in 2001"

PDF

Open Access

Journal Article•DOI•

Regression methods for pricing complex American-style options

[...]

John N. Tsitsiklis¹, B. Van Roy²•Institutions (2)

Massachusetts Institute of Technology¹, Stanford University²

01 Jul 2001-IEEE Transactions on Neural Networks

TL;DR: A simulation-based approximate dynamic programming method for pricing complex American-style options, with a possibly high-dimensional underlying state space, and a related method which uses a single (parameterized) value function, which is a function of the time-state pair.

...read moreread less

Abstract: We introduce and analyze a simulation-based approximate dynamic programming method for pricing complex American-style options, with a possibly high-dimensional underlying state space. We work within a finitely parameterized family of approximate value functions, and introduce a variant of value iteration, adapted to this parametric setting. We also introduce a related method which uses a single (parameterized) value function, which is a function of the time-state pair, as opposed to using a separate (independently parameterized) value function for each time. Our methods involve the evaluation of value functions at a finite set, consisting of "representative" elements of the state space. We show that with an arbitrary choice of this set, the approximation error can grow exponentially with the time horizon (time to expiration). On the other hand, if representative states are chosen by simulating the state process using the underlying risk-neutral probability distribution, then the approximation error remains bounded.

...read moreread less

695 citations

Proceedings Article•

Multiagent Planning with Factored MDPs

[...]

Carlos Guestrin¹, Daphne Koller¹, Ronald Parr²•Institutions (2)

Stanford University¹, Duke University²

03 Jan 2001

TL;DR: This work presents a principled and efficient planning algorithm for cooperative multiagent dynamic systems that avoids the exponential blowup in the state and action space and is an efficient alternative to more complicated algorithms even in the single agent case.

...read moreread less

Abstract: We present a principled and efficient planning algorithm for cooperative multiagent dynamic systems. A striking feature of our method is that the coordination and communication between the agents is not imposed, but derived directly from the system dynamics and function approximation architecture. We view the entire multiagent system as a single, large Markov decision process (MDP), which we assume can be represented in a factored way using a dynamic Bayesian network (DBN). The action space of the resulting MDP is the joint action space of the entire set of agents. Our approach is based on the use of factored linear value functions as an approximation to the joint value function. This factorization of the value function allows the agents to coordinate their actions at runtime using a natural message passing scheme. We provide a simple and efficient method for computing such an approximate value function by solving a single linear program, whose size is determined by the interaction between the value function structure and the DBN. We thereby avoid the exponential blowup in the state and action space. We show that our approach compares favorably with approaches based on reward sharing. We also show that our algorithm is an efficient alternative to more complicated algorithms even in the single agent case.

...read moreread less

479 citations

Journal Article•DOI•

Learning to trade via direct reinforcement

[...]

John Moody¹, Matthew Saffell¹•Institutions (1)

Oregon Health & Science University¹

01 Jul 2001-IEEE Transactions on Neural Networks

TL;DR: It is demonstrated how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs.

...read moreread less

Abstract: We present methods for optimizing portfolios, asset allocations, and trading systems based on direct reinforcement (DR). In this approach, investment decision-making is viewed as a stochastic control problem, and strategies are discovered directly. We present an adaptive algorithm called recurrent reinforcement learning (RRL) for discovering investment policies. The need to build forecasting models is eliminated, and better trading performance is obtained. The direct reinforcement approach differs from dynamic programming and reinforcement algorithms such as TD-learning and Q-learning, which attempt to estimate a value function for the control problem. We find that the RRL direct reinforcement framework enables a simpler problem representation, avoids Bellman's curse of dimensionality and offers compelling advantages in efficiency. We demonstrate how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs. In extensive simulation work using real financial data, we find that our approach based on RRL produces better trading strategies than systems utilizing Q-learning (a value function method). Real-world applications include an intra-daily currency trader and a monthly asset allocation system for the S&P 500 Stock Index and T-Bills.

...read moreread less

396 citations

Journal Article•DOI•

A solution approach to valuation with unhedgeable risks

[...]

Thaleia Zariphopoulou¹•Institutions (1)

University of Wisconsin-Madison¹

01 Jan 2001-Finance and Stochastics

TL;DR: A class of stochastic optimization models of expected utility in markets with stochastically changing investment opportunities is studied, which expresses the value function in terms of the solution of a linear parabolic equation, with the power exponent depending only on the coefficients of correlation and risk aversion.

...read moreread less

Abstract: We study a class of stochastic optimization models of expected utility in markets with stochastically changing investment opportunities. The prices of the primitive assets are modelled as diffusion processes whose coefficients evolve according to correlated diffusion factors. Under certain assumptions on the individual preferences, we are able to produce reduced form solutions. Employing a power transformation, we express the value function in terms of the solution of a linear parabolic equation, with the power exponent depending only on the coefficients of correlation and risk aversion. This reduction facilitates considerably the study of the value function and the characterization of the optimal hedging demand. The new results demonstrate an interesting connection with valuation techniques using stochastic differential utilities and also, with distorted measures in a dynamic setting.

...read moreread less

367 citations

Proceedings Article•

Symbolic dynamic programming for first-order MDPs

[...]

Craig Boutilier¹, Raymond Reiter¹, Bob Price²•Institutions (2)

University of Toronto¹, University of British Columbia²

04 Aug 2001

TL;DR: This technique uses an MDP whose dynamics is represented in a variant of the situation calculus allowing for stochastic actions and produces a logical description of the optimal value function and policy by constructing a set of first-order formulae that minimally partition state space according to distinctions made by the valuefunction and policy.

...read moreread less

Abstract: We present a dynamic programming approach for the solution of first-order Markov decisions processes. This technique uses an MDP whose dynamics is represented in a variant of the situation calculus allowing for stochastic actions. It produces a logical description of the optimal value function and policy by constructing a set of first-order formulae that minimally partition state space according to distinctions made by the value function and policy. This is achieved through the use of an operation known as decision-theoretic regression. In effect, our algorithm performs value iteration without explicit enumeration of either the state or action spaces of the MDP. This allows problems involving relational fluents and quantification to be solved without requiring explicit state space enumeration or conversion to propositional form.

...read moreread less

262 citations

Journal Article•DOI•

Dynamic optimization for reachability problems

[...]

Alexander B. Kurzhanski, Pravin Varaiya

01 Feb 2001-Journal of Optimization Theory and Applications

TL;DR: In this article, the authors use dynamic programming techniques to describe reach sets and related problems of forward and backward reachability, which are reformulated in terms of optimization problems solved through the Hamilton-Jacobi-Bellman Equations.

...read moreread less

Abstract: This paper uses dynamic programming techniques to describe reach sets andrelated problems of forward and backward reachability The original problemsdo not involve optimization criteria and are reformulated in terms ofoptimization problems solved through the Hamilton–Jacobi–Bellmanequations The reach sets are the level sets of the value function solutionsto these equations Explicit solutions for linear systems with hard boundsare obtained Approximate solutions are introduced and illustrated forlinear systems and for a nonlinear system similar to that of theLotka–Volterra type

...read moreread less

152 citations

Proceedings Article•

Max-norm projections for factored MDPs

[...]

Carlos Guestrin¹, Daphne Koller¹, Ronald Parr²•Institutions (2)

Stanford University¹, Duke University²

04 Aug 2001

TL;DR: This paper presents the first approximate MDP solution algorithms - both value and policy iteration - that use max-norm projection, thereby directly optimizing the quantity required to obtain the best error bounds.

...read moreread less

Abstract: Markov Decision Processes (MDPs) provide a coherent mathematical framework for planning under uncertainty. However, exact MDP solution algorithms require the manipulation of a value function, which specifies a value for each state in the system. Most real-world MDPs are too large for such a representation to be feasible, preventing the use of exact MDP algorithms. Various approximate solution algorithms have been proposed, many of which use a linear combination of basis functions as a compact approximation to the value function. Almost all of these algorithms use an approximation based on the (weighted) L2-norm (Euclidean distance); this approach prevents the application of standard convergence results for MDP algorithms, all of which are based on max-norm. This paper makes two contributions. First, it presents the first approximate MDP solution algorithms - both value and policy iteration - that use max-norm projection, thereby directly optimizing the quantity required to obtain the best error bounds. Second, it shows how these algorithms can be applied efficiently in the context of factored MDPs, where the transition model is specified using a dynamic Bayesian network.

...read moreread less

141 citations

Posted Content•

Correspondence principle for idempotent calculus and some computer applications

[...]

Grigori Litvinov, Victor Pavlovich Maslov

03 Jan 2001-arXiv: General Mathematics

TL;DR: The correspondence principle is used to develop an approach to object-oriented software and hardware design for algorithms of idempotent calculus.

...read moreread less

Abstract: This paper is devoted to heuristic aspects of the so-called idempotent calculus There is a correspondence between important, useful and interesting constructions and results over the field of real (or complex) numbers and similar constructions and results over idempotent semirings in the spirit of N Bohr's correspondence principle in Quantum Mechanics Some problems nonlinear in the traditional sense (for example, the Bellman equation and its generalizations) turn out to be linear over a suitable semiring; this linearity considerably simplifies the explicit construction of solutions The theory is well advanced and includes, in particular, new integration theory, new linear algebra, spectral theory and functional analysis It has a wide range of applications Besides a survey of the subject, in this paper the correspondence principle is used to develop an approach to object-oriented software and hardware design for algorithms of idempotent calculus

...read moreread less

125 citations

Journal Article•DOI•

Optimal portfolio selection with consumption and nonlinear integro-differential equations with gradient constraint: a viscosity solution approach

[...]

Fred Espen Benth¹, Kenneth H. Karlsen², Kristin Reikvam¹•Institutions (2)

University of Oslo¹, University of Bergen²

01 Jul 2001-Finance and Stochastics

TL;DR: This work characterization the value function of the singular stochastic control problem as the unique constrained viscosity solution of the associated Hamilton-Jacobi-Bellman equation and proves a new comparison result for the state constraint problem for a class of integro-differential variational inequalities.

...read moreread less

Abstract: We study a problem of optimal consumption and portfolio selection in a market where the logreturns of the uncertain assets are not necessarily normally distributed. The natural models then involve pure-jump Levy processes as driving noise instead of Brownian motion like in the Black and Scholes model. The state constrained optimization problem involves the notion of local substitution and is of singular type. The associated Hamilton-Jacobi-Bellman equation is a nonlinear first order integro-differential equation subject to gradient and state constraints. We characterize the value function of the singular stochastic control problem as the unique constrained viscosity solution of the associated Hamilton-Jacobi-Bellman equation. This characterization is obtained in two main steps. First, we prove that the value function is a constrained viscosity solution of an integro-differential variational inequality. Second, to ensure that the characterization of the value function is unique, we prove a new comparison (uniqueness) result for the state constraint problem for a class of integro-differential variational inequalities. In the case of HARA utility, it is possible to determine an explicit solution of our portfolio-consumption problem when the Levy process posseses only negative jumps. This is, however, the topic of a companion paper [7].

...read moreread less

124 citations

Journal Article•DOI•

Viability Kernels and Capture Basins of Sets Under Differential Inclusions

[...]

Jean-Pierre Aubin¹•Institutions (1)

Paris Dauphine University¹

01 Mar 2001-Siam Journal on Control and Optimization

TL;DR: This paper provides a characterization of viability kernels and capture basins of a target viable in a constrained subset as a unique closed subset between the target and the constrained subset satisfying tangential conditions or, by duality, normal conditions.

...read moreread less

Abstract: This paper provides a characterization of viability kernels and capture basins of a target viable in a constrained subset as a unique closed subset between the target and the constrained subset satisfying tangential conditions or, by duality, normal conditions. It is based on a method devised by Helene Frankowska for characterizing the value function of an optimal control problem as generalized (contingent or viscosity) solutions to Hamilton--Jacobi equations. These abstract results, interesting by themselves, can be applied to epigraphs of functions or graphs of maps and happen to be very efficient for solving other problems, such as stopping time problems, dynamical games, boundary-value problems for systems of partial differential equations, and impulse and hybrid control systems, which are the topics of other companion papers.

...read moreread less

106 citations

Proceedings Article•DOI•

Efficient on-line computation of constrained optimal control

[...]

L. Borrelli, T. Baotic, Alberto Bemporad, T. Morari

04 Dec 2001

TL;DR: Two algorithms are presented that efficiently perform the online evaluation of the explicit optimal control law both in terms of storage demands and computational complexity.

...read moreread less

Abstract: For discrete-time linear time-invariant systems with constraints on inputs and outputs, the constrained finite-time optimal controller can be obtained explicitly as a piecewise-affine function of the initial state via multi-parametric programming. By exploiting the properties of the value function, we present two algorithms that efficiently perform the online evaluation of the explicit optimal control law both in terms of storage demands and computational complexity. The algorithms are particularly effective when used for model-predictive control (MPC) where an open-loop constrained finite-time optimal control problem has to be solved at each sampling time.

...read moreread less

Journal Article•DOI•

Reward functionals, salvage values, and optimal stopping

[...]

Luis H. R. Alvarez¹•Institutions (1)

University of Turku¹

01 Dec 2001-Mathematical Methods of Operations Research

TL;DR: An explicit representation of the value function in terms of the minimal r-excessive mappings for the considered diffusion is derived and it is proved that the smooth pasting principle follows directly from the approach, while the contrary is not necessarily true.

...read moreread less

Abstract: We consider the optimal stopping of a linear diffusion in a problem subject to both a cumulative term measuring the expected cumulative present value of a continuous and potentially state-dependent profit flow and an instantaneous payoff measuring the salvage or terminal value received at the optimally chosen stopping date. We derive an explicit representation of the value function in terms of the minimal r-excessive mappings for the considered diffusion, and state a set of necessary conditions for optimal stopping by applying the classical theory of linear diffusions and ordinary non-linear programming techniques. We also state a set of conditions under which our necessary conditions are also sufficient and prove that the smooth pasting principle follows directly from our approach, while the contrary is not necessarily true.

...read moreread less

Proceedings Article•

Direct value-approximation for factored MDPs

[...]

Dale Schuurmans¹, Relu Patrascu¹•Institutions (1)

University of Waterloo¹

03 Jan 2001

TL;DR: This paper presents a simple approach for computing reasonable policies for factored Markov decision processes (MDPs), when the optimal value function can be approximated by a compact linear form.

...read moreread less

Abstract: We present a simple approach for computing reasonable policies for factored Markov decision processes (MDPs), when the optimal value function can be approximated by a compact linear form. Our method is based on solving a single linear program that approximates the best linear fit to the optimal value function. By applying an efficient constraint generation procedure we obtain an iterative solution method that tackles concise linear programs. This direct linear programming approach experimentally yields a significant reduction in computation time over approximate value- and policy-iteration methods (sometimes reducing several hours to a few seconds). However, the quality of the solutions produced by linear programming is weaker—usually about twice the approximation error for the same approximating class. Nevertheless, the speed advantage allows one to use larger approximation classes to achieve similar error in reasonable time.

...read moreread less

Journal Article•DOI•

Idempotent Interval Analysis and Optimization Problems

[...]

Grigori L. Litvinov, Andrei N. Sobolevskiī¹•Institutions (1)

Moscow State University¹

01 Oct 2001-Reliable Computing

TL;DR: In this article, interval analysis over idempotent semirings is applied to construction of exact interval solutions to the interval discrete stationary Bellman equation, which is typically NP-hard in the traditional interval linear algebra.

...read moreread less

Abstract: Many problems in optimization theory are strongly nonlinear in the traditional sense but possess a hidden linear structure over suitable idempotent semirings. After an overview of "Idempotent Mathematics" with an emphasis on matrix theory, interval analysis over idempotent semirings is developed. The theory is applied to construction of exact interval solutions to the interval discrete stationary Bellman equation. Solution of an interval system is typically NP-hard in the traditional interval linear algebra; in the idempotent case it is polynomial. A generalization to the case of positive semirings is outlined.

...read moreread less

Book Chapter•DOI•

Bellman function in stochastic control and harmonic analysis

[...]

Fedor Nazarov¹, Sergei Treil², Alexander Volberg¹•Institutions (2)

Michigan State University¹, Brown University²

01 Jan 2001

TL;DR: This work shows how the homonym function in harmonic analysis is (and how it is not) the same stochastic optimal control Bellman function, and presents several creatures from Bellman’s Zoo.

...read moreread less

Abstract: The stochastic optimal control uses the differential equation of Bell-man and its solution—the Bellman function. We show how the homonym function in harmonic analysis is (and how it is not) the same stochastic optimal control Bellman function. Then we present several creatures from Bellman’s Zoo: a function that proves the inverse Holder inequality, as well as several other harmonic analysis Bellman functions and their corresponding Bellman PDE’s. Finally we translate the approach of Burkholder to the language of “our” Bellman function.

...read moreread less

Proceedings Article•

Batch Value Function Approximation via Support Vectors

[...]

Thomas G. Dietterich¹, Xin Wang¹•Institutions (1)

Oregon State University¹

03 Jan 2001

TL;DR: Three ways of combining linear programming with the kernel trick to find value function approximations for reinforcement learning are presented, one based on SVM regression; the second is based on the Bellman equation; and the third seeks only to ensure that good moves have an advantage over bad moves.

...read moreread less

Abstract: We present three ways of combining linear programming with the kernel trick to find value function approximations for reinforcement learning. One formulation is based on SVM regression; the second is based on the Bellman equation; and the third seeks only to ensure that good moves have an advantage over bad moves. All formulations attempt to minimize the number of support vectors while fitting the data. Experiments in a difficult, synthetic maze problem show that all three formulations give excellent performance, but the advantage formulation is much easier to train. Unlike policy gradient methods, the kernel methods described here can easily adjust the complexity of the function approximator to fit the complexity of the value function.

...read moreread less

Solving Uncertain Markov Decision Processes

[...]

J. Andrew Bagnell, Andrew Y. Ng, Jeff Schneider

01 Jan 2001

TL;DR: The authors demonstrate that the uncertain model approach can be used to solve a class of nearly Markovian Decision Problems, providing lower bounds on performance in stochastic models with higher-order interactions.

...read moreread less

Abstract: The authors consider the fundamental problem of nding good policies in uncertain models. It is demonstrated that although the general problem of nding the best policy with respect to the worst model is NP-hard, in the special case of a convex uncertainty set the problem is tractable. A stochastic dynamic game is proposed, and the security equilibrium solution of the game is shown to correspond to the value function under the worst model and the optimal controller. The authors demonstrate that the uncertain model approach can be used to solve a class of nearly Markovian Decision Problems, providing lower bounds on performance in stochastic models with higher-order interactions. The framework considered establishes connections between and generalizes paradigms of stochastic optimal, mini-max, and H1/robust control. Applications are considered, including robustness in reinforcement learning, planning in nearly Markovian decision processes, and bounding error due to sensor discretization in noisy, continuous state-spaces.

...read moreread less

Journal Article•DOI•

Optimal portfolio management rules in a non-gaussian market with durability and intertemporal substitution

[...]

Fred Espen Benth¹, Kenneth H. Karlsen², Kristin Reikvam¹•Institutions (2)

Aarhus University¹, University of Bergen²

01 Oct 2001-Finance and Stochastics

TL;DR: The value function of the singular control problem is characterized as the unique constrained viscosity solution of the Hamilton-Jacobi-Bellman equation in the case of general utilities and general Lévy processes.

...read moreread less

Abstract: We consider an optimal portfolio-consumption problem which incorporates the notions of durability and intertemporal substitution. The logreturns of the uncertain assets are not necessarily normally distributed. The natural models then involve Levy processes as driving noise instead of the more frequently used Brownian motion. The optimization problem is a singular stochastic control problem and the associated Hamilton-Jacobi-Bellman equation is a nonlinear second order degenerate elliptic integro-differential equation subject to gradient and state constraints. For utility functions of HARA type, we calculate the optimal investment and consumption policies together with an explicit expression for the value function when the Levy process has only negative jumps. For the classical Merton problem, which is a special case of our optimization problem, we provide explicit policies for general Levy processes having both positive and negative jumps. Instead of following the classical approach of using a verification theorem, we validate our solution candidates within a viscosity solution framework. To this end, the value function of our singular control problem is characterized as the unique constrained viscosity solution of the Hamilton-Jacobi-Bellman equation in the case of general utilities and general Levy processes.

...read moreread less

Journal Article•DOI•

Backstepping design with local optimality matching

[...]

Zigang Pan¹, K. Ezal, Arthur J. Krener², Petar V. Kokotovic³•Institutions (3)

University of Cincinnati¹, University of California, Davis², University of California, Santa Barbara³

01 Jul 2001-IEEE Transactions on Automatic Control

TL;DR: The recursive construction of a cost functional and the corresponding solution to the Hamilton-Jacobi-Isaacs equation employs a new concept of nonlinear Cholesky factorization, and shows that the backstepping design procedure can be tuned to yield the optimal control law.

...read moreread less

Abstract: In nonlinear H/sup /spl infin//-optimal control design for strict-feedback nonlinear systems, our objective is to construct globally stabilizing control laws to match the optimal control law up to any desired order, and to be inverse optimal with respect to some computable cost functional. Our recursive construction of a cost functional and the corresponding solution to the Hamilton-Jacobi-Isaacs equation employs a new concept of nonlinear Cholesky factorization. When the value function for the system has a nonlinear Cholesky factorization, we show that the backstepping design procedure can be tuned to yield the optimal control law.

...read moreread less

Journal Article•DOI•

The problem of optimal robust sensor scheduling

[...]

Andrey V. Savkin¹, Robin J. Evans², Efstratios Skafidas•Institutions (2)

University of Western Australia¹, University of Melbourne²

15 Jun 2001-Systems & Control Letters

TL;DR: The problem of optimal robust sensor scheduling is formulated and solution to this problem is given in terms of the existence of suitable solutions to a Riccati differential equation of the game type and a dynamic programming equation.

...read moreread less

Proceedings Article•

Symmetry in Markov Decision Processes and its Implications for Single Agent and Multiagent Learning

[...]

Martin Zinkevich¹, Tucker R. Balch•Institutions (1)

Carnegie Mellon University¹

28 Jun 2001

TL;DR: It is proved that if an MDP possesses a symmetry, then the optimal value function andQ function are similarly symmetric and there exists a symmetric optimal policy.

...read moreread less

Abstract: This paper examines the notion of symmetry in Markov decision processes (MDPs). We define symmetry for an MDP and show how it can be exploited for more effective learning in single agent systems as well as multiagent systems and multirobot systems. We prove that if an MDP possesses a symmetry, then the optimal value function andQ function are similarly symmetric and there exists a symmetric optimal policy. If an MDP is known to possess a symmetry, this knowledge can be applied to decrease the number of training examples needed for algorithms like Q learning and value iteration. It can also be used to directly restrict the hypothesis space.

...read moreread less

Journal Article•DOI•

The finite element approximation of Hamilton-Jacobi-Bellman equations

[...]

M. Boulbrachene¹, M. Haiour²•Institutions (2)

Sultan Qaboos University¹, University of Annaba²

01 Apr 2001-Computers & Mathematics With Applications

TL;DR: In this article, a weakly coupled system of quasi-variational inequalities for finite element approximation of Hamilton-Jacobi-Bellman equations is presented, and a convergence and a quasi-optimal L∞-error estimate are established.

...read moreread less

Abstract: This paper deals with the finite element approximation of Hamilton-Jacobi-Bellman equations. We establish a convergence and a quasi-optimal L∞-error estimate, involving a weakly coupled systems of quasi-variational inequalities for the solution of which an interative scheme of monotone kind is introduced and analyzed.

...read moreread less

Journal Article•DOI•

Asymptotic optimality of tracking policies in stochastic networks

[...]

Nicole Bäuerle

01 Nov 2001-Annals of Applied Probability

TL;DR: In this paper, the authors formulate a general class of control problems in stochastic queuing networks and consider the corresponding fluid optimization problem, which is a deterministic control problem and often easy to solve.

...read moreread less

Abstract: Control problems in stochastic queuing networks are hard to solve. However, it is well known that the fluid model provides a useful approximation to the stochastic network.We will formulate a general class of control problems in stochastic queuing networks and consider the corresponding fluid optimization problem ($F$) which is a deterministic control problem and often easy to solve. Contrary to previous literature, our cost rate function is rather general.The value function of ($F$) provides an asymptotic lower bound on the value function of the stochastic network under fluid scaling. Moreover, we can construct from the optimal control of ($F$) a so-called tracking policy for the stochastic queuing network which achieves the lower bound as the fluid scaling parameter tends to $\infty$. In this case we say that the tracking policy is asymptotically optimal. This statement is true for multiclass queuing networks and admission and routing problems.The convergence is monotone under some convexity assumptions. The tracking policy approach also shows that a given fluid model solution can be attained as a fluid limit of the original discrete model.

...read moreread less

Journal Article•DOI•

Transmission rate scheduling for the non-real-time data in a cellular CDMA system

[...]

Riku Jantti¹, Seong-Lyun Kim²•Institutions (2)

Helsinki University of Technology¹, Information and Communications University²

01 May 2001-IEEE Communications Letters

TL;DR: Numerical results show that by having optimal scheduling, the problem of minimizing the time span to send all the packets can be significantly reduced, and thereby increase the radio network capacity.

...read moreread less

Abstract: The main focus of this letter is on the non-real-time service where the so-called best effort type of resource management is applicable. Given an amount of data packets to each user, the problem of minimizing the time span to send all the packets is investigated. In particular, the problem is elaborated in the CDMA contest. With Bellman's (1957) principle of optimality, we derive an optimal scheduling, in which users transmit data through a hybrid of CDMA and TDMA. Numerical results show that by having optimal scheduling, we can significantly reduce the transmission time span, and thereby increase the radio network capacity.

...read moreread less

Book•

Stable Parametric Programming

[...]

Sanjo Zlobec¹•Institutions (1)

McGill University¹

31 Aug 2001

TL;DR: In this article, the authors study only convex stable parametric programs (SPP) with a particular continuity (stability) requirement and derive a well-known marginal value formula for the optimal value function.

...read moreread less

Abstract: Stable parametric programs (abbreviation: SPP) are parametric programs with a particular continuity (stability) requirement. Optimal solutions of SPP are paths in the space of “parameters” (inputs, data) that preserve continuity of the feasible set point-to set mapping in the space of “decision variables”. The end points of these paths optimize the optimal value function on a region of stability In this paper we study only convex SPP. First we study optimality conditions. If the constraints enjoy the locally-flat-surface (“LFS”) property in the decision variable component, then the usual separation arguments apply and we can characterize local and global optimal solutions. Then we consider a well-known marginal value formula for the optimal value function. We prove the formula under new assumptions and then use it to modify a class of quasi-Newton methods in order to solve convex SPP. Finally, several solved case are reported

...read moreread less

Book Chapter•DOI•

Optimizing Average Reward Using Discounted Rewards

[...]

Sham M. Kakade

16 Jul 2001

TL;DR: A bound is provided on the average reward of the policy obtained by solving the Bellman equations which depends on the relationship between the discount factor and the mixingtime of the Markov chain.

...read moreread less

Abstract: In many reinforcement learningproblems, it is appropriate to optimize the average reward. In practice, this is often done by solving the Bellman equations usinga discount factor close to 1. In this paper, we provide a bound on the average reward of the policy obtained by solving the Bellman equations which depends on the relationship between the discount factor and the mixingtime of the Markov chain. We extend this result to the direct policy gradient of Baxter and Bartlett, in which a discount parameter is used to find a biased estimate of the gradient of the average reward with respect to the parameters of a policy. We show that this biased gradient is an exact gradient of a related discounted problem and provide a bound on the optima found by following these biased gradients of the average reward. Further, we show that the exact Hessian in this related discounted problem is an approximate Hessian of the average reward, with equality in the limit the discount factor tends to 1. We then provide an algorithm to estimate the Hessian from a sample path of the underlyingMark ov chain, which converges with probability 1.

...read moreread less

Journal Article•DOI•

A test of the principle of optimality

[...]

Enrica Carbone¹, John D. Hey¹•Institutions (1)

University of York¹

01 May 2001-Theory and Decision

TL;DR: In this paper, an experimental test of the Principle of Optimality in dynamic decision problems is presented, where the decision-maker should always choose the optimal decision at each stage of the decision problem, conditional on behaving optimally thereafter.

...read moreread less

Abstract: This paper reports on an experimental test of the Principle of Optimality in dynamic decision problems. This Principle, which states that the decision-maker should always choose the optimal decision at each stage of the decision problem, conditional on behaving optimally thereafter, underlies many theories of optimal dynamic decision making, but is normally difficult to test empirically without knowledge of the decision-maker's preference function. In the experiment reported here we use a new experimental procedure to get round this difficulty, which also enables us to shed some light on the decision process that the decision-maker is using if he or she is not using the Principle of Optimality - which appears to be the case in our experiments.

...read moreread less

Journal Article•DOI•

On Open- and Closed-Loop Bang-Bang Control in Nonzero-Sum Differential Games

[...]

Geert Jan Olsder

01 Apr 2001-Siam Journal on Control and Optimization

TL;DR: The Nash equilibria of two two- person non zero-sum differential games with hard constraints on the controls are studied and a modest exploration of singular surfaces in nonzero-sum games is aimed at.

...read moreread less

Abstract: The Nash equilibria of two two-person nonzero-sum differential games with hard constraints on the controls are studied. For both games the open-loop as well as the closed-loop solutions, and their relationships, are discussed. As is well-known for "smooth" nonzero-sum games, these solutions are generally different. Because of the constraints, the optimal controls are of the bang-bang type, and the solutions of the two problems under consideration are nonsmooth. One deals with non-Lipschitzian differential equations (considering the problem as an optimal control problem for one player while the bang-bang feedback control of the other player is assumed to be fixed), and the corresponding value functions possess singular surfaces. General conditions for the existence and uniqueness of the feedback solutions in this framework are not yet known. It is shown that in the two examples the open-loop and closed-loop solutions differ. As a by-product, the paper aims at a modest exploration of singular surfaces in nonzero-sum games.

...read moreread less

Journal Article•DOI•

First-Order Optimality Conditions for Degenerate Index Sets in Generalized Semi-Infinite Optimization

[...]

Oliver Stein

01 Aug 2001-Mathematics of Operations Research

TL;DR: For the special case when the so-called lower-level problem is convex, it is shown how the general optimality conditions can be strengthened, thereby giving a generalization of Theorem 4.2 in RA¼ckmann and Stein (2001).

...read moreread less

Abstract: We present a general framework for the derivation of first-order optimality conditions in generalized semi-infinite programming. Since in our approach no constraint qualifications are assumed for the index set, we can generalize necessary conditions given by RA¼ckmann and Shapiro (1999) as well as the characterizations of local minimizers of order one, which were derived by Stein and Still (2000). Moreover, we obtain a short proof for Theorem 1.1 in Jongen et al. (1998).For the special case when the so-called lower-level problem is convex, we show how the general optimality conditions can be strengthened, thereby giving a generalization of Theorem 4.2 in RA¼ckmann and Stein (2001). Finally, if the directional derivative of a certain optimal value function exists and is subadditive with respect to the direction, we propose a Mangasarian-Fromovitz-type constraint qualification and show that it implies an Abadie-type constraint qualification.

...read moreread less

Journal Article•DOI•

Second-Order Optimality Conditions in Generalized Semi-Infinite Programming

[...]

Jan-J. Rückmann¹, Alexander Shapiro²•Institutions (2)

Technische Universität Ilmenau¹, Georgia Institute of Technology²

01 Mar 2001-Set-valued Analysis

TL;DR: In this article, sufficient and necessary second-order optimality conditions for generalized semi-infinite optimization problems are derived under assumptions that the corresponding optimal value function is secondorder (parabolically) directionally differentiable and secondorder epiregular at the considered point.

...read moreread less

Abstract: This paper deals with generalized semi-infinite optimization problems where the (infinite) index set of inequality constraints depends on the state variables and all involved functions are twice continuously differentiable. Necessary and sufficient second-order optimality conditions for such problems are derived under assumptions which imply that the corresponding optimal value function is second-order (parabolically) directionally differentiable and second-order epiregular at the considered point. These sufficient conditions are, in particular, equivalent to the second-order growth condition.

...read moreread less