Showing papers on "Bellman equation published in 2002"

PDF

Open Access

Book•

[...]

Vijay R. Konda¹, John N. Tsitsiklis•Institutions (1)

01 Jan 2002

TL;DR: This thesis proposes and studies actor-critic algorithms which combine the above two approaches with simulation to find the best policy among a parameterized class of policies, and proves convergence of the algorithms for problems with general state and decision spaces.

...read moreread less

Abstract: Many complex decision making problems like scheduling in manufacturing systems, portfolio management in finance, admission control in communication networks etc., with clear and precise objectives, can be formulated as stochastic dynamic programming problems in which the objective of decision making is to maximize a single “overall” reward. In these formulations, finding an optimal decision policy involves computing a certain “value function” which assigns to each state the optimal reward one would obtain if the system was started from that state. This function then naturally prescribes the optimal policy, which is to take decisions that drive the system to states with maximum value. For many practical problems, the computation of the exact value function is intractable, analytically and numerically, due to the enormous size of the state space. Therefore one has to resort to one of the following approximation methods to find a good sub-optimal policy: (1) Approximate the value function. (2) Restrict the search for a good policy to a smaller family of policies. In this thesis, we propose and study actor-critic algorithms which combine the above two approaches with simulation to find the best policy among a parameterized class of policies. Actor-critic algorithms have two learning units: an actor and a critic. An actor is a decision maker with a tunable parameter. A critic is a function approximator. The critic tries to approximate the value function of the policy used by the actor, and the actor in turn tries to improve its policy based on the current approximation provided by the critic. Furthermore, the critic evolves on a faster time-scale than the actor. We propose several variants of actor-critic algorithms. In all the variants, the critic uses Temporal Difference (TD) learning with linear function approximation. Some of the variants are inspired by a new geometric interpretation of the formula for the gradient of the overall reward with respect to the actor parameters. This interpretation suggests a natural set of basis functions for the critic, determined by the family of policies parameterized by the actor's parameters. We concentrate on the average expected reward criterion but we also show how the algorithms can be modified for other objective criteria. We prove convergence of the algorithms for problems with general (finite, countable, or continuous) state and decision spaces. To compute the rate of convergence (ROC) of our algorithms, we develop a general theory of the ROC of two-time-scale algorithms and we apply it to study our algorithms. In the process, we study the ROC of TD learning and compare it with related methods such as Least Squares TD (LSTD). We study the effect of the basis functions used for linear function approximation on the ROC of TD. We also show that the ROC of actor-critic algorithms does not depend on the actual basis functions used in the critic but depends only on the subspace spanned by them and study this dependence. Finally, we compare the performance of our algorithms with other algorithms that optimize over a parameterized family of policies. We show that when only the “natural” basis functions are used for the critic, the rate of convergence of the actor critic algorithms is the same as that of certain stochastic gradient descent algorithms. However, with appropriate additional basis functions for the critic, we show that our algorithms outperform the existing ones in terms of ROC. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)

...read moreread less

1,766 citations

Journal Article•DOI•

Envelope Theorems for Arbitrary Choice Sets

[...]

Paul Milgrom¹, Ilya Segal¹•Institutions (1)

Stanford University¹

01 Mar 2002-Econometrica

TL;DR: The standard envelope theorems apply to choice sets with convex and topological structure, providing sufficient conditions for the value function to be differentiable in a parameter and characterizing its derivative as mentioned in this paper.

...read moreread less

Abstract: The standard envelope theorems apply to choice sets with convex and topological structure, providing sufficient conditions for the value function to be differentiable in a parameter and characterizing its derivative. This paper studies optimization with arbitrary choice sets and shows that the traditional envelope formula holds at any differentiability point of the value function. We also provide conditions for the value function to be, variously, absolutely continuous, left- and right-differentiable, or fully differentiable. These results are applied to mechanism design, convex programming, continuous optimization problems, saddle-point problems, problems with parameterized constraints, and optimal stopping problems.

...read moreread less

1,183 citations

Journal Article•DOI•

Identifying Dynamic Discrete Decision Processes

[...]

Thierry Magnac¹, David Thesmar¹•Institutions (1)

INSEE¹

01 Mar 2002-Econometrica

TL;DR: In this paper, the authors show that dynamic discrete choice models can not be identified as long as the following structural parameters are not set: the distribution function of unobserved preference shocks, the discount rate and the current and future preferences in one (reference) alternative.

...read moreread less

Abstract: In this paper, we analyze the nonparametric identification of dynamic discrete choice models. Our methodology is based on the insight of Hotz and Miller (1993) that Bellman equations can be interpreted as moment conditions. We consider cases with and without unobserved heterogeneity. Not only do we show that these models are not identified (Rust (1994)), we are also able to determine their exact degree of underidentification. We begin with the case without correlated unobserved heterogeneity. Using Bellman equations as moment conditions, we show that utility functions in each alternative cannot be (nonparametrically) identified as long as the following structural parameters are not set: the distribution function of unobserved preference shocks, the discount rate, and the current and future preferences in one (reference) alternative. We also investigate how exclusion or parametric restrictions can provide identifying restrictions. As the identification proof is constructive, a simple method of moment estimator can be derived and overidentifying restrictions can be tested. Provided that one is willing to make stronger identifying assumptions, dynamic discrete choice modelling is thus little different from the continuous case. Bellman equations can be used to recover deep structural parameters as are Euler equations. We continue by exploring a case where the unobserved component of preferences is correlated over time. Even if the functional degree of underidentification of this model is larger, we present reasonable identifying assumptions that lead to the same identification results as without unobserved heterogeneity. The same methodology using moment conditions is applied. This paper expands upon the work in Rust (1994), where the generic nonidentification result is stated. We use a slightly different model. In our case, agents' preferences have unobservable and possibly persistent components. The constructive aspect of our proof allows us to interpret Rust's underidentification result and to propose identifying restrictions. On the technical side, the insights for our identification strategy are borrowed from the works of Hotz and Miller (1993), Hotz et al. (1994), and Altug and Miller (1998). We

...read moreread less

446 citations

Journal Article•DOI•

Variable Resolution Discretization in Optimal Control

[...]

Rémi Munos¹, Andrew W. Moore²•Institutions (2)

École Polytechnique¹, Carnegie Mellon University²

01 Nov 2002-Machine Learning

TL;DR: This paper evaluates the performance of a variety of splitting criteria on many benchmark problems, paying careful attention to their number-of-cells versus closeness-to-optimality tradeoff curves.

...read moreread less

Abstract: The problem of state abstraction is of central importance in optimal control, reinforcement learning and Markov decision processes. This paper studies the case of variable resolution state abstraction for continuous time and space, deterministic dynamic control problems in which near-optimal policies are required. We begin by defining a class of variable resolution policy and value function representations based on Kuhn triangulations embedded in a kd-trie. We then consider top-down approaches to choosing which cells to split in order to generate improved policies. The core of this paper is the introduction and evaluation of a wide variety of possible splitting criteria. We begin with local approaches based on value function and policy properties that use only features of individual cells in making split choices. Later, by introducing two new non-local measures, influence and variance, we derive splitting criteria that allow one cell to efficiently take into account its impact on other cells when deciding whether to split. Influence is an efficiently-calculable measure of the extent to which changes in some state effect the value function of some other states. Variance is an efficiently-calculable measure of how risky is some state in a Markov chain: a low variance state is one in which we would be very surprised if, during any one execution, the long-term reward attained from that state differed substantially from its expected value, given by the value function. The paper proceeds by graphically demonstrating the various approaches to splitting on the familiar, non-linear, non-minimum phase, and two dimensional problem of the “Car on the hill”. It then evaluates the performance of a variety of splitting criteria on many benchmark problems, paying careful attention to their number-of-cells versus closeness-to-optimality tradeoff curves.

...read moreread less

360 citations

Journal Article•DOI•

On minimizing the ruin probability by investment and reinsurance

[...]

Hanspeter Schmidli

01 Aug 2002-Annals of Applied Probability

TL;DR: In this article, the authors consider a classical risk model and allow investment into a risky asset modelled as a Black-Scholes model as well as (proportional) reinsurance.

...read moreread less

Abstract: We consider a classical risk model and allow investment into a risky asset modelled as a Black--Scholes model as well as (proportional) reinsurance. Via the Hamilton--Jacobi--Bellman approach we find a candidate for the optimal strategy and develop a numerical procedure to solve the HJB equation. We prove a verification theorem in order to show that any increasing solution to the HJB equation is bounded and solves the optimisation problem. We prove that an increasing solution to the HJB equation exists. Finally two numerical examples are discussed.

...read moreread less

321 citations

Journal Article•DOI•

On the convergence rate of approximation schemes for Hamilton-Jacobi-Bellman equations

[...]

Guy Barles¹, Espen R. Jakobsen²•Institutions (2)

François Rabelais University¹, Norwegian University of Science and Technology²

01 Jan 2002-Mathematical Modelling and Numerical Analysis

TL;DR: General results on the rate of convergence of a certain class of monotone approximation schemes for stationary Hamilton-Jacobi- Bellman equations with variable coecients are obtained using systematically a tricky idea of N.V. Krylov.

...read moreread less

Abstract: Using systematically a tricky idea of N.V. Krylov, we obtain general results on the rate of convergence of a certain class of monotone approximation schemes for stationary Hamilton-Jacobi-Bellman equations with variable coefficients. This result applies in particular to control schemes based on the dynamic programming principle and to finite difference schemes despite, here, we are not able to treat the most general case. General results have been obtained earlier by Krylov for finite difference schemes in the stationary case with constant coefficients and in the time-dependent case with variable coefficients by using control theory and probabilistic methods. In this paper we are able to handle variable coefficients by a purely analytical method. In our opinion this way is far simpler and, for the cases we can treat, it yields a better rate of convergence than Krylov obtains in the variable coefficients case.

...read moreread less

197 citations

Proceedings Article•DOI•

Optimal control of constrained, piecewise affine systems with bounded disturbances

[...]

Eric C. Kerrigan¹, David Q. Mayne•Institutions (1)

University of Cambridge¹

10 Dec 2002

TL;DR: In this article, a solution to the problem of optimal control of piecewise affine systems with a bounded disturbance is characterised results that allow one to compute the value function, its domain (robustly controllable set) and the optimal control law are presented.

...read moreread less

Abstract: The solution to the problem of optimal control of piecewise affine systems with a bounded disturbance is characterised Results that allow one to compute the value function, its domain (robustly controllable set) and the optimal control law are presented The tools that are employed include dynamic programming, polytopic set algebra and parametric programming When the cost is time (robust time-optimal control problem) or the stage cost is piecewise affine (robust optimal and robust receding horizon control problems), the value function and the optimal control law are both piecewise affine and each robustly controllable set is the union of a finite set of polytopes Conditions on the cost and constraints are also proposed in order to ensure that the optimal control laws are robustly stabilising

...read moreread less

150 citations

Journal Article•DOI•

Smooth Solutions to Optimal Investment Models with Stochastic Volatilities and Portfolio Constraints

[...]

Pham¹•Institutions (1)

University of Paris¹

01 Oct 2002-Applied Mathematics and Optimization

TL;DR: In this paper, an extension of Merton's optimal investment problem to a multidimensional model with stochastic volatility and portfolio constraints is presented, and an optimal portfolio is shown to exist, and is expressed in terms of the classical solution to this semilinear equation.

...read moreread less

Abstract: . This paper deals with an extension of Merton's optimal investment problem to a multidimensional model with stochastic volatility and portfolio constraints. The classical dynamic programming approach leads to a characterization of the value function as a viscosity solution of the highly nonlinear associated Bellman equation. A logarithmic transformation expresses the value function in terms of the solution to a semilinear parabolic equation with quadratic growth on the derivative term. Using a stochastic control representation and some approximations, we prove the existence of a smooth solution to this semilinear equation. An optimal portfolio is shown to exist, and is expressed in terms of the classical solution to this semilinear equation. This reduction is useful for studying numerical schemes for both the value function and the optimal portfolio. We illustrate our results with several examples of stochastic volatility models popular in the financial literature.

...read moreread less

144 citations

Journal Article•DOI•

Structural Properties of Stochastic Dynamic Programs

[...]

James E. Smith¹, Kevin F. McCardle²•Institutions (2)

Duke University¹, University of California, Los Angeles²

01 Sep 2002-Operations Research

TL;DR: This paper presents several fundamental results for establishing closed convex cone properties, a large class of properties that includes monotonicity, convexity, and supermodularity, as well as combinations of these and many other properties of interest.

...read moreread less

Abstract: In Markov models of sequential decision processes, one is often interested in showing that the value function is monotonic, convex, and/or supermodular in the state variables. These kinds of results can be used to develop a qualitative understanding of the model and characterize how the results will change with changes in model parameters. In this paper we present several fundamental results for establishing these kinds of properties. The results are, in essence, "metatheorems" showing that the value functions satisfy property P if the reward functions satisfy property P and the transition probabilities satisfy a stochastic version of this property. We focus our attention on closed convex cone properties, a large class of properties that includes monotonicity, convexity, and supermodularity, as well as combinations of these and many other properties of interest.

...read moreread less

119 citations

Proceedings Article•

A Convergent Form of Approximate Policy Iteration

[...]

Theodore J. Perkins¹, Doina Precup²•Institutions (2)

University of Massachusetts Amherst¹, McGill University²

01 Jan 2002

TL;DR: It is proved that if the policy improvement operator produces e-soft policies and is Lipschitz continuous in the action values, with a constant that is not too large, then the approximate policy iteration algorithm converges to a unique solution from any initial policy.

...read moreread less

Abstract: We study a new, model-free form of approximate policy iteration which uses Sarsa updates with linear state-action value function approximation for policy evaluation, and a "policy improvement operator" to generate a new policy based on the learned state-action values. We prove that if the policy improvement operator produces e-soft policies and is Lipschitz continuous in the action values, with a constant that is not too large, then the approximate policy iteration algorithm converges to a unique solution from any initial policy. To our knowledge, this is the first convergence result for any form of approximate policy iteration under similar computational-resource assumptions.

...read moreread less

105 citations

Journal Article•DOI•

Risk-sensitive portfolio optimization on infinite time horizon

[...]

Kazutaka Kuroda, Hideo Nagai¹•Institutions (1)

Osaka University¹

01 Jan 2002-Stochastics and Stochastics Reports

TL;DR: In this article, the authors considered a continuous time portfolio optimization problem on an infinite time horizon for a factor model, where the mean returns of individual securities or asset categories are explicitly affected by economic factors.

...read moreread less

Abstract: We consider a continuous time portfolio optimization problems on an infinite time horizon for a factor model, recently treated by Bielecki and Pliska ["Risk-sensitive dynamic asset management", Appl. Math. Optim. , 39 (1990) 337-360], where the mean returns of individual securities or asset categories are explicitly affected by economic factors. The factors are assumed to be Gaussian processes. We see new features in constructing optimal strategies for risk-sensitive criteria of the portfolio optimization on an infinite time horizon, which are obtained from the solutions of matrix Riccati equations.

...read moreread less

Journal Article•DOI•

Dynamic programming in stochastic control of systems with delay

[...]

Bjørnar Larssen¹•Institutions (1)

Oslo and Akershus University College of Applied Sciences¹

01 Jan 2002-Stochastics and Stochastics Reports

TL;DR: In this article, the authors consider optimal control problems for systems described by stochastic differential equations with delay (SDDE) and prove a version of the dynamic programming principle for a general class of such problems.

...read moreread less

Abstract: We consider optimal control problems for systems described by stochastic differential equations with delay (SDDE). We prove a version of Bellman's principle of optimality (the dynamic programming principle) for a general class of such problems. That the class in general means that both the dynamics and the cost depends on the past in a general way. As an application, we study systems where the value function depends on the past only through some weighted average. For such systems we obtain a Hamilton-Jacobi-Bellman partial differential equation that the value function must solve if it is smooth enough. The weak uniqueness of the SDDEs we consider is our main tool in proving the result. Notions of strong and weak uniqueness for SDDEs are introduced, and we prove that strong uniqueness implies weak uniqueness, just as for ordinary stochastic differential equations.

...read moreread less

Book Chapter•DOI•

On the Optimal Control Law for Linear Discrete Time Hybrid Systems

[...]

Alberto Bemporad¹, Francesco Borrelli², Manfred Morari²•Institutions (2)

University of Siena¹, ETH Zurich²

25 Mar 2002

TL;DR: It is proved that the closed form of the state-feedback solution to finite time optimal control based on quadratic or linear norms performance criteria is a time-varying piecewise affine feedback control law.

...read moreread less

Abstract: In this paper we study the solution to optimal control problems for discrete time linear hybrid systems. First, we prove that the closed form of the state-feedback solution to finite time optimal control based on quadratic or linear norms performance criteria is a time-varying piecewise affine feedback control law. Then, we give an insight into the structure of the optimal state-feedback solution and of the value function. Finally, we briefly describe how the optimal control law can be computed by means of multiparametric programming.

...read moreread less

Journal Article•DOI•

Optimal Strategies for Risk-Sensitive Portfolio Optimization Problems for General Factor Models

[...]

Hideo Nagai¹•Institutions (1)

Osaka University¹

01 Jun 2002-Siam Journal on Control and Optimization

TL;DR: It is shown that the optimal diffusion processes of the problem are ergodic and that under some condition related to integrability by the invariant measures of the diffusion processes the authors can construct optimal strategies for the original problems by using the solution of the Bellman equations.

...read moreread less

Abstract: We consider constructing optimal strategies for risk-sensitive portfolio optimization problems on an infinite time horizon for general factor models, where the mean returns and the volatilities of individual securities or asset categories are explicitly affected by economic factors. The factors are assumed to be general diffusion processes. In studying the ergodic type Bellman equations of the risk-sensitive portfolio optimization problems, we introduce some auxiliary classical stochastic control problems with the same Bellman equations as the original ones. We show that the optimal diffusion processes of the problem are ergodic and that under some condition related to integrability by the invariant measures of the diffusion processes we can construct optimal strategies for the original problems by using the solution of the Bellman equations.

...read moreread less

Journal Article•DOI•

An Isomorphism Between Asset Pricing Models With and Without Linear Habit Formation

[...]

Mark Schroder¹, Costis Skiadas²•Institutions (2)

Michigan State University¹, Northwestern University²

01 Jul 2002-Review of Financial Studies

TL;DR: In this article, the authors show an isomorphism between optimal portfolio selection or competitive equilibrium models with utilities incorporating linear habit formation and corresponding models without habit formation, which can be used to mechanically transform known solutions not involving habit formation to corresponding solutions with habit formation.

...read moreread less

Abstract: We show an isomorphism between optimal portfolio selection or competitive equilibrium models with utilities incorporating linear habit formation, and corresponding models without habit formation. The isomorphism can be used to mechanically transform known solutions not involving habit formation to corresponding solutions with habit formation. For example, the Constantinides (1990) and Ingersoll (1992) solutions are mechanically obtained from the familiar Merton solutions for the additive utility case, without recourse to a Bellman equation or first-order conditions. More generally, recent solutions to portfolio selection problems with recursive utility and a stochastic investment opportunity set are readily transformed to novel solutions of corresponding problems with utility that combines recursivity with habit formation. The methodology also applies in the context of Hindy‐Huang‐Kreps (1992) preferences, where our isomorphism shows that the solution obtained by Hindy and Huang (1993) can be mechanically transformed to Dybvig’s (1995) solution to the optimal consumption-investment problem with consumption ratcheting. This article presents a general method for solving asset pricing or portfolio selection models involving linear habit formation of the type studied by Sundaresan (1989), Constantinides (1990), Detemple and Zapatero (1991), Ingersoll (1992), Chapman (1998), and others. The basic idea we pursue is that linear habit formation can be thought of as a redefinition of what constitutes consumption, to include not only the current consumption rate but also a fictitious (possibly negative) consumption rate derived from past actual consumption. By pricing out this fictitious consumption rate correctly, the economy with linear habit formation can be mechanically transformed to an equivalent economy without habit formation. This analysis simplifies and unifies existing results, but also generates novel solutions. For example, together with the results of Schroder and Skiadas (1997, 1999), our method produces optimal lifetime consumption and portfolio policies with preferences that combine recursive utility with habit formation under a stochastic

...read moreread less

Posted Content•

Identifying Dynamic Discrete Decision Processes

[...]

Thierry Magnac¹, David Thesmar², David Thesmar³, David Thesmar⁴•Institutions (4)

INSEE¹, Massachusetts Institute of Technology², National Bureau of Economic Research³, Economic Policy Institute⁴

04 Oct 2002-Social Science Research Network

TL;DR: The exact degree of underidentification of these models is derived both in the case where random shocks on preferences are independent over time and in a case with correlated fixed effects.

...read moreread less

Abstract: In this paper, we analyse the non parametric identification of dynamic discrete choice models using short-panel data. Our identification methodology is based on the ideas explored in the seminal paper of Hotz and Miller (1993) that Bellman equations can be interpreted as moment conditions. We derive the exact degree of underidentification of these models both in the case where random shocks on preferences are independent over time and in a case with correlated fixed effects. We investigate the necessity and power of various identifying restrictions.

...read moreread less

Journal Article•DOI•

Optimal stopping with random intervention times

[...]

Paul Dupuis¹, Hui Wang¹•Institutions (1)

Brown University¹

01 Mar 2002-Advances in Applied Probability

TL;DR: In this article, the authors consider a class of optimal stopping problems where the ability to stop depends on an exogenous Poisson signal process -we can only stop at the Poisson jump times.

...read moreread less

Abstract: We consider a class of optimal stopping problems where the ability to stop depends on an exogenous Poisson signal process - we can only stop at the Poisson jump times. Even though the time variable in these problems has a discrete aspect, a variational inequality can be obtained by considering an underlying continuous-time structure. Depending on whether stopping is allowed at t = 0, the value function exhibits different properties across the optimal exercise boundary. Indeed, the value function is only 𝒞 0 across the optimal boundary when stopping is allowed at t = 0 and 𝒞 2 otherwise, both contradicting the usual 𝒞 1 smoothness that is necessary and sufficient for the application of the principle of smooth fit. Also discussed is an equivalent stochastic control formulation for these stopping problems. Finally, we derive the asymptotic behaviour of the value functions and optimal exercise boundaries as the intensity of the Poisson process goes to infinity or, roughly speaking, as the problems converge to the classical continuous-time optimal stopping problems.

...read moreread less

Journal Article•DOI•

Optimal Growth Models with Bounded or Unbounded Returns: A Unifying Approach

[...]

Cuong Le Van¹, Lisa Morhaim¹•Institutions (1)

University of Paris¹

01 Jul 2002-Journal of Economic Theory

TL;DR: In this article, the authors propose a unified approach to study optimal growth models with bounded or unbounded returns (above/below), and they prove existence of optimal solutions, without using contraction method, that the value function is the unique solution to the Bellman equation.

...read moreread less

Journal Article•DOI•

Kernel-based reinforcement learning in average-cost problems

[...]

Dirk Ormoneit, Peter W. Glynn

10 Dec 2002-IEEE Transactions on Automatic Control

TL;DR: This work presents a new, kernel-based approach to reinforcement learning which overcomes this difficulty and provably converges to a unique solution and can be shown to be consistent in the sense that its costs converge to the optimal costs asymptotically.

...read moreread less

Abstract: Reinforcement learning (RL) is concerned with the identification of optimal controls in Markov decision processes (MDPs) where no explicit model of the transition probabilities is available. We propose a class of RL algorithms which always produces stable estimates of the value function. In detail, we use "local averaging" methods to construct an approximate dynamic programming (ADP) algorithm. Nearest-neighbor regression, grid-based approximations, and trees can all be used as the basis of this approximation. We provide a thorough theoretical analysis of this approach and we demonstrate that ADP converges to a unique approximation in continuous-state average-cost MDPs. In addition, we prove that our method is consistent in the sense that an optimal approximate strategy is identified asymptotically. With regard to a practical implementation, we suggest a reduction of ADP to standard dynamic programming in an artificial finite-state MDP.

...read moreread less

Posted Content•

Envelope Theorems for Arbitrary Choice Sets

[...]

Paul Milgrom¹, Ilya Segal¹•Institutions (1)

Stanford University¹

17 Aug 2002-Social Science Research Network

...read moreread less

Proceedings Article•

Optimality of Reinforcement Learning Algorithms with Linear Function Approximation

[...]

Ralf Schoknecht¹•Institutions (1)

Karlsruhe Institute of Technology¹

01 Jan 2002

TL;DR: This paper shows that each of the solutions is optimal with respect to a specific objective function and characterises the different solutions as images of the optimal exact value function under different projection operations.

...read moreread less

Abstract: There are several reinforcement learning algorithms that yield approximate solutions for the problem of policy evaluation when the value function is represented with a linear function approximator. In this paper we show that each of the solutions is optimal with respect to a specific objective function. Moreover, we characterise the different solutions as images of the optimal exact value function under different projection operations. The results presented here will be useful for comparing the algorithms in terms of the error they achieve relative to the error of the optimal approximate solution.

...read moreread less

Journal Article•DOI•

Partial hedging in a stochastic volatility environment

[...]

Mattias Jonsson¹, Ronnie Sircar•Institutions (1)

University of Michigan¹

01 Oct 2002-Mathematical Finance

TL;DR: In this paper, the authors consider the problem of partial hedging of derivative risk in a stochastic volatility environment and derive approximate value functions and strategies that are easy to implement and study.

...read moreread less

Abstract: We consider the problem of partial hedging of derivative risk in a stochastic volatility environment. It is related to state-dependent utility maximization problems in classical economics. We derive the dual problem from the Legendre transform of the associated Bellman equation and interpret the optimal strategy as the perfect hedging strategy for a modified claim. Under the assumption that volatility is fast mean-reverting and using a singular perturbation analysis, we derive approximate value functions and strategies that are easy to implement and study. The analysis identifies the usual mean historical volatility and the harmonically averaged long-run volatility as important statistics for such optimization problems without further specification of a stochastic volatility model. The approximation can be improved by specifying a model and can be calibrated for the leverage effect from the implied volatility skew. We study the effectiveness of these strategies using simulated stock paths.

...read moreread less

Proceedings Article•DOI•

Greedy linear value-approximation for factored Markov decision processes

[...]

Relu Patrascu¹, Pascal Poupart², Dale Schuurmans¹, Craig Boutilier², Carlos Guestrin³ - Show less +1 more•Institutions (3)

University of Waterloo¹, University of Toronto², Stanford University³

28 Jul 2002

TL;DR: This work provides a branch and bound method for calculating Bellman error and performing approximate policy iteration for general factored MDPs and considers linear programming itself and investigate methods for automatically constructing sets of basis functions that allow this approach to produce good approximations.

...read moreread less

Abstract: Significant recent work has focused on using linear representations to approximate value functions for factored Markov decision processes (MDPs). Current research has adopted linear programming as an effective means to calculate approximations for a given set of basis functions, tackling very large MDPs as a result. However, a number of issues remain unresolved: How accurate are the approximations produced by linear programs? How hard is it to produce better approximations? and Where do the basis functions come from? To address these questions, we first investigate the complexity of minimizing the Bellman error of a linear value function approximation--showing that this is an inherently hard problem. Nevertheless, we provide a branch and bound method for calculating Bellman error and performing approximate policy iteration for general factored MDPs. These methods are more accurate than linear programming, but more expensive. We then consider linear programming itself and investigate methods for automatically constructing sets of basis functions that allow this approach to produce good approximations. The techniques we develop are guaranteed to reduce L1 error, but can also empirically reduce Bellman error.

...read moreread less

Journal Article•DOI•

A Dynamic Programming Procedure for Pricing American-Style Asian Options

[...]

Hatem Ben-Ameur¹, Michèle Breton¹, Pierre L'Ecuyer²•Institutions (2)

École Normale Supérieure¹, Université de Montréal²

01 May 2002-Management Science

TL;DR: A procedure for pricing American-style Asian options of the Bermudan flavor, based on dynamic programming combined with finite-element piecewise-polynomial approximation of the value function, is developed here.

...read moreread less

Abstract: Pricing European-style Asian options based on the arithmetic average, under the Black and Scholes model, involves estimating an integral (a mathematical expectation) for which no easily computable analytical solution is available. Pricing their American-style counterparts, which provide early exercise opportunities, poses the additional difficulty of solving a dynamic optimization problem to determine the optimal exercise strategy. A procedure for pricing American-style Asian options of the Bermudan flavor, based on dynamic programming combined with finite-element piecewise-polynomial approximation of the value function, is developed here. A convergence proof is provided. Numerical experiments illustrate the consistency and efficiency of the procedure. Theoretical properties of the value function and of the optimal exercise strategy are also established.

...read moreread less

Proceedings Article•DOI•

Piecewise linear value function approximation for factored MDPs

[...]

Pascal Poupart¹, Craig Boutilier¹, Relu Patrascu², Dale Schuurmans²•Institutions (2)

University of Toronto¹, University of Waterloo²

28 Jul 2002

TL;DR: It is argued that this architecture for constructing a piecewise linear combination of the subtask value functions, using greedy decision tree techniques, is suitable for many types of MDPs whose combinatorics are determined largely by the existence multiple conflicting objectives.

...read moreread less

Abstract: A number of proposals have been put forth in recent years for the solution of Markov decision processes (MDPs) whose state (and sometimes action) spaces are factored. One recent class of methods involves linear value function approximation, where the optimal value function is assumed to be a linear combination of some set of basis functions, with the aim of finding suitable weights. While sophisticated techniques have been developed for finding the best approximation within this constrained space, few methods have been proposed for choosing a suitable basis set, or modifying it if solution quality is found wanting. We propose a general framework, and specific proposals, that address both of these questions. In particular, we examine weakly coupled MDPs where a number of subtasks can be viewed independently modulo resource constraints. We then describe methods for constructing a piecewise linear combination of the subtask value functions, using greedy decision tree techniques. We argue that this architecture is suitable for many types of MDPs whose combinatorics are determined largely by the existence multiple conflicting objectives.

...read moreread less

Journal Article•DOI•

Is There a Curse of Dimensionality for Contraction Fixed Points in the Worst Case

[...]

John Rust¹, Joseph F. Traub², Henryk Wozniakowski³•Institutions (3)

University of Maryland, College Park¹, Columbia University², University of Warsaw³

01 Jan 2002-Econometrica

TL;DR: In this paper, the authors studied the complexity of the contraction fixed point problem and showed that in the worst case the minimal number of function evaluations and arithmetic operations required to compute an e-approximation to a fixed point V * e B d increases exponentially in d. They showed that the curse of dimensionality disappears if the domain of Γ has additional special structure.

...read moreread less

Abstract: This paper analyzes the complexity of the contraction fixed point problem: compute an e-approximation to the fixed point V * = Γ(V * ) of a contraction mapping r that maps a Banach space B d of continuous functions of d variables into itself. We focus on quasi linear contractions where Γ is a nonlinear functional of a finite number of conditional expectation operators. This class includes contractive Fredholm integral equations that arise in asset pricing applications and the contractive Bellman equation from dynamic programming. In the absence of further restrictions on the domain of Γ, the quasi linear fixed point problem is subject to the curse of dimensionality, i.e., in the worst case the minimal number of function evaluations and arithmetic operations required to compute an e-approximation to a fixed point V * e B d increases exponentially in d. We show that the curse of dimensionality disappears if the domain of Γ has additional special structure. We identify a particular type of special structure for which the problem is strongly tractable even in the worst case, i.e., the number of function evaluations and arithmetic operations needed to compute an e-approximation of V * is bounded by Ce -p where C and p are constants independent of d. We present examples of economic problems that have this type of special structure including a class of rational expectations asset pricing problems for which the optimal exponent p = 1 is nearly achieved.

...read moreread less

Journal Article•DOI•

On constrained infinite-time nonlinear optimal control

[...]

Vasilios I. Manousiouthakis¹, Donald J. Chmielewski¹•Institutions (1)

University of California, Los Angeles¹

01 Jan 2002-Chemical Engineering Science

TL;DR: In this paper, the authors consider the infinite-time optimal control of input affine nonlinear systems subject to point-wise in time inequality constraints on both the process inputs and outputs.

...read moreread less

Journal Article•DOI•

Nonlinear stochastic optimal control of partially observable linear structures

[...]

Weiqiu Zhu¹, Zuguang Ying¹•Institutions (1)

Zhejiang University¹

01 Mar 2002-Engineering Structures

TL;DR: In this paper, a nonlinear stochastic optimal control of partially observable linear structures is proposed and illustrated with linear building structures equipped with control devices and sensors under horizontal ground acceleration excitation.

...read moreread less

Journal Article•DOI•

Optimal containment control for a class of stochastic systems perturbed by poisson and wiener processes

[...]

Ilya Kolmanovsky¹, T.L. Maizenberg²•Institutions (2)

Ford Motor Company¹, National University of Science and Technology²

01 Dec 2002-IEEE Transactions on Automatic Control

TL;DR: A class of nonlinear stochastic systems driven by Wiener and Poisson processes is considered, which involves either maximizing the time of stay within an admissible set or a closely related performance measure.

...read moreread less

Abstract: In this note, we consider a class of nonlinear stochastic systems driven by Wiener and Poisson processes. The Wiener process input enters into the equations additively to the dynamics while the Poisson process input enters into the equations multiplicatively to the control input. Examples of applied problems that may lead to system models of this kind are discussed in the note. The optimal containment control problem is then formulated for these systems. It involves either maximizing the time of stay within an admissible set or a closely related performance measure. The optimal control and the optimal value function are characterized on the basis of Bellman's dynamic programming principle in the general case so that the optimal value function is a solution of a boundary value problem for a partial differential equation (PDE). For a special case defined by more restrictive assumptions the method of successive approximations is used to show the existence of solution to this boundary value problem and to set up an iterative solution procedure. An example is reported that illustrates the results.

...read moreread less

Journal Article•DOI•

Zero-Sum Semi-Markov Games

[...]

Anna Jaskiewicz

01 Mar 2002-Siam Journal on Control and Optimization

TL;DR: The main result states that the optimality equation has a solution, which is approximated by the solutions of some $\varepsilon$-perturbed semi-Markov games, and the existence of value and average optimal strategies for the players is established.

...read moreread less

Abstract: This paper deals with Borel state and action spaces zero-sum semi-Markov games under the expected long run average payoff criterion. The transition probabilities are assumed to satisfy some generalized geometric ergodicity conditions. The main result states that the optimality equation has a solution, which is approximated by the solutions of some $\varepsilon$-perturbed semi-Markov games. As a corollary, the existence of value and average optimal strategies for the players is established.

...read moreread less