Showing papers on "Bellman equation" published in 2009


Journal ArticleDOI
TL;DR: It is shown that the time-dependent problem is decomposable with respect to arrival times and therefore can be solved as easily as its static counterpart.
Abstract: This paper studies the problem of finding a priori shortest paths to guarantee a given likelihood of arriving on-time in a stochastic network. Such “reliable” paths help travelers better plan their trips to prepare for the risk of running late in the face of stochastic travel times. Optimal solutions to the problem can be obtained from local-reliable paths, which are a set of non-dominated paths under first-order stochastic dominance. We show that Bellman’s principle of optimality can be applied to construct local-reliable paths. Acyclicity of local-reliable paths is established and used for proving finite convergence of solution procedures. The connection between the a priori path problem and the corresponding adaptive routing problem is also revealed. A label-correcting algorithm is proposed and its complexity is analyzed. A pseudo-polynomial approximation is proposed based on extreme-dominance. An extension that allows travel time distribution functions to vary over time is also discussed. We show that the time-dependent problem is decomposable with respect to arrival times and therefore can be solved as easily as its static counterpart. Numerical results are provided using typical transportation networks.

305 citations
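
The label-correcting idea above can be sketched concretely. The following is a minimal, illustrative Python sketch that keeps only non-dominated arrival-time distributions (labels) at each node, where domination is first-order stochastic dominance; the time discretization, network representation, and helper names are invented for illustration and are not taken from the paper, which additionally covers extreme-dominance approximation and time-dependent distributions.

    import numpy as np
    from collections import defaultdict, deque

    T = 40  # number of discrete time bins (illustrative horizon)

    def push_forward(arrival_pmf, link_pmf):
        # Convolve a node arrival-time pmf with a link travel-time pmf, truncated to T bins.
        return np.convolve(arrival_pmf, link_pmf)[:T]

    def dominates(a, b):
        # First-order stochastic dominance: a's CDF is pointwise at least b's (earlier arrival).
        return np.all(np.cumsum(a) >= np.cumsum(b) - 1e-12)

    def local_reliable_labels(links, origin):
        # links: dict mapping (i, j) -> length-T pmf of the travel time on link (i, j).
        labels = defaultdict(list)               # node -> list of non-dominated arrival pmfs
        start = np.zeros(T); start[0] = 1.0      # depart the origin at time 0 with certainty
        labels[origin].append(start)
        queue = deque([origin])
        while queue:
            i = queue.popleft()
            for (u, j), pmf in links.items():
                if u != i:
                    continue
                for lab in list(labels[i]):
                    cand = push_forward(lab, pmf)
                    if any(dominates(old, cand) for old in labels[j]):
                        continue                 # candidate is dominated: discard it
                    labels[j] = [old for old in labels[j] if not dominates(cand, old)]
                    labels[j].append(cand)
                    queue.append(j)
        return labels

A practical implementation would also store predecessor links with each label so that the most reliable path for a given on-time threshold can be read off from the CDFs at the destination.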


Proceedings Article
07 Dec 2009
TL;DR: This work presents a Bellman error objective function and two gradient-descent TD algorithms that optimize it, and proves the asymptotic almost-sure convergence of both algorithms, for any finite Markov decision process and any smooth value function approximator, to a locally optimal solution.
Abstract: We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks. Conventional temporal-difference (TD) methods, such as TD(λ), Q-learning and Sarsa, have been used successfully with function approximation in many applications. However, it is well known that off-policy sampling, as well as nonlinear function approximation, can cause these algorithms to become unstable (i.e., the parameters of the approximator may diverge). Sutton et al. (2009a, 2009b) solved the problem of off-policy learning with linear TD algorithms by introducing a new objective function, related to the Bellman error, and algorithms that perform stochastic gradient-descent on this function. These methods can be viewed as natural generalizations of previous TD methods, as they converge to the same limit points when used with linear function approximation methods. We generalize this work to nonlinear function approximation. We present a Bellman error objective function and two gradient-descent TD algorithms that optimize it. We prove the asymptotic almost-sure convergence of both algorithms, for any finite Markov decision process and any smooth value function approximator, to a locally optimal solution. The algorithms are incremental and the computational complexity per time step scales linearly with the number of parameters of the approximator. Empirical results obtained in the game of Go demonstrate the algorithms' effectiveness.

249 citations
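
For readers unfamiliar with the gradient-TD family this work builds on, the following is a minimal sketch of the linear TDC update of Sutton et al. (2009); the nonlinear algorithms introduced in the paper add further correction terms involving the gradient and curvature of the approximator, which are omitted here. The function signature and step sizes are illustrative assumptions.

    import numpy as np

    def tdc_step(theta, w, phi, phi_next, reward, gamma=0.99, alpha=0.01, beta=0.05):
        # theta: value-function weights; w: auxiliary weights; phi, phi_next: feature vectors.
        delta = reward + gamma * phi_next @ theta - phi @ theta          # TD error
        theta = theta + alpha * (delta * phi - gamma * (phi @ w) * phi_next)
        w = w + beta * (delta - phi @ w) * phi                           # auxiliary estimator update
        return theta, w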


Journal ArticleDOI
TL;DR: This work develops a column generation algorithm to solve the problem for a multinomial logit choice model with disjoint consideration sets (MNLD), and derives a bound as a by-product of a decomposition heuristic.
Abstract: We consider a network revenue management problem where customers choose among open fare products according to some prespecified choice model. Starting with a Markov decision process (MDP) formulation, we approximate the value function with an affine function of the state vector. We show that the resulting problem provides a tighter bound for the MDP value than the choice-based linear program. We develop a column generation algorithm to solve the problem for a multinomial logit choice model with disjoint consideration sets (MNLD). We also derive a bound as a by-product of a decomposition heuristic. Our numerical study shows the policies from our solution approach can significantly outperform heuristics from the choice-based linear program.

223 citations


Journal ArticleDOI
TL;DR: This article introduces Gaussian process dynamic programming (GPDP), an approximate value function-based RL algorithm, and proposes to learn probabilistic models of the a priori unknown transition dynamics and the value functions on the fly.

222 citations


Journal ArticleDOI
TL;DR: In this article, a neural network is tuned online using novel tuning laws to learn the complete plant dynamics so that local asymptotic stability of the identification error can be shown.

176 citations


Proceedings Article
01 Jan 2009
TL;DR: The need for partial knowledge of the nonlinear system dynamics is relaxed in the development of a novel approach to ADP using a two-part process: online system identification and offline optimal control training.
Abstract: The optimal control of linear systems accompanied by quadratic cost functions can be achieved by solving the well-known Riccati equation. However, the optimal control of nonlinear discrete-time systems is a much more challenging task that often requires solving the nonlinear Hamilton-Jacobi-Bellman (HJB) equation. In the recent literature, discrete-time approximate dynamic programming (ADP) techniques have been widely used to determine the optimal or near-optimal control policies for affine nonlinear discrete-time systems. However, an inherent assumption of ADP requires the value of the controlled system one step ahead and at least partial knowledge of the system dynamics to be known. In this work, the need for partial knowledge of the nonlinear system dynamics is relaxed in the development of a novel approach to ADP using a two-part process: online system identification and offline optimal control training. First, in the system identification process, a neural network (NN) is tuned online using novel tuning laws to learn the complete plant dynamics so that local asymptotic stability of the identification error can be shown. Then, using only the learned NN system model, offline ADP is attempted, resulting in a novel optimal control law. The proposed scheme does not require explicit knowledge of the system dynamics, as only the learned NN model is needed. A proof of convergence is provided. Simulation results verify the theoretical conjecture.

131 citations
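
The two-part structure (identify the dynamics from data, then train the controller offline against the learned model only) can be conveyed with the toy sketch below. It substitutes least-squares regression on fixed features and fitted value iteration on a grid for the paper's neural-network tuning laws and ADP scheme; the plant, features, and cost are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    def plant(x, u):                       # hypothetical 1-D affine nonlinear plant (data source only)
        return 0.8 * np.sin(x) + 0.5 * u

    def phi(x, u):                         # fixed feature basis standing in for a neural network
        return np.stack([np.sin(x), np.cos(x), x, u, u * x, np.ones_like(x)], axis=-1)

    # Part 1: system identification, emulated here by least squares on recorded (x, u, x_next) data.
    X = rng.uniform(-2, 2, 400); U = rng.uniform(-1, 1, 400); Y = plant(X, U)
    W, *_ = np.linalg.lstsq(phi(X, U), Y, rcond=None)
    model = lambda x, u: phi(x, u) @ W     # learned one-step model, used exclusively below

    # Part 2: offline approximate dynamic programming against the learned model only.
    xs = np.linspace(-2, 2, 81); us = np.linspace(-1, 1, 41)
    Xg, Ug = np.meshgrid(xs, us, indexing="ij")
    V, gamma = np.zeros_like(xs), 0.95
    for _ in range(200):
        Vn = np.interp(np.clip(model(Xg, Ug), -2, 2), xs, V)   # V at predicted next states
        Q = Xg**2 + 0.1 * Ug**2 + gamma * Vn                   # illustrative quadratic stage cost
        V = Q.min(axis=1)
    policy = us[Q.argmin(axis=1)]                               # greedy control on the state grid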


Journal ArticleDOI
TL;DR: A new derivation of the dynamic programming equation for general stochastic target problems with unbounded controls is provided, together with the appropriate boundary conditions; these results are applied to the problem of quantile hedging in financial mathematics.
Abstract: We consider the problem of finding the minimal initial data of a controlled process which guarantees to reach a controlled target with a given probability of success or, more generally, with a given level of expected loss. By suitably increasing the state space and the controls, we show that this problem can be converted into a stochastic target problem, i.e., finding the minimal initial data of a controlled process which guarantees to reach a controlled target with probability one. Unlike in the existing literature on stochastic target problems, our increased controls are valued in an unbounded set. In this paper, we provide a new derivation of the dynamic programming equation for general stochastic target problems with unbounded controls, together with the appropriate boundary conditions. These results are applied to the problem of quantile hedging in financial mathematics and are shown to recover the explicit solution of Follmer and Leukert [Finance Stoch., 3 (1999), pp. 251-273].

127 citations


Posted Content
TL;DR: In this article, a new framework for formulating reachability problems with competing inputs, nonlinear dynamics and state constraints as optimal control problems is developed, which can be applied to a general class of target hitting continuous dynamic games with non-linear dynamics, and has very good properties in terms of its numerical solution.
Abstract: A new framework for formulating reachability problems with competing inputs, nonlinear dynamics and state constraints as optimal control problems is developed. Such reach-avoid problems arise in, among others, the study of safety problems in hybrid systems. Earlier approaches to reach-avoid computations are either restricted to linear systems, or face numerical difficulties due to possible discontinuities in the Hamiltonian of the optimal control problem. The main advantage of the approach proposed in this paper is that it can be applied to a general class of target hitting continuous dynamic games with nonlinear dynamics, and has very good properties in terms of its numerical solution, since the value function and the Hamiltonian of the system are both continuous. The performance of the proposed method is demonstrated by applying it to a two-aircraft collision avoidance scenario under target window constraints and in the presence of wind disturbance. Target windows are a novel concept in air traffic management and represent spatial and temporal constraints that the aircraft have to respect to meet their schedule.

114 citations


Proceedings Article
07 Dec 2009
TL;DR: A theory of compositionality in stochastic optimal control is presented, showing how task-optimal controllers can be constructed from certain primitives, and illustrating the theory in the context of human arm movements.
Abstract: We present a theory of compositionality in stochastic optimal control, showing how task-optimal controllers can be constructed from certain primitives. The primitives are themselves feedback controllers pursuing their own agendas. They are mixed in proportion to how much progress they are making towards their agendas and how compatible their agendas are with the present task. The resulting composite control law is provably optimal when the problem belongs to a certain class. This class is rather general and yet has a number of unique properties - one of which is that the Bellman equation can be made linear even for non-linear or discrete dynamics. This gives rise to the compositionality developed here. In the special case of linear dynamics and Gaussian noise our framework yields analytical solutions (i.e. non-linear mixtures of LQG controllers) without requiring the final cost to be quadratic. More generally, a natural set of control primitives can be constructed by applying SVD to Green's function of the Bellman equation. We illustrate the theory in the context of human arm movements. The ideas of optimality and compositionality are both very prominent in the field of motor control, yet they have been difficult to reconcile. Our work makes this possible.

113 citations
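
The linearity that enables this compositionality can be seen in a small discrete (first-exit) example: with passive dynamics P and state costs q, the exponentiated value function, the "desirability" z, satisfies a linear equation on the interior states, and desirabilities of primitive tasks mix linearly into the desirability of a composite task. The sizes, costs, and dynamics below are made-up toy data, not the paper's arm-movement setting.

    import numpy as np

    rng = np.random.default_rng(1)
    n_int, n_term = 5, 2                                # interior and terminal (absorbing) states
    P = rng.random((n_int, n_int + n_term))
    P /= P.sum(axis=1, keepdims=True)                   # passive dynamics from interior states
    q = 0.5 + rng.random(n_int)                         # positive interior state costs

    def desirability(q_term):
        # Solve z_i = exp(-q_i) * sum_j P[i, j] * z_j on the interior by fixed-point iteration.
        z = np.ones(n_int + n_term)
        z[n_int:] = np.exp(-q_term)                     # boundary condition at terminal states
        for _ in range(1000):
            z[:n_int] = np.exp(-q) * (P @ z)
        return z

    z1 = desirability(np.array([0.0, 5.0]))             # primitive task 1: prefer terminal state A
    z2 = desirability(np.array([5.0, 0.0]))             # primitive task 2: prefer terminal state B

    # Compositionality: if exp(-q_term) = w1*exp(-q_term1) + w2*exp(-q_term2) for the composite
    # task, then its desirability is exactly w1*z1 + w2*z2, with no need to re-solve the equation.
    w1, w2 = 0.3, 0.7
    V_comp = -np.log(w1 * z1 + w2 * z2)                 # composite value function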


Journal ArticleDOI
TL;DR: It is proved that any finite-horizon value function of the DSLQR problem is the pointwise minimum of a finite number of quadratic functions that can be obtained recursively using the so-called switched Riccati mapping.
Abstract: In this paper, we derive some important properties for the finite-horizon and the infinite-horizon value functions associated with the discrete-time switched LQR (DSLQR) problem. It is proved that any finite-horizon value function of the DSLQR problem is the pointwise minimum of a finite number of quadratic functions that can be obtained recursively using the so-called switched Riccati mapping. It is also shown that under some mild conditions, the family of the finite-horizon value functions is homogeneous (of degree 2), is uniformly bounded over the unit ball, and converges exponentially fast to the infinite-horizon value function. The exponential convergence rate of the value iterations is characterized analytically in terms of the subsystem matrices.

101 citations
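
The switched Riccati mapping can be sketched directly: starting from the terminal cost, apply each subsystem's Riccati difference map to every matrix in the current set, and the horizon-N value function is the pointwise minimum of the resulting quadratics. The two subsystems below are arbitrary illustrative data, and no pruning of the exponentially growing set is attempted; the paper's analysis is what justifies truncation and characterizes convergence.

    import numpy as np

    A = [np.array([[1.0, 0.5], [0.0, 1.0]]), np.array([[0.9, 0.0], [0.2, 1.1]])]
    B = [np.array([[0.0], [1.0]]),           np.array([[1.0], [0.0]])]
    Q = [np.eye(2), np.eye(2)]
    R = [np.eye(1), np.eye(1)]

    def riccati_map(P, i):
        # One step of the Riccati difference equation for subsystem i.
        K = np.linalg.solve(R[i] + B[i].T @ P @ B[i], B[i].T @ P @ A[i])
        return Q[i] + A[i].T @ P @ A[i] - A[i].T @ P @ B[i] @ K

    H = [np.zeros((2, 2))]                   # H_0: zero terminal cost
    for _ in range(4):                       # four steps of value iteration
        H = [riccati_map(P, i) for P in H for i in range(2)]

    def value(x):
        # V_4(x): pointwise minimum over the 2**4 quadratics generated above.
        return min(float(x @ P @ x) for P in H)

    print(value(np.array([1.0, -1.0])))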


Journal ArticleDOI
TL;DR: In this paper, the authors show that the problem is equivalent to a parabolic double obstacle problem involving two free boundaries that correspond to the optimal buying and selling policies, and the C^{2,1} regularity of the value function is proven.

Journal ArticleDOI
TL;DR: The effective state space of the corresponding optimal wealth and standard of living processes is described, the associated value function is identified as a generalized utility function, and the interplay between dynamic programming and Feynman-Kac results is exploited via the theory of random fields and stochastic partial differential equations.
Abstract: This paper studies the habit-forming preference problem of maximizing total expected utility from consumption net of the standard of living, a weighted average of past consumption. We describe the effective state space of the corresponding optimal wealth and standard of living processes, identify the associated value function as a generalized utility function, and exploit the interplay between dynamic programming and Feynman-Kac results via the theory of random fields and stochastic partial differential equations (SPDEs). The resulting value random field of the optimization problem satisfies a nonlinear, backward SPDE of parabolic type, widely referred to as the stochastic Hamilton-Jacobi-Bellman equation. The dual value random field is characterized further in terms of a backward parabolic SPDE which is linear. Progressively measurable versions of stochastic feedback formulae for the optimal portfolio and consumption choices are obtained as well.

Book ChapterDOI
01 Jan 2009
TL;DR: It is proved that a non-dominated path should contain no cycles if random link travel times are consistent with the stochastic first-in-first-out principle, and it is shown that the optimal solution is a set of non-dominated paths under first-order stochastic dominance.
Abstract: This paper studies the problem of finding most reliable a priori shortest paths (RASP) in a stochastic and time-dependent network. Correlations are modeled by assuming the probability density functions of link traversal times to be conditional on both the time of day and link states. Such correlations are spatially limited by the Markovian property of the link states, which may be defined to reflect congestion levels or the intensity of random disruptions. We formulate the RASP problem with the above correlation structure as a general dynamic programming problem, and show that the optimal solution is a set of non-dominated paths under first-order stochastic dominance. Conditions are proposed to regulate the transition probabilities of link states such that Bellman’s principle of optimality can be utilized. We prove that a non-dominated path should contain no cycles if random link travel times are consistent with the stochastic first-in-first-out principle. The RASP problem is solved using a non-deterministic polynomial label correcting algorithm. Approximation algorithms with polynomial complexity may be achieved when further assumptions are made on the correlation structure and on the applicability of dynamic programming. Numerical results are provided.

Journal ArticleDOI
TL;DR: This work provides a rare proof of convergence for an approximate dynamic programming algorithm using pure exploitation, where the states the authors visit depend on the decisions produced by solving the approximate problem.
Abstract: We consider a multistage asset acquisition problem where assets are purchased now, at a price that varies randomly over time, to be used to satisfy a random demand at a particular point in time in the future. We provide a rare proof of convergence for an approximate dynamic programming algorithm using pure exploitation, where the states we visit depend on the decisions produced by solving the approximate problem. The resulting algorithm does not require knowing the probability distribution of prices or demands, nor does it require any assumptions about its functional form. The algorithm and its proof rely on the fact that the true value function is a family of piecewise linear concave functions.

Journal ArticleDOI
Marie-Amelie Morlais
TL;DR: In this paper, the authors consider the classical problem of utility maximization in a financial market allowing jumps and prove existence and uniqueness results for the introduced BSDE, which allows them to give the expression of the value function and characterize optimal strategies for the problem.
Abstract: In this paper, we consider the classical problem of utility maximization in a financial market allowing jumps. Assuming that the constraint set of all trading strategies is a compact set, rather than a convex one, we use a dynamic method from which we derive a specific BSDE. To solve the financial problem, we first prove existence and uniqueness results for the introduced BSDE. This allows us to give the expression of the value function and to characterize optimal strategies for the problem.

Journal ArticleDOI
TL;DR: An iterative algorithm to solve Hamilton-Jacobi-Bellman-Isaacs (HJBI) equations for a broad class of nonlinear control systems is proposed; it works by constructing two series of nonnegative functions that can be approximated recursively by existing methods.

Journal ArticleDOI
TL;DR: This work considers the optimal control of a multidimensional cash management system in which the cash balances fluctuate as a homogeneous diffusion process, and computes the solution in two dimensions with linear and distance cost functions.

Journal ArticleDOI
TL;DR: In this paper, the fundamental equations are expressed in the form of a vector-matrix differential equation using the Laplace transformation, which is then solved by an eigenvalue approach; the inversion of the transformed solution is carried out by applying a method of Bellman et al.

Journal ArticleDOI
TL;DR: A class of dynamic advertising problems under uncertainty in the presence of carryover and distributed forgetting effects, generalizing the classical model of Nerlove and Arrow, is considered; the dynamics of the product goodwill are allowed to depend on its past values as well as on previous advertising levels.
Abstract: We consider a class of dynamic advertising problems under uncertainty in the presence of carryover and distributed forgetting effects, generalizing the classical model of Nerlove and Arrow (Economica 29:129–142, 1962). In particular, we allow the dynamics of the product goodwill to depend on its past values, as well as previous advertising levels. Building on previous work (Gozzi and Marinelli in Lect. Notes Pure Appl. Math., vol. 245, pp. 133–148, 2006), the optimal advertising model is formulated as an infinite-dimensional stochastic control problem. We obtain (partial) regularity as well as approximation results for the corresponding value function. Under specific structural assumptions, we study the effects of delays on the value function and optimal strategy. In the absence of carryover effects, since the value function and the optimal advertising policy can be characterized in terms of the solution of the associated HJB equation, we obtain sharper characterizations of the optimal policy.

Journal ArticleDOI
TL;DR: In this article, the authors study stochastic optimal control problems with jumps with the help of the theory of Backward Stochastic Differential Equations (BSDEs) with jumps, and prove that the value functions are the viscosity solutions of the associated generalized Hamilton-Jacobi-Bellman equations with integral-differential operators.
Abstract: In this paper we study stochastic optimal control problems with jumps with the help of the theory of Backward Stochastic Differential Equations (BSDEs) with jumps. We generalize the results of Peng [S. Peng, BSDE and stochastic optimizations, in: J. Yan, S. Peng, S. Fang, L. Wu, Topics in Stochastic Analysis, Science Press, Beijing, 1997 (Chapter 2) (in Chinese)] by considering cost functionals defined by controlled BSDEs with jumps. The application of BSDE methods, in particular, the use of the notion of stochastic backward semigroups introduced by Peng in the above-mentioned work, allows a straightforward proof of a dynamic programming principle for value functions associated with stochastic optimal control problems with jumps. We prove that the value functions are the viscosity solutions of the associated generalized Hamilton–Jacobi–Bellman equations with integral-differential operators. For this proof, we adapt Peng’s BSDE approach, given in the above-mentioned reference and developed in the framework of stochastic control problems driven by Brownian motion, to stochastic control problems driven by Brownian motion and a Poisson random measure.

Journal ArticleDOI
TL;DR: The general framework deals with the important case in which several consecutive orders may be decided before the effective execution of the first one; it is motivated by financial applications in the trading of illiquid assets such as hedge funds.

Journal ArticleDOI
TL;DR: In this article, the authors consider the case where the risky asset is a stock whose price process is a geometric Brownian motion and find a dynamic choice of the investment policy which minimizes the ruin probability of the company.
Abstract: We consider that the surplus of an insurance company follows a Cramer–Lundberg process. The management has the possibility of investing part of the surplus in a risky asset. We consider that the risky asset is a stock whose price process is a geometric Brownian motion. Our aim is to find a dynamic choice of the investment policy which minimizes the ruin probability of the company. We impose that the ratio between the amount invested in the risky asset and the surplus should be smaller than a given positive bound a. For instance, the case a = 1 means that the management cannot borrow money to buy stocks. [Hipp, C., Plum, M., 2000. Optimal investment for insurers. Insurance: Mathematics and Economics 27, 215–228] and [Schmidli, H., 2002. On minimizing the ruin probability by investment and reinsurance. Ann. Appl. Probab. 12, 890–907] solved this problem without borrowing constraints. They found that the ratio between the amount invested in the risky asset and the surplus goes to infinity as the surplus approaches zero, so the optimal strategies of the constrained and unconstrained problems never coincide. We characterize the optimal value function as the classical solution of the associated Hamilton–Jacobi–Bellman equation. This equation is a second-order non-linear integro-differential equation. We obtain numerical solutions for some claim-size distributions and compare our results with those of the unconstrained case.
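
Schematically, and up to the paper's exact conventions, the Hamilton-Jacobi-Bellman equation for the survival probability δ(x) (one minus the ruin probability) in such a Cramér-Lundberg model with premium rate c, claim arrival rate λ, claim-size distribution F, and an amount A invested in a stock with drift μ and volatility σ reads

    \max_{0 \le A \le a x} \left\{ \frac{\sigma^2 A^2}{2}\,\delta''(x)
        + (c + \mu A)\,\delta'(x)
        + \lambda \left( \int_0^{x} \delta(x - y)\, \mathrm{d}F(y) - \delta(x) \right) \right\} = 0,

where the borrowing constraint enters only through the restricted range of the investment amount A.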

Journal ArticleDOI
TL;DR: The dynamic programming equation for a certain class of problems, called second order stochastic target problems, is derived; the main technical tool, the geometric dynamic programming principle, is proved by using the framework developed in H. M. Soner and N. Touzi (2002).
Abstract: Motivated by applications in mathematical finance [U. Cetin, H. M. Soner, and N. Touzi, “Options hedging for small investors under liquidity costs,” Finance Stoch., to appear] we continue our study of second order backward stochastic equations. In this paper, we derive the dynamic programming equation for a certain class of problems which we call the second order stochastic target problems. In contrast with previous formulations of similar problems, we restrict control processes to be continuous. This new framework enables us to apply our results to a larger class of models. Also the resulting derivation is more transparent. The main technical tool is the geometric dynamic programming principle in this context, and it is proved by using the framework developed in [H. M. Soner and N. Touzi, J. Eur. Math. Soc. (JEMS), 8 (2002), pp. 201-236].

Proceedings ArticleDOI
15 May 2009
TL;DR: iLDP can be considered a generalization of Differential Dynamic Programming, inasmuch as it uses general basis functions rather than quadratics to approximate the optimal value function and introduces a collocation method that dispenses with explicit differentiation of the cost and dynamics.
Abstract: We develop an iterative local dynamic programming method (iLDP) applicable to stochastic optimal control problems in continuous high-dimensional state and action spaces. Such problems are common in the control of biological movement, but cannot be handled by existing methods. iLDP can be considered a generalization of Differential Dynamic Programming, inasmuch as: (a) we use general basis functions rather than quadratics to approximate the optimal value function; (b) we introduce a collocation method that dispenses with explicit differentiation of the cost and dynamics and ties iLDP to the Unscented Kalman filter; (c) we adapt the local function approximator to the propagated state covariance, thus increasing accuracy at more likely states. Convergence is similar to quasi-Newton methods. We illustrate iLDP on several problems including the “swimmer” dynamical system which has 14 state and 4 control variables.

Journal ArticleDOI
TL;DR: In this paper, the continuous-time optimal investment and consumption decision of a constant relative risk aversion (CRRA) investor who faces proportional transaction costs and a finite time horizon is studied.
Abstract: This paper concerns continuous-time optimal investment and the consumption decision of a constant relative risk aversion (CRRA) investor who faces proportional transaction costs and a finite time horizon. In the no-consumption case, it has been studied by Liu and Loewenstein [Review of Financial Studies, 15 (2002), pp. 805-835] and Dai and Yi [J. Differential Equations, 246 (2009), pp. 1445-1469]. Mathematically, it is a singular stochastic control problem whose value function satisfies a parabolic variational inequality with gradient constraints. The problem gives rise to two free boundaries which stand for the optimal buying and selling strategies, respectively. We present an analytical approach to analyze the behaviors of free boundaries. The regularity of the value function is studied as well. Our approach is essentially based on the connection between singular control and optimal stopping, which is first revealed in the present problem.

Journal ArticleDOI
TL;DR: The classical epidemic model is adapted to model malware propagation in this multi-network framework and the trade-off between the infection spread and the patching costs is captured in a cost function, leading to an optimal control problem.

Journal ArticleDOI
TL;DR: In this article, an optimal control algorithm based on the Hamilton-Jacobi-Bellman (HJB) equation is proposed for the bounded robust controller design for finite-time-horizon nonlinear systems.
Abstract: In this study, an optimal control algorithm based on the Hamilton-Jacobi-Bellman (HJB) equation is proposed for the bounded robust controller design for finite-time-horizon nonlinear systems. The HJB equation is formulated using a suitable nonquadratic term in the performance functional to take care of magnitude constraints on the control input. Utilising the direct method of Lyapunov stability, we have proved the optimality of the controller with respect to a cost functional that includes a penalty on the control effort and the maximum bound on system uncertainty. The bounded controller requires knowledge of the upper bound of the system uncertainty. In the proposed algorithm, a neural network is used to approximate the time-varying solution of the HJB equation using the least squares method. The proposed algorithm has been applied to nonlinear systems with matched and unmatched system uncertainties. Necessary theoretical and simulation results are presented to validate the proposed algorithm.
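
For context, a nonquadratic penalty commonly used in the constrained-input HJB literature (for example in the form due to Lyshevski, also used by Abu-Khalaf and Lewis) to encode a magnitude bound |u_i| ≤ λ on the input of an affine system ẋ = f(x) + g(x)u is

    W(u) = 2 \int_{0}^{u} \left( \lambda \tanh^{-1}(v/\lambda) \right)^{\top} R \,\mathrm{d}v ,
    \qquad
    J = \int_{0}^{t_f} \left( x^{\top} Q\, x + W(u) \right) \mathrm{d}t ,

for which the minimization inside the HJB equation yields a control that respects the bound by construction,

    u^{*}(x,t) = -\lambda \tanh\!\left( \frac{1}{2\lambda}\, R^{-1} g^{\top}(x)\, \frac{\partial V}{\partial x}(x,t) \right).

The paper's specific choice of nonquadratic term and its treatment of the uncertainty bound may differ from this standard form.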

Journal ArticleDOI
TL;DR: It is shown that value iteration as well as Howard’s policy improvement algorithm works, and error bounds are given when the utility function is approximated and when the state space is discretized.
Abstract: We consider the problem of maximizing the expected utility of the terminal wealth of a portfolio in a continuous-time pure jump market with general utility function. This leads to an optimal control problem for piecewise deterministic Markov processes. Using an embedding procedure we solve the problem by looking at a discrete-time contracting Markov decision process. Our aim is to show that this point of view has a number of advantages, in particular as far as computational aspects are concerned. We characterize the value function as the unique fixed point of the dynamic programming operator and prove the existence of optimal portfolios. Moreover, we show that value iteration as well as Howard’s policy improvement algorithm works. Finally, we give error bounds when the utility function is approximated and when we discretize the state space. A numerical example is presented and our approach is compared to the approximating Markov chain method.
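
The computational point about value iteration and Howard's policy improvement can be illustrated on a generic finite, discounted Markov decision process; the transition tensor and rewards below are random stand-ins, not the discretized embedded model of the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    nS, nA, beta = 6, 3, 0.9                            # states, actions, discount factor
    P = rng.random((nA, nS, nS)); P /= P.sum(axis=2, keepdims=True)
    Rwd = rng.random((nA, nS))

    def bellman(V):
        return Rwd + beta * (P @ V)                     # shape (nA, nS): one row per action

    # Value iteration: iterate the contracting dynamic programming operator to its fixed point.
    V = np.zeros(nS)
    for _ in range(500):
        V = bellman(V).max(axis=0)

    # Howard's policy improvement: alternate exact policy evaluation and greedy improvement.
    policy = np.zeros(nS, dtype=int)
    while True:
        Ppi = P[policy, np.arange(nS)]                  # transition matrix under the current policy
        Rpi = Rwd[policy, np.arange(nS)]
        Vpi = np.linalg.solve(np.eye(nS) - beta * Ppi, Rpi)
        new_policy = bellman(Vpi).argmax(axis=0)
        if np.array_equal(new_policy, policy):
            break                                       # the policy is stable, hence optimal
        policy = new_policy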

Journal ArticleDOI
TL;DR: In this paper, various mathematical tools are applied to the dynamic optimization of power-maximizing paths, with special attention paid to nonlinear systems; convergence of discrete algorithms to viscosity solutions of HJB equations, discrete approximations, and the role of the Lagrange multiplier λ associated with the duration constraint are considered.

Journal ArticleDOI
TL;DR: An HJB formalism is used, the explicit form of the Krotov-Bellman function is obtained for different growth stages, and the optimal control problem related to the seasonal benefit of the grower is described.