
Showing papers on "Bellman equation" published in 2008


Journal ArticleDOI
01 Aug 2008
TL;DR: It is shown that HDP converges to the optimal control and the optimal value function that solves the Hamilton-Jacobi-Bellman equation appearing in infinite-horizon discrete-time (DT) nonlinear optimal control.
Abstract: Convergence of the value-iteration-based heuristic dynamic programming (HDP) algorithm is proven in the case of general nonlinear systems. That is, it is shown that HDP converges to the optimal control and the optimal value function that solves the Hamilton-Jacobi-Bellman equation appearing in infinite-horizon discrete-time (DT) nonlinear optimal control. It is assumed that, at each iteration, the value and action update equations can be exactly solved. Two standard neural networks (NNs) are used: a critic NN approximates the value function, and an action NN approximates the optimal control policy. It is stressed that this approach allows the implementation of HDP without knowing the internal dynamics of the system. The exact solution assumption holds for some classes of nonlinear systems and, in particular, for the DT linear quadratic regulator (LQR), where the action is linear, the value is quadratic in the states, and the NNs have zero approximation error. It is stressed that, for the LQR, HDP may be implemented without knowing the system A matrix by using two NNs. This fact is not generally appreciated in the folklore of HDP for the DT LQR, where only one critic NN is generally used.
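
As a concrete illustration of the LQR special case discussed above, here is a minimal Python sketch (not taken from the paper) of value-iteration HDP specialized to the DT LQR, where the critic is exactly quadratic (V_k(x) = x'P_k x) and the action exactly linear. The system matrices A, B and weights Q, R are invented for illustration; the general algorithm would use critic and action neural networks in place of these closed-form updates.

```python
# Hedged sketch: value-iteration HDP specialized to the DT LQR.
# A, B, Q, R below are illustrative assumptions, not from the paper.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

P = np.zeros((2, 2))          # V_0(x) = 0, the usual HDP initialization
for k in range(500):
    # Action update: u = -K x minimizes x'Qx + u'Ru + (Ax+Bu)'P(Ax+Bu)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    # Value update: V_{k+1}(x) = x'Qx + (Kx)'R(Kx) + V_k((A-BK)x)
    Acl = A - B @ K
    P_next = Q + K.T @ R @ K + Acl.T @ P @ Acl
    if np.max(np.abs(P_next - P)) < 1e-10:
        P = P_next
        break
    P = P_next

print("HDP value matrix P:\n", P)   # should match the DARE solution
```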

919 citations


Proceedings ArticleDOI
01 Dec 2008
TL;DR: This work obtains a more natural form of LQG duality by replacing the Kalman-Bucy filter with the information filter and generalizes this result to non-linear stochastic systems, discrete stochastic systems, and deterministic systems.
Abstract: Optimal control and estimation are dual in the LQG setting, as Kalman discovered, however this duality has proven difficult to extend beyond LQG. Here we obtain a more natural form of LQG duality by replacing the Kalman-Bucy filter with the information filter. We then generalize this result to non-linear stochastic systems, discrete stochastic systems, and deterministic systems. All forms of duality are established by relating exponentiated costs to probabilities. Unlike the LQG setting where control and estimation are in one-to-one correspondence, in the general case control turns out to be a larger problem class than estimation and only a sub-class of control problems have estimation duals. These are problems where the Bellman equation is intrinsically linear. Apart from their theoretical significance, our results make it possible to apply estimation algorithms to control problems and vice versa.
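
For readers unfamiliar with the "intrinsically linear" Bellman equations mentioned above, the following toy Python sketch (not from the paper) shows one well-known instance: the linearly solvable MDP, where the desirability z = exp(-V) satisfies a linear equation. All quantities below are made up.

```python
# Illustrative sketch: with desirability z = exp(-V), state cost q, and
# passive dynamics P, the (average-cost) linearly solvable Bellman equation
# reduces to an eigenvector problem for the linear map z -> exp(-q) * (P z).
import numpy as np

n = 5
rng = np.random.default_rng(0)
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)   # passive dynamics
q = rng.uniform(0.1, 1.0, n)                                # state costs

z = np.ones(n)
for _ in range(500):                  # power iteration on diag(exp(-q)) @ P
    z = np.exp(-q) * (P @ z)
    z /= np.linalg.norm(z)

V = -np.log(z)                        # value function up to an additive constant
print("desirability z:", np.round(z, 3))
print("relative values:", np.round(V - V.min(), 3))
```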

312 citations


Journal ArticleDOI
TL;DR: Peng's BSDE method is extended from the framework of stochastic control theory to that of stochastic differential games and is used to prove a dynamic programming principle for both the upper and the lower value functions of the game in a straightforward way.
Abstract: In this paper we study zero-sum two-player stochastic differential games with the help of the theory of backward stochastic differential equations (BSDEs). More precisely, we generalize the results of the pioneering work of Fleming and Souganidis [Indiana Univ. Math. J., 38 (1989), pp. 293-314] by considering cost functionals defined by controlled BSDEs and by allowing the admissible control processes to depend on events occurring before the beginning of the game. This extension of the class of admissible control processes has the consequence that the cost functionals become random variables. However, by making use of a Girsanov transformation argument, which is new in this context, we prove that the upper and the lower value functions of the game remain deterministic. Apart from the fact that this extension of the class of admissible control processes is quite natural and reflects the behavior of the players who always use the maximum of available information, its combination with BSDE methods, in particular that of the notion of stochastic “backward semigroups" introduced by Peng [BSDE and stochastic optimizations, in Topics in Stochastic Analysis, Science Press, Beijing, 1997], allows us then to prove a dynamic programming principle for both the upper and the lower value functions of the game in a straightforward way. The upper and the lower value functions are then shown to be the unique viscosity solutions of the upper and the lower Hamilton-Jacobi-Bellman-Isaacs equations, respectively. For this Peng's BSDE method is extended from the framework of stochastic control theory into that of stochastic differential games.

268 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of finding a near-optimal policy in a continuous space, discounted Markovian Decision Problem (MDP) by employing value-function-based methods when only a single trajectory of a fixed policy is available as the input.
Abstract: In this paper we consider the problem of finding a near-optimal policy in a continuous space, discounted Markovian Decision Problem (MDP) by employing value-function-based methods when only a single trajectory of a fixed policy is available as the input. We study a policy-iteration algorithm where the iterates are obtained via empirical risk minimization with a risk function that penalizes high magnitudes of the Bellman-residual. Our main result is a finite-sample, high-probability bound on the performance of the computed policy that depends on the mixing rate of the trajectory, the capacity of the function set as measured by a novel capacity concept (the VC-crossing dimension), the approximation power of the function set and the controllability properties of the MDP. Moreover, we prove that when a linear parameterization is used the new algorithm is equivalent to Least-Squares Policy Iteration. To the best of our knowledge this is the first theoretical result for off-policy control learning over continuous state-spaces using a single trajectory.
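
The following rough Python sketch illustrates the flavor of Bellman-residual minimization with linear features on a single trajectory of a fixed policy. It is a naive single-sample version with an invented feature map and dynamics, not the paper's algorithm (which, among other things, addresses the bias of this plain residual and wraps the fit in a policy-iteration loop).

```python
# Naive sketch of empirical Bellman-residual minimization for policy
# evaluation with linear features; data and features are invented.
import numpy as np

gamma = 0.95
rng = np.random.default_rng(1)

def phi(s):                        # hypothetical feature map on a 1-D state
    return np.array([1.0, s, s * s])

# Fake single trajectory of a fixed policy: s' = 0.9 s + noise, r = -s^2
S = [rng.normal()]
for _ in range(200):
    S.append(0.9 * S[-1] + 0.1 * rng.normal())
S = np.array(S)
R = -S[:-1] ** 2

Phi = np.stack([phi(s) for s in S[:-1]])
PhiN = np.stack([phi(s) for s in S[1:]])
D = Phi - gamma * PhiN             # residual "design" matrix
# Least-squares fit of w minimizing sum_t (phi(s_t)'w - r_t - gamma*phi(s_{t+1})'w)^2
w, *_ = np.linalg.lstsq(D, R, rcond=None)
print("fitted weights:", w)
```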

231 citations


Proceedings ArticleDOI
05 Jul 2008
TL;DR: It is shown that linear value-function approximation is equivalent to a form of linear model approximation, and a relationship between the model-approximation error and the Bellman error is derived, which can guide feature selection for model improvement and/or value- function improvement.
Abstract: We show that linear value-function approximation is equivalent to a form of linear model approximation. We then derive a relationship between the model-approximation error and the Bellman error, and show how this relationship can guide feature selection for model improvement and/or value-function improvement. We also show how these results give insight into the behavior of existing feature-selection algorithms.
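
A small numerical check of the stated equivalence can be written in a few lines; the features, transition matrix, and rewards below are random placeholders. Fitting a linear model in feature space and solving it exactly yields the same value function as the linear fixed-point (LSTD) solution.

```python
# Sketch (invented data): the value of the fitted linear feature-space model,
# Phi (I - gamma F)^{-1} r, coincides with the linear fixed-point solution.
import numpy as np

rng = np.random.default_rng(2)
gamma, n, k = 0.9, 50, 3
Phi = rng.normal(size=(n, k))                   # feature matrix
P = rng.random((n, n)); P /= P.sum(1, keepdims=True)
R = rng.normal(size=n)

proj = np.linalg.pinv(Phi)                      # least-squares projection
F = proj @ P @ Phi                              # approximate feature dynamics
r = proj @ R                                    # approximate feature reward
w_model = np.linalg.solve(np.eye(k) - gamma * F, r)

# LSTD / linear fixed-point solution for comparison
Aw = Phi.T @ (Phi - gamma * P @ Phi)
bw = Phi.T @ R
w_lstd = np.linalg.solve(Aw, bw)

print(np.allclose(Phi @ w_model, Phi @ w_lstd))  # True: same value function
```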

198 citations


Journal ArticleDOI
TL;DR: A broad class of stochastic dynamic programming problems that are amenable to relaxation via decomposition is considered, and an additively separable value function approximation is fitted using two techniques, namely, Lagrangian relaxation and the linear programming (LP) approach to approximate dynamic programming.
Abstract: We consider a broad class of stochastic dynamic programming problems that are amenable to relaxation via decomposition. These problems comprise multiple subproblems that are independent of each other except for a collection of coupling constraints on the action space. We fit an additively separable value function approximation using two techniques, namely, Lagrangian relaxation and the linear programming (LP) approach to approximate dynamic programming. We prove various results comparing the relaxations to each other and to the optimal problem value. We also provide a column generation algorithm for solving the LP-based relaxation to any desired optimality tolerance, and we report on numerical experiments on bandit-like problems. Our results provide insight into the complexity versus quality trade-off when choosing which of these relaxations to implement.
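
The toy sketch below (invented data) illustrates the Lagrangian-relaxation half of the comparison: two small subproblems coupled only through a per-period action budget are solved independently once the coupling constraint is dualized, and the sum of their values plus the multiplier term gives a bound on the optimal value. The subproblem data, the multiplier value, and the budget are all assumptions for illustration.

```python
# Rough sketch of Lagrangian relaxation for a weakly coupled problem: each
# subproblem is solved by its own value iteration after dualizing the shared
# per-period budget; the result is an upper bound on the true optimal value.
import numpy as np

gamma, budget = 0.9, 1.0       # discount factor, per-period budget of "work" actions

# Each subproblem: 2 states, actions 0 (idle) / 1 (work); invented data.
P = [np.array([[[0.9, 0.1], [0.3, 0.7]],      # subproblem 0: P[a][s, s']
               [[0.5, 0.5], [0.1, 0.9]]]),
     np.array([[[0.8, 0.2], [0.4, 0.6]],
               [[0.6, 0.4], [0.2, 0.8]]])]
R = [np.array([[0.0, 1.0], [0.5, 1.5]]),      # R[s, a] for subproblem 0
     np.array([[0.0, 0.8], [0.3, 1.2]])]
cost = np.array([0.0, 1.0])                   # resource use of idle / work

def sub_value(Pi, Ri, lam):
    V = np.zeros(2)
    for _ in range(2000):
        Q = Ri - lam * cost + gamma * np.stack([Pi[a] @ V for a in range(2)], axis=1)
        V = Q.max(axis=1)
    return V

lam = 0.5                                     # one (arbitrary) multiplier value
bound = budget * lam / (1 - gamma) + sum(sub_value(P[i], R[i], lam)[0] for i in range(2))
print("Lagrangian upper bound at lam=0.5:", round(bound, 3))
```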

187 citations


Journal ArticleDOI
TL;DR: In this paper, neural networks are used along with two-player policy iterations to solve for the feedback strategies of a continuous-time zero-sum game that appears in the L2-gain optimal control (suboptimal H∞ control) of nonlinear systems affine in input, with the control policy having saturation constraints.
Abstract: In this paper, neural networks are used along with two-player policy iterations to solve for the feedback strategies of a continuous-time zero-sum game that appears in the L2-gain optimal control (suboptimal H∞ control) of nonlinear systems affine in input, with the control policy having saturation constraints. The result is a closed-form representation, on a prescribed compact set chosen a priori, of the feedback strategies and the value function that solves the associated Hamilton-Jacobi-Isaacs (HJI) equation. The closed-loop stability, L2-gain disturbance attenuation of the neural network saturated control feedback strategy, and uniform convergence results are proven. Finally, this approach is applied to the rotational/translational actuator (RTAC) nonlinear benchmark problem under actuator saturation, offering guaranteed stability and disturbance attenuation.

173 citations


Journal ArticleDOI
TL;DR: In this paper, a portfolio problem of a pension fund manager who wants to maximize the expected utility of the terminal wealth in a complete financial market with a stochastic interest rate is studied.
Abstract: In this paper, we study the portfolio problem of a pension fund manager who wants to maximize the expected utility of the terminal wealth in a complete financial market with a stochastic interest rate. Using the method of stochastic optimal control, we derive a non-linear second-order partial differential equation for the value function. As it is difficult to find a closed-form solution, we transform the primary problem into a dual one by applying a Legendre transform and dual theory, and try to find an explicit solution for the optimal investment strategy under the logarithmic utility function. Finally, a numerical simulation is presented to characterize the dynamic behavior of the optimal portfolio strategy.

116 citations


Journal ArticleDOI
TL;DR: From the tangential condition characterizing capture basins, it is proved that this solution is the unique “upper semicontinuous” solution to the Hamilton-Jacobi-Bellman partial differential equation in the Barron-Jensen/Frankowska sense.
Abstract: We use viability techniques for solving Dirichlet problems with inequality constraints (obstacles) for a class of Hamilton-Jacobi equations. The hypograph of the “solution” is defined as the “capture basin” under an auxiliary control system of a target associated with the initial and boundary conditions, viable in an environment associated with the inequality constraint. From the tangential condition characterizing capture basins, we prove that this solution is the unique “upper semicontinuous” solution to the Hamilton-Jacobi-Bellman partial differential equation in the Barron-Jensen/Frankowska sense. We show how this framework allows us to translate properties of capture basins into corresponding properties of the solutions to this problem. For instance, this approach provides a representation formula of the solution which boils down to the Lax-Hopf formula in the absence of constraints.

94 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider constrained finite-time optimal control problems for discrete-time linear time-invariant systems with constraints on inputs and outputs based on linear and quadratic performance indices.
Abstract: We consider constrained finite-time optimal control problems for discrete-time linear time-invariant systems with constraints on inputs and outputs, based on linear and quadratic performance indices. The solution to such problems is a time-varying piecewise affine (PWA) state-feedback law and can be computed by means of multiparametric programming. By exploiting the properties of the value function and the piecewise affine optimal control law of the constrained finite-time optimal control (CFTOC) problem, we propose two new algorithms that avoid storing the polyhedral regions. The new algorithms significantly reduce the on-line storage demands and computational complexity during evaluation of the PWA feedback control law resulting from the CFTOC problem.
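
For context, the following toy Python sketch shows the baseline evaluation scheme the new algorithms avoid: storing the polyhedral regions and performing point location at run time. The regions and affine gains are invented; the paper's contribution is precisely to evaluate the PWA law without storing these regions.

```python
# Toy sketch of evaluating a stored explicit PWA feedback law by point location.
# Region i is {x : H_i x <= h_i}; the law there is u = F_i x + g_i (made-up data).
import numpy as np

regions = [
    (np.array([[1.0], [-1.0]]), np.array([1.0, 0.0]),  np.array([[-0.5]]), np.array([0.0])),
    (np.array([[1.0], [-1.0]]), np.array([2.0, -1.0]), np.array([[-0.2]]), np.array([-0.3])),
]

def pwa_control(x):
    for H, h, F, g in regions:
        if np.all(H @ x <= h + 1e-9):
            return F @ x + g
    raise ValueError("state outside the feasible set")

print(pwa_control(np.array([0.5])))    # region 1: u = -0.25
print(pwa_control(np.array([1.5])))    # region 2: u = -0.6
```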

91 citations


Journal ArticleDOI
01 Aug 2008
TL;DR: In this article, the core backward induction algorithm of dynamic programming is extended from its traditional discrete case to all isolated time scales and the Hamilton-Jacobi-Bellman equations are motivated and proven on time scales.
Abstract: The time scales calculus is a key emerging area of mathematics due to its potential use in a wide variety of multidisciplinary applications. We extend this calculus to approximate dynamic programming (ADP). The core backward induction algorithm of dynamic programming is extended from its traditional discrete case to all isolated time scales. Hamilton-Jacobi-Bellman equations, the solution of which is the fundamental problem in the field of dynamic programming, are motivated and proven on time scales. By drawing together the calculus of time scales and the applied area of stochastic control via ADP, we have connected two major fields of research.
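
The backward-induction idea on an isolated time scale can be sketched as follows: a toy deterministic example with an invented grid, dynamics, and stage cost, where the step length mu plays the role of the graininess of the time scale.

```python
# Hedged sketch: backward induction on an isolated time scale, i.e. a finite
# increasing set of time points with nonuniform gaps (graininess mu).
# Dynamics, costs, and grids are invented purely for illustration.
import numpy as np

ts = np.array([0.0, 0.5, 0.7, 1.5, 2.0, 3.0])   # an isolated time scale
mu = np.diff(ts)                                 # graininess at each point
xs = np.linspace(-2, 2, 41)                      # state grid
us = np.linspace(-1, 1, 21)                      # control grid

V = 0.5 * xs ** 2                                # terminal cost
policy = []
for k in range(len(mu) - 1, -1, -1):
    Vk = np.empty_like(V)
    Pk = np.empty_like(V)
    for i, x in enumerate(xs):
        x_next = x + mu[k] * (-0.5 * x + us)     # Euler-type step of length mu
        cost = mu[k] * (x ** 2 + us ** 2) + np.interp(x_next, xs, V)
        j = int(np.argmin(cost))
        Vk[i], Pk[i] = cost[j], us[j]
    V, policy = Vk, [Pk] + policy

print("V(0, x=1) ≈", np.interp(1.0, xs, V))
```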

Journal ArticleDOI
TL;DR: In this article, the authors consider a stochastic control problem which is a natural extension of the Monge-Kantorovich problem and provide a probabilistic proof of two fundamental results in mass transportation: the Kantorovich duality and the graph property.
Abstract: We address an optimal mass transportation problem by means of optimal stochastic control. We consider a stochastic control problem which is a natural extension of the Monge-Kantorovich problem. Using a vanishing viscosity argument we provide a probabilistic proof of two fundamental results in mass transportation: the Kantorovich duality and the graph property for the support of an optimal measure for the Monge-Kantorovich problem. Our key tool is a stochastic duality result involving solutions of the Hamilton-Jacobi-Bellman PDE.

Posted Content
TL;DR: This work explores the way of solving the Monge-Ampère equation by a sort of method of characteristics to find the Bellman function of certain classical Harmonic Analysis problems, and, therefore, of finding the full structure of sharp constants and extremal sequences for those problems.
Abstract: The Monge-Ampère equation plays an important part in Analysis. For example, it is instrumental in mass transport problems. On the other hand, the Bellman function technique appeared recently as a way to consider certain Harmonic Analysis problems as problems of Stochastic Optimal Control. This brings us to the Bellman PDE, which in the stochastic setting is often a Monge-Ampère equation or a close relative. We explore the way of solving the Monge-Ampère equation by a sort of method of characteristics to find the Bellman function of certain classical Harmonic Analysis problems, and, therefore, of finding the full structure of sharp constants and extremal sequences for those problems.

Posted Content
TL;DR: In a newsvendor problem with partially observed Markovian demand, the optimal order is set to exceed the myopic optimal order, and a near-optimal solution is characterized by establishing that the value function is piecewise linear.
Abstract: We consider a newsvendor problem with partially observed Markovian demand. Demand is observed if it is less than the inventory. Otherwise, only the event that it is larger than or equal to the inventory is observed. These observations are used to update the demand distribution from one period to the next. The state of the resulting dynamic programming equation is the current demand distribution, which is generally infinite dimensional. We use unnormalized probabilities to convert the nonlinear state transition equation to a linear one. This helps in proving the existence of an optimal feedback ordering policy. So as to learn more about the demand, the optimal order is set to exceed the myopic optimal order. The optimal cost decreases as the demand distribution decreases in the hazard rate order. In a special case with finitely many demand values, we characterize a near-optimal solution by establishing that the value function is piecewise linear.
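
A small sketch of the censored-observation belief update described above, with an invented demand chain; working with unnormalized probabilities keeps the update linear, which is the device the abstract refers to.

```python
# Sketch (made-up numbers) of the censored-demand belief update: demand
# follows a Markov chain; if demand d is below the inventory y it is observed
# exactly, otherwise only the event {d >= y} is seen.  Skipping normalization
# keeps the state-transition update linear in the belief.
import numpy as np

demand_vals = np.array([0, 1, 2, 3])
T = np.array([[0.6, 0.3, 0.1, 0.0],           # demand transition matrix (invented)
              [0.2, 0.5, 0.2, 0.1],
              [0.1, 0.3, 0.4, 0.2],
              [0.0, 0.2, 0.3, 0.5]])

def update(pi, y, observed_d=None):
    """Unnormalized belief update after stocking y units."""
    if observed_d is not None:                # demand observed exactly (d < y)
        return pi[observed_d] * T[observed_d]
    censored = demand_vals >= y               # only {demand >= y} was observed
    return (pi * censored) @ T                # linear in the unnormalized state

pi0 = np.array([0.25, 0.25, 0.25, 0.25])
print(update(pi0, y=2, observed_d=1))         # exact observation of demand 1
print(update(pi0, y=2))                       # censored observation
```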

Journal ArticleDOI
TL;DR: In this article, the authors give an introduction to generalized semi-infinite programming (GSIP) models and present necessary and sufficient first- and second-order optimality conditions in which directional differentiability properties of the optimal value function of the lower level problem are used.

Journal ArticleDOI
TL;DR: It is proved that, under generic assumptions, singular trajectories of control-affine systems share nice properties related to computational aspects; for systems satisfying the Lie algebra rank condition (LARC), singular trajectories are strictly abnormal, generically with respect to the cost, and it is shown how these results can be used to derive regularity results for the value function and in the theory of Hamilton-Jacobi equations.
Abstract: When applying methods of optimal control to motion planning or stabilization problems, we see that some theoretical or numerical difficulties may arise, due to the presence of specific trajectories, namely, minimizing singular trajectories of the underlying optimal control problem. In this article, we provide characterizations for singular trajectories of control-affine systems. We prove that, under generic assumptions, such trajectories share nice properties, related to computational aspects; more precisely, we show that, for a generic system—with respect to the Whitney topology—all nontrivial singular trajectories are of minimal order and of corank one. These results, established both for driftless and for control-affine systems, extend results of [Y. Chitour, F. Jean, and E. Trelat, Comptes Rendus Math., 337 (2003), pp. 49-52 (in French); Y. Chitour, F. Jean, and E. Trelat, J. Differential Geom., 73 (2006), pp. 45-73]. As a consequence, for generic control-affine systems (with or without drift) defined by more than two vector fields, and for a fixed cost, there do not exist minimizing singular trajectories. Besides, we prove that, given a control-affine system satisfying the Lie algebra rank condition (LARC), singular trajectories are strictly abnormal, generically with respect to the cost. We then show how these results can be used to derive regularity results for the value function and in the theory of Hamilton-Jacobi equations, which in turn have applications for stabilization and motion planning, from both theoretical and implementational points of view.

Proceedings Article
08 Dec 2008
TL;DR: A metric for measuring behavior similarity between states in a Markov decision process (MDP), which takes action similarity into account, is defined and it is proved that the difference in the optimal value function of different states can be upper-bounded by the value of this metric.
Abstract: We define a metric for measuring behavior similarity between states in a Markov decision process (MDP), which takes action similarity into account. We show that the kernel of our metric corresponds exactly to the classes of states defined by MDP homomorphisms (Ravindran & Barto, 2003). We prove that the difference in the optimal value function of different states can be upper-bounded by the value of this metric, and that the bound is tighter than previous bounds provided by bisimulation metrics (Ferns et al. 2004, 2005). Our results hold both for discrete and for continuous actions. We provide an algorithm for constructing approximate homomorphisms, by using this metric to identify states that can be grouped together, as well as actions that can be matched. Previous research on this topic is based mainly on heuristics.
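
The following heavily simplified Python sketch conveys the idea for a tiny deterministic MDP: a metric is iterated in which each action of one state is matched against the best action of the other, and the value-difference bound is checked numerically. The MDP, the weights (1-gamma and gamma), and the deterministic shortcut are all illustrative assumptions; the paper's metric handles stochastic transitions via Kantorovich distances and proves tighter bounds.

```python
# Simplified lax-bisimulation-style metric on a deterministic toy MDP,
# plus a numerical check that (1-gamma)|V*(s)-V*(t)| <= d(s,t).
import numpy as np

n_s, n_a, gamma = 4, 2, 0.9
rng = np.random.default_rng(3)
R = rng.uniform(0, 1, (n_s, n_a))                 # rewards (invented)
T = rng.integers(0, n_s, (n_s, n_a))              # deterministic next states

d = np.zeros((n_s, n_s))
for _ in range(200):
    m = np.zeros_like(d)
    for s in range(n_s):
        for t in range(n_s):
            m[s, t] = max(
                min((1 - gamma) * abs(R[s, a] - R[t, b]) + gamma * d[T[s, a], T[t, b]]
                    for b in range(n_a))
                for a in range(n_a))
    d = np.maximum(m, m.T)      # symmetrize: match actions in both directions

# Optimal values via value iteration, to check the bound
V = np.zeros(n_s)
for _ in range(2000):
    V = np.max(R + gamma * V[T], axis=1)
gap = (1 - gamma) * np.abs(V[:, None] - V[None, :])
print("bound holds:", bool(np.all(gap <= d + 1e-8)))
```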

Journal ArticleDOI
TL;DR: In this article, the authors deal with an endogenous growth model with vintage capital and more precisely with the AK model proposed in [R. Boucekkine, O. Licandro, L. Puch, F. del Rio, and L.A.

Journal ArticleDOI
TL;DR: The dynamic programming principle is given for this kind of optimal control problem and it is shown that the value function is the unique viscosity solution of the obstacle problem for the corresponding Hamilton-Jacobi-Bellman equation.
Abstract: In this paper, we study one kind of stochastic recursive optimal control problem with the obstacle constraint for the cost functional described by the solution of a reflected backward stochastic differential equation. We give the dynamic programming principle for this kind of optimal control problem and show that the value function is the unique viscosity solution of the obstacle problem for the corresponding Hamilton-Jacobi-Bellman equation.

Journal ArticleDOI
TL;DR: In this article, the authors prove the semiconcavity of the value function of an optimal control problem with end-point constraints for which all minimizing controls are supposed to be nonsingular.
Abstract: Semiconcavity results have generally been obtained for optimal control problems in absence of state constraints. In this paper, we prove the semiconcavity of the value function of an optimal control problem with end-point constraints for which all minimizing controls are supposed to be nonsingular.

Proceedings Article
13 Jul 2008
TL;DR: An exact dynamic programming update for constrained partially observable Markov decision processes (CPOMDPs) is described; it relies on implicit enumeration of the vectors in the piecewise linear value function and on pruning operations to obtain a minimal representation of the updated value function.
Abstract: We describe an exact dynamic programming update for constrained partially observable Markov decision processes (CPOMDPs). State-of-the-art exact solution of unconstrained POMDPs relies on implicit enumeration of the vectors in the piecewise linear value function, and pruning operations to obtain a minimal representation of the updated value function. In dynamic programming for CPOMDPs, each vector takes two valuations, one with respect to the objective function and another with respect to the constraint function. The dynamic programming update consists of finding, for each belief state, the vector that has the best objective function valuation while still satisfying the constraint function. Whereas the pruning operation in an unconstrained POMDP requires solution of a linear program, the pruning operation for CPOMDPs requires solution of a mixed integer linear program.
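
The per-belief selection step described above can be illustrated with a toy example (invented vectors and budget): each vector carries an objective valuation and a constraint valuation, and at a given belief we keep the feasible vector with the best objective value. The paper's pruning step additionally requires a mixed integer linear program, which is not shown here.

```python
# Toy sketch of the per-belief selection for a CPOMDP value function:
# alpha vectors value the objective, beta vectors value the constraint,
# and we pick the best feasible vector at a belief b.  All numbers invented.
import numpy as np

alphas = np.array([[1.0, 0.0], [0.2, 0.9], [0.6, 0.6]])   # objective vectors
betas  = np.array([[0.8, 0.1], [0.1, 0.2], [0.4, 0.5]])   # constraint vectors
budget = 0.35

def best_vector(b):
    obj = alphas @ b
    cons = betas @ b
    feasible = np.where(cons <= budget)[0]
    return None if feasible.size == 0 else int(feasible[np.argmax(obj[feasible])])

print(best_vector(np.array([0.5, 0.5])))   # index of the chosen vector
```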

Journal ArticleDOI
TL;DR: The possibility for the immediate one-impulse strategy to be nonoptimal while both growth functions are monotonic is a surprising result and is illustrated with the help of numerical simulations.
Abstract: We consider the optimal control problem of feeding in minimal time a tank where several species compete for a single resource, with the objective being to reach a given level of the resource. We allow controls to be bounded measurable functions of time plus possible impulses. For the one-species case, we show that the immediate one-impulse strategy (filling the whole reactor with one single impulse at the initial time) is optimal when the growth function is monotonic. For nonmonotonic growth functions with one maximum, we show that a particular singular arc strategy (precisely defined in section 3) is optimal. These results extend and improve former ones obtained for the class of measurable controls only. For the two-species case with monotonic growth functions, we give conditions under which the immediate one-impulse strategy is optimal. We also give optimality conditions for the singular arc strategy (at a level that depends on the initial condition) to be optimal. The possibility for the immediate one-impulse strategy to be nonoptimal while both growth functions are monotonic is a surprising result and is illustrated with the help of numerical simulations.

Posted Content
Marie-Amelie Morlais
TL;DR: To solve the problem of utility maximization in a financial market allowing jumps, this paper first proves existence and uniqueness results for the introduced BSDE, which allows an explicit expression of the value function and a characterization of optimal strategies for the problem.
Abstract: In this paper, we consider the classical problem of utility maximization in a financial market allowing jumps. Assuming that the constraint set is a compact set, rather than a convex one, we use a dynamic method from which we derive a specific BSDE. We then aim at showing existence and uniqueness results for the introduced BSDE. This allows us to give an explicit expression of the value function and characterize optimal strategies for our problem.

Journal ArticleDOI
01 Aug 2008
TL;DR: This paper combines three threads of research on approximate dynamic programming: sparse random sampling of states, value function and policy approximation using local models, and using local trajectory optimizers to globally optimize a policy and associated value function.
Abstract: We combine three threads of research on approximate dynamic programming: sparse random sampling of states, value function and policy approximation using local models, and using local trajectory optimizers to globally optimize a policy and associated value function. Our focus is on finding steady-state policies for deterministic time-invariant discrete time control problems with continuous states and actions often found in robotics. In this paper, we describe our approach and provide initial results on several simulated robotics problems.

Journal ArticleDOI
TL;DR: In this paper, the authors studied two-period nonlinear optimization problems whose parameters are uncertain and showed that quasiconvexity of the optimal value function of certain subproblems is sufficient for reducibility of the resulting robust optimization problem to a single-level deterministic problem.
Abstract: We study two-period nonlinear optimization problems whose parameters are uncertain. We assume that uncertain parameters are revealed in stages and model them using the adjustable robust optimization approach. For problems with polytopic uncertainty, we show that quasiconvexity of the optimal value function of certain subproblems is sufficient for the reducibility of the resulting robust optimization problem to a single-level deterministic problem. We relate this sufficient condition to the cone-quasiconvexity of the feasible set mapping for adjustable variables and present several examples and applications satisfying these conditions.

Journal ArticleDOI
TL;DR: This paper defines the production-path property of an optimal solution for the stochastic uncapacitated lot-sizing model and uses this property to develop a backward dynamic programming recursion, which allows a full characterization of the optimal value function to be obtained by a dynamic programming algorithm in polynomial time.
Abstract: In 1958, Wagner and Whitin published a seminal paper on the deterministic uncapacitated lot-sizing problem, a fundamental model that is embedded in many practical production planning problems. In this paper, we consider a basic version of this model in which problem parameters are stochastic: the stochastic uncapacitated lot-sizing problem. We define the production-path property of an optimal solution for our model and use this property to develop a backward dynamic programming recursion. This approach allows us to show that the value function is piecewise linear and right continuous. We then use these results to show that a full characterization of the optimal value function can be obtained by a dynamic programming algorithm in polynomial time for the case that each nonleaf node contains at least two children. Moreover, we show that our approach leads to a polynomial-time algorithm to obtain an optimal solution to any instance of the stochastic uncapacitated lot-sizing problem, regardless of the structur...

Proceedings ArticleDOI
18 Aug 2008
TL;DR: This paper builds upon existing optimization strategies to present an alternative hybrid variant of differential dynamic programming for robust low-thrust optimization that uses first- and second-order state transition matrices to take advantage of an efficient discretization scheme and obtain the partial derivatives needed to perform the minimization.
Abstract: Low-thrust propulsion is becoming increasingly considered for future space missions, but optimization of the resulting trajectories is very challenging. To solve such complex problems, differential dynamic programming is a proven technique based on Bellman’s Principle of Optimality and successive minimization of quadratic approximations. In this paper, we build upon previous and existing optimization strategies to present an alternative hybrid variant of differential dynamic programming for robust low-thrust optimization. It uses first- and second-order state transition matrices to take advantage of an efficient discretization scheme and obtain the partial derivatives needed to perform the minimization. Unlike the traditional formulation, the state transition approach provides valuable constraint sensitivities and furthermore is naturally amenable to parallel computation. The method includes also a smoothing strategy to improve robustness of convergence when starting far from the optimum, as well as the capability to handle efficiently both soft and hard constraints. Procedures to drastically reduce the computation cost are mentioned. Preliminary numerical results are presented and compared to existing algorithms to illustrate the performance and the accuracy of our approach.
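
As background, a single differential-dynamic-programming sweep (backward pass with quadratic value expansions, then a forward rollout) can be sketched for a toy scalar problem as below; the dynamics, costs, and horizon are invented, and the paper's hybrid variant differs by using state transition matrices, smoothing, and constraint handling.

```python
# Minimal sketch of one DDP sweep on a toy scalar problem (invented data).
import numpy as np

dt, N = 0.1, 50
def f(x, u):   return x + dt * (u - 0.1 * x)        # toy dynamics
fx, fu = 1.0 - 0.1 * dt, dt                         # its derivatives
def ell(x, u): return dt * (x * x + u * u)          # running cost
def ellf(x):   return 10.0 * x * x                  # terminal cost

x0 = 1.0
xs = np.empty(N + 1); us = np.zeros(N)
xs[0] = x0
for k in range(N):                                   # nominal rollout
    xs[k + 1] = f(xs[k], us[k])

# Backward pass: expand Q(x,u) to second order around the nominal trajectory
Vx, Vxx = 20.0 * xs[N], 20.0
ks, Ks = np.zeros(N), np.zeros(N)
for k in reversed(range(N)):
    Qx  = 2 * dt * xs[k] + fx * Vx
    Qu  = 2 * dt * us[k] + fu * Vx
    Qxx = 2 * dt + fx * Vxx * fx
    Quu = 2 * dt + fu * Vxx * fu
    Qux = fu * Vxx * fx
    ks[k], Ks[k] = -Qu / Quu, -Qux / Quu             # control corrections
    Vx  = Qx + Ks[k] * Quu * ks[k] + Ks[k] * Qu + Qux * ks[k]
    Vxx = Qxx + Ks[k] * Quu * Ks[k] + 2 * Ks[k] * Qux

# Forward rollout with the updated feedback policy
xn = np.empty(N + 1); un = np.empty(N); xn[0] = x0
for k in range(N):
    un[k] = us[k] + ks[k] + Ks[k] * (xn[k] - xs[k])
    xn[k + 1] = f(xn[k], un[k])
print("cost after one sweep:", sum(ell(x, u) for x, u in zip(xn[:-1], un)) + ellf(xn[-1]))
```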

Journal ArticleDOI
TL;DR: A game-theory-based approach to multi-target searching with a multi-robot system in a dynamic environment is proposed; its main advantage lies in its real-time capabilities while remaining efficient and robust in dynamic environments.
Abstract: This paper proposes a game-theory-based approach to multi-target searching using a multi-robot system in a dynamic environment. It is assumed that a rough a priori probability map of the targets' distribution within the environment is given. To consider the interaction between the robots, a dynamic-programming equation is proposed to estimate the utility function for each robot. Based on this utility function, a cooperative nonzero-sum game is generated, where both pure Nash equilibrium and mixed-strategy equilibrium solutions are presented to achieve optimal overall robot behavior. Special consideration has been given to improving the real-time performance of the game-theory-based approach. Several mechanisms, such as event-driven discretization, one-step dynamic programming, and a decision buffer, have been proposed to reduce the computational complexity. The main advantage of the algorithm lies in its real-time capabilities while being efficient and robust to dynamic environments.
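
The game-theoretic decision step can be illustrated with a toy bimatrix game (made-up payoffs standing in for the utility estimates): each robot picks a search region and pure-strategy Nash equilibria are found by checking best responses. The paper additionally computes mixed-strategy equilibria and embeds this step in the dynamic-programming utility estimation.

```python
# Toy sketch: pure-strategy Nash equilibria of a nonzero-sum bimatrix game.
# Rows = robot 1's choice of region, columns = robot 2's choice (invented payoffs).
import numpy as np

U1 = np.array([[1.0, 3.0], [2.0, 1.0]])
U2 = np.array([[1.0, 2.0], [3.0, 1.0]])

def pure_nash(U1, U2):
    eqs = []
    for i in range(U1.shape[0]):
        for j in range(U1.shape[1]):
            # (i, j) is an equilibrium if neither robot can gain by deviating
            if U1[i, j] >= U1[:, j].max() and U2[i, j] >= U2[i, :].max():
                eqs.append((i, j))
    return eqs

print(pure_nash(U1, U2))   # [(0, 1), (1, 0)]: the robots split the regions
```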

Proceedings ArticleDOI
09 Dec 2008
TL;DR: In this paper, occupation measures are used to approximate pointwise the optimal value function of a given OCP, using a hierarchy of linear matrix inequality (LMI) relaxations, and an almost optimal control law is derived.
Abstract: We consider nonlinear optimal control problems (OCPs) for which all problem data are polynomial. In the first part of the paper, we review how occupation measures can be used to approximate pointwise the optimal value function of a given OCP, using a hierarchy of linear matrix inequality (LMI) relaxations. In the second part, we extend the methodology to approximate the optimal value function on a given set and we use such a function to constructively and computationally derive an almost optimal control law. Numerical examples show the effectiveness of the approach.

Journal ArticleDOI
TL;DR: An Approximate Dynamic Programming scheme that efficiently solves the optimal power split between the internal combustion engine and the electric machine in parallel hybrid powertrains is presented.