
Showing papers on "Bellman equation published in 2014"


Journal ArticleDOI
TL;DR: An integral reinforcement learning algorithm on an actor-critic structure is developed to learn online the solution to the Hamilton-Jacobi-Bellman equation for partially-unknown constrained-input systems, and it is shown that, using this technique, an easy-to-check condition on the richness of the recorded data is sufficient to guarantee convergence to a near-optimal control law.

410 citations


Journal ArticleDOI
TL;DR: A novel approach based on the Q-learning algorithm is proposed to solve the infinite-horizon linear quadratic tracker (LQT) for unknown discrete-time systems in a causal manner, and the optimal control input is obtained by solving only an augmented algebraic Riccati equation (ARE).

397 citations


Journal ArticleDOI
TL;DR: An online learning algorithm is developed to solve the linear quadratic tracking (LQT) problem for partially-unknown continuous-time systems and it is shown that the value function is quadratic in terms of the state of the system and the command generator.
Abstract: In this technical note, an online learning algorithm is developed to solve the linear quadratic tracking (LQT) problem for partially-unknown continuous-time systems. It is shown that the value function is quadratic in terms of the state of the system and the command generator. Based on this quadratic form, an LQT Bellman equation and an LQT algebraic Riccati equation (ARE) are derived to solve the LQT problem. The integral reinforcement learning technique is used to find the solution to the LQT ARE online and without requiring the knowledge of the system drift dynamics or the command generator dynamics. The convergence of the proposed online algorithm to the optimal control solution is verified. To show the efficiency of the proposed approach, a simulation example is provided.
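As a reading aid, the quadratic structure described in the abstract can be sketched as follows; the discount rate γ, output map C, tracking weights Q and R, and the augmented state X stacking the system state and the reference are illustrative notation, not necessarily the paper's.

```latex
% Augmented state X = [x^T, r^T]^T; quadratic value function V(X) = X^T P X
% (up to a conventional factor of 1/2). Integral (IRL) form of the LQT Bellman
% equation over an interval [t, t+T]:
V\big(X(t)\big) = \int_{t}^{t+T} e^{-\gamma(\tau-t)}
    \Big[\big(Cx(\tau)-r(\tau)\big)^{\top} Q \big(Cx(\tau)-r(\tau)\big)
         + u(\tau)^{\top} R\, u(\tau)\Big]\, d\tau
    + e^{-\gamma T}\, V\big(X(t+T)\big)
```

Evaluating this integral relation along measured trajectories, rather than the differential Bellman equation, is what lets the online algorithm avoid knowledge of the drift and command-generator dynamics.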

320 citations


Journal ArticleDOI
TL;DR: An online adaptive policy learning algorithm (APLA) based on adaptive dynamic programming (ADP) is proposed for learning in real time the solution to the Hamilton-Jacobi-Isaacs (HJI) equation, which appears in the H∞ control problem.
Abstract: The problem of H∞ state feedback control of affine nonlinear discrete-time systems with unknown dynamics is investigated in this paper. An online adaptive policy learning algorithm (APLA) based on adaptive dynamic programming (ADP) is proposed for learning in real time the solution to the Hamilton-Jacobi-Isaacs (HJI) equation, which appears in the H∞ control problem. In the proposed algorithm, three neural networks (NNs) are utilized to find suitable approximations of the optimal value function and the saddle point feedback control and disturbance policies. Novel weight updating laws are given to tune the critic, actor, and disturbance NNs simultaneously by using data generated in real time along the system trajectories. Considering NN approximation errors, we provide the stability analysis of the proposed algorithm with a Lyapunov approach. Moreover, the need for the system input dynamics in the proposed algorithm is relaxed by using an NN identification scheme. Finally, simulation examples show the effectiveness of the proposed algorithm.

197 citations


Journal ArticleDOI
TL;DR: A theory for a general class of discrete-time stochastic control problems that, in various ways, are time-inconsistent in the sense that they do not admit a Bellman optimality principle is developed.
Abstract: We develop a theory for a general class of discrete-time stochastic control problems that, in various ways, are time-inconsistent in the sense that they do not admit a Bellman optimality principle. We attack these problems by viewing them within a game theoretic framework, and we look for subgame perfect Nash equilibrium points. For a general controlled Markov process and a fairly general objective functional, we derive an extension of the standard Bellman equation, in the form of a system of nonlinear equations, for the determination of the equilibrium strategy as well as the equilibrium value function. Most known examples of time-inconsistent stochastic control problems in the literature are easily seen to be special cases of the present theory. We also prove that for every time-inconsistent problem, there exists an associated time-consistent problem such that the optimal control and the optimal value function for the consistent problem coincide with the equilibrium control and value function, respectively, for the time-inconsistent problem. To exemplify the theory, we study some concrete examples, such as hyperbolic discounting and mean–variance control.

188 citations


Journal ArticleDOI
TL;DR: An integral reinforcement learning algorithm based on policy iteration to learn online the Nash equilibrium solution for a two-player zero-sum differential game with completely unknown linear continuous-time dynamics is developed.
Abstract: In this paper, we develop an integral reinforcement learning algorithm based on policy iteration to learn online the Nash equilibrium solution for a two-player zero-sum differential game with completely unknown linear continuous-time dynamics. This algorithm is a fully model-free method that solves the game algebraic Riccati equation forward in time. The developed algorithm updates the value function and the control and disturbance policies simultaneously. Convergence of the algorithm is established by showing its equivalence to Newton's method. To implement this algorithm, one critic network and two action networks are used to approximate the game value function and the control and disturbance policies, respectively, and the least squares method is used to estimate the unknown parameters. The effectiveness of the developed scheme is demonstrated in simulation by designing an H-infinity state feedback controller for a power system.

149 citations


Journal ArticleDOI
TL;DR: In this paper, a class of risk-sensitive mean-field stochastic differential games with exponential cost functions is studied and the corresponding mean field equilibria are characterized in terms of backward-forward macroscopic McKean-Vlasov equations, Fokker-Planck-Kolmogorov equations and HJB equations.
Abstract: In this paper, we study a class of risk-sensitive mean-field stochastic differential games. We show that under appropriate regularity conditions, the mean-field value of the stochastic differential game with exponentiated integral cost functional coincides with the value function satisfying a Hamilton–Jacobi–Bellman (HJB) equation with an additional quadratic term. We provide an explicit solution of the mean-field best response when the instantaneous cost functions are log-quadratic and the state dynamics are affine in the control. An equivalent mean-field risk-neutral problem is formulated and the corresponding mean-field equilibria are characterized in terms of backward-forward macroscopic McKean-Vlasov equations, Fokker-Planck-Kolmogorov equations, and HJB equations. We provide numerical examples on the mean field behavior to illustrate both linear and McKean-Vlasov dynamics.

132 citations


Journal ArticleDOI
TL;DR: Two theorems illustrate how this boundedness condition can be concluded from structural properties, such as controllability and stabilizability, of the strictly dissipative control systems under consideration.
Abstract: We investigate the exponential turnpike property for finite horizon undiscounted discrete time optimal control problems without any terminal constraints. Considering a class of strictly dissipative systems, we derive a boundedness condition for an auxiliary optimal value function which implies the exponential turnpike property. Two theorems illustrate how this boundedness condition can be concluded from structural properties like controllability and stabilizability of the control system under consideration.

116 citations


Journal ArticleDOI
TL;DR: Value iteration-based approximate/adaptive dynamic programming (ADP) as an approximate solution to infinite-horizon optimal control problems with deterministic dynamics and continuous state and action spaces is investigated and a relatively simple proof for the convergence of the outer-loop iterations to the optimal solution is provided.
Abstract: Value iteration-based approximate/adaptive dynamic programming (ADP) as an approximate solution to infinite-horizon optimal control problems with deterministic dynamics and continuous state and action spaces is investigated. The learning iterations are decomposed into an outer loop and an inner loop. A relatively simple proof for the convergence of the outer-loop iterations to the optimal solution is provided, based on an analogy between the value function during the iterations and the value function of a fixed-final-time optimal control problem. The inner loop is used to avoid numerically solving a set of nonlinear equations or a nonlinear optimization problem at each ADP iteration for the policy update. Sufficient conditions for the uniqueness of the solution to the policy update equation and for the convergence of the inner-loop iterations to the solution are obtained. Afterwards, the results are formulated as a learning algorithm for training a neurocontroller or creating a look-up table to be used for optimal control of nonlinear systems with different initial conditions. Finally, some of the features of the investigated method are numerically analyzed.
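As a rough illustration of the outer-loop structure described above, the following minimal sketch runs value iteration for a deterministic system on state and control grids; the paper's inner loop, which iteratively solves the policy-update equation, is replaced here by a plain search over the control grid, and the names f, g, and the grids are assumptions.

```python
import numpy as np

# Minimal sketch (not the paper's algorithm): value iteration for a deterministic
# system x_next = f(x, u) with stage cost g(x, u) on one-dimensional grids.
# The outer loop is the Bellman backup; the paper's inner loop for the policy
# update is replaced by exhaustive search over the control grid.
def value_iteration(f, g, x_grid, u_grid, n_outer=200, tol=1e-6):
    V = np.zeros(len(x_grid))            # value-function estimate on the state grid
    policy = np.zeros(len(x_grid))       # greedy control at each grid point
    for _ in range(n_outer):             # outer loop
        V_new = np.empty_like(V)
        for i, x in enumerate(x_grid):
            costs = [g(x, u) + V[np.argmin(np.abs(x_grid - f(x, u)))] for u in u_grid]
            j = int(np.argmin(costs))
            policy[i], V_new[i] = u_grid[j], costs[j]
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V, policy

# Illustrative use on a scalar linear system with quadratic cost:
# V, pi = value_iteration(lambda x, u: 0.9 * x + u, lambda x, u: x**2 + u**2,
#                         np.linspace(-2, 2, 81), np.linspace(-1, 1, 41))
```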

114 citations


Journal ArticleDOI
TL;DR: The method is proved to be consistent and stable, with convergence rates that are optimal with respect to mesh size, and suboptimal in the polynomial degree by only half an order.
Abstract: We propose an $hp$-version discontinuous Galerkin finite element method for fully nonlinear second-order elliptic Hamilton--Jacobi--Bellman equations with Cordes coefficients. The method is proved to be consistent and stable, with convergence rates that are optimal with respect to mesh size, and suboptimal in the polynomial degree by only half an order. Numerical experiments on problems with nonsmooth solutions and strongly anisotropic diffusion coefficients illustrate the accuracy and computational efficiency of the scheme. An existence and uniqueness result for strong solutions of the fully nonlinear problem and a semismoothness result for the nonlinear operator are also provided.

100 citations


Proceedings Article
08 Dec 2014
TL;DR: Compared with the classical DDP and a state-of-the-art GP-based policy search method, PDDP offers a superior combination of data-efficiency, learning speed, and applicability.
Abstract: We present a data-driven, probabilistic trajectory optimization framework for systems with unknown dynamics, called Probabilistic Differential Dynamic Programming (PDDP). PDDP takes into account uncertainty explicitly for dynamics models using Gaussian processes (GPs). Based on the second-order local approximation of the value function, PDDP performs Dynamic Programming around a nominal trajectory in Gaussian belief spaces. Different from typical gradient-based policy search methods, PDDP does not require a policy parameterization and learns a locally optimal, time-varying control policy. We demonstrate the effectiveness and efficiency of the proposed algorithm using two nontrivial tasks. Compared with the classical DDP and a state-of-the-art GP-based policy search method, PDDP offers a superior combination of data-efficiency, learning speed, and applicability.

Journal ArticleDOI
TL;DR: The existence of a weak solution to the system with prescribed initial and terminal conditions m_0, m_1 (positive and smooth) for the density m is proved; this is also a special case of an exact controllability result for the Fokker–Planck equation through some optimal transport field.
Abstract: We consider the planning problem for a class of mean field games, consisting of a coupled system of a Hamilton–Jacobi–Bellman equation for the value function u and a Fokker–Planck equation for the density m of the players, where one wishes to drive the density of players from the given initial configuration to a target one at time T through the optimal decisions of the agents. Assuming that the coupling F(x,m) in the cost criterion is monotone with respect to m, and that the Hamiltonian has some growth bounded below and above by quadratic functions, we prove the existence of a weak solution to the system with prescribed initial and terminal conditions m_0, m_1 (positive and smooth) for the density m. This is also a special case of an exact controllability result for the Fokker–Planck equation through some optimal transport field.
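For orientation, the coupled system referred to above has, in a commonly used form whose notation may differ from the paper's, a backward HJB equation for u and a forward Fokker–Planck equation for m, with the density prescribed at both ends of the time interval:

```latex
% Mean field games planning problem (illustrative form): sigma > 0 is the
% diffusion, H the Hamiltonian, F the monotone coupling.
\begin{aligned}
-\partial_t u - \sigma \Delta u + H(x, \nabla u) &= F(x, m), \\
\partial_t m - \sigma \Delta m - \operatorname{div}\!\big(m\, \nabla_p H(x, \nabla u)\big) &= 0, \\
m(0, \cdot) = m_0, \qquad m(T, \cdot) &= m_1 .
\end{aligned}
```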

Journal ArticleDOI
TL;DR: In this paper, the so-called pessimistic version of bilevel programming is studied and several types of lower subdifferential necessary optimality conditions are derived by using the lower-level value function approach and the Karush-Kuhn-Tucker representation of lower-level optimal solution maps.
Abstract: This article is devoted to the so-called pessimistic version of bilevel programming. Minimization problems of this type are challenging to handle partly because the corresponding value functions are often merely upper (while not lower) semicontinuous. Employing advanced tools of variational analysis and generalized differentiation, we provide rather general frameworks ensuring the Lipschitz continuity of the corresponding value functions. Several types of lower subdifferential necessary optimality conditions are then derived by using the lower-level value function approach and the Karush–Kuhn–Tucker representation of lower-level optimal solution maps. We also derive upper subdifferential necessary optimality conditions of a new type, which can be essentially stronger than the lower ones in some particular settings. Finally, certain links are established between the obtained necessary optimality conditions for the pessimistic and optimistic versions in bilevel programming.

Journal ArticleDOI
TL;DR: In this paper, the authors considered the optimal reinsurance and investment problem in an unobservable Markov-modulated compound Poisson risk model, where the intensity and jump size distribution are not known but have to be inferred from the observations of claim arrivals.
Abstract: We consider the optimal reinsurance and investment problem in an unobservable Markov-modulated compound Poisson risk model, where the intensity and jump size distribution are not known but have to be inferred from the observations of claim arrivals. Using a recently developed result from filtering theory, we reduce the partially observable control problem to an equivalent problem with complete observations. Then using stochastic control theory, we get the closed form expressions of the optimal strategies which maximize the expected exponential utility of terminal wealth. In particular, we investigate the effect of the safety loading and the unobservable factors on the optimal reinsurance strategies. With the help of a generalized Hamilton–Jacobi–Bellman equation where the derivative is replaced by Clarke’s generalized gradient as in Bauerle and Rieder (2007), we characterize the value function, which helps us verify that the strategies we constructed are optimal.

Journal ArticleDOI
TL;DR: This paper extends Carroll's endogenous grid method and its combination with value function iteration to a class of dynamic programming problems, such as problems with both discrete and continuous choices, in which the value function is non-smooth and non-concave.
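For context, the baseline that this paper extends can be sketched in a few lines; the consumption-savings setup below, with CRRA utility and deterministic income, is an illustrative assumption, whereas the paper's contribution is the extension to discrete-continuous choices with non-smooth, non-concave value functions.

```python
import numpy as np

# One step of Carroll's endogenous grid method (EGM) for a plain
# consumption-savings problem (illustrative baseline only).
# a_grid: exogenous grid of end-of-period assets; (m_next, c_next): next-period
# consumption policy on a cash-on-hand grid; CRRA marginal utility u'(c) = c**(-rho).
def egm_step(a_grid, m_next, c_next, R=1.03, beta=0.96, rho=2.0, y=1.0):
    m_prime = R * a_grid + y                       # next-period cash on hand
    c_prime = np.interp(m_prime, m_next, c_next)   # next-period consumption
    # invert the Euler equation u'(c) = beta * R * u'(c') for today's consumption
    c_now = (beta * R * c_prime ** (-rho)) ** (-1.0 / rho)
    m_now = a_grid + c_now                         # endogenous grid of cash on hand
    return m_now, c_now
```

The appeal of the method is that no root finding is needed: the Euler equation is inverted analytically and the grid over current resources is generated endogenously.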

Journal ArticleDOI
01 Dec 2014
TL;DR: The maximum causal entropy framework is extended to the infinite time horizon setting and a gradient-based algorithm for the maximum discounted causal entropy formulation is developed that enjoys the desired feature of being model agnostic, a property that is absent in many previous IRL algorithms.
Abstract: Inverse reinforcement learning (IRL) attempts to use demonstrations of “expert” decision making in a Markov decision process to infer a corresponding policy that shares the “structured, purposeful” qualities of the expert's actions. In this paper, we extend the maximum causal entropy framework, a notable paradigm in IRL, to the infinite time horizon setting. We consider two formulations (maximum discounted causal entropy and maximum average causal entropy) appropriate for the infinite horizon case and show that both result in optimization programs that can be reformulated as convex optimization problems, thus admitting efficient computation. We then develop a gradient-based algorithm for the maximum discounted causal entropy formulation that enjoys the desired feature of being model agnostic, a property that is absent in many previous IRL algorithms. We propose the stationary soft Bellman policy, a key building block in the gradient-based algorithm, and study its properties in depth; this not only leads to theoretical insight into its analytical properties, but also helps motivate a large toolkit of methods for implementing the gradient-based algorithm. Finally, we select three algorithms of this type and apply them to two problem instances involving demonstration data from a simple controlled queuing network model inspired by problems in air traffic management.
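The stationary soft Bellman policy mentioned above has a simple tabular counterpart, shown here purely for illustration with a known transition model and made-up array shapes; the paper's algorithm is gradient-based and model agnostic.

```python
import numpy as np
from scipy.special import logsumexp

# Stationary soft Bellman policy, tabular discounted case (illustrative sketch).
# P[a, s, s'] is a transition kernel and r[s, a] a reward table; both are
# assumptions used only to display the fixed-point structure.
def soft_bellman_policy(P, r, gamma=0.95, n_iter=1000):
    S, A = r.shape
    V = np.zeros(S)
    for _ in range(n_iter):
        # soft backup: Q(s,a) = r(s,a) + gamma * E[V(s') | s, a]
        Q = r + gamma * np.einsum('asn,n->sa', P, V)
        # soft value: V(s) = log sum_a exp Q(s,a)
        V = logsumexp(Q, axis=1)
    return np.exp(Q - V[:, None])   # pi(a|s) = exp(Q(s,a) - V(s))
```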

Journal ArticleDOI
TL;DR: The main results are to identify the right Hamilton--Jacobi--Bellman equation and to provide the maximal and minimal solutions, as well as conditions for uniqueness.
Abstract: This article is a continuation of a previous work where we studied infinite horizon control problems for which the dynamic, running cost, and control space may be different in two half-spaces of some Euclidean space $\mathbb{R}^N$. In this article we extend our results in several directions: (i) to more general domains; (ii) to consideration of finite horizon control problems; (iii) to weakening the controllability assumptions. We use a Bellman approach and our main results are to identify the right Hamilton--Jacobi--Bellman equation (and, in particular, the right conditions to be put on the interfaces separating the regions where the dynamic and running cost are different) and to provide the maximal and minimal solutions, as well as conditions for uniqueness. We also provide stability results for such equations.

Journal ArticleDOI
TL;DR: This work derives HJB equations and applies them to two examples, a portfolio optimization and a systemic risk model, and shows that Bellman's principle applies to the dynamic programming value function $V(\tau,\rho_\tau)$, where the dependency on $\rho$ is functional, as in P.L. Lions' analysis of mean-field games (2007).

01 Apr 2014
TL;DR: In this article, the authors introduce the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function.
Abstract: Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in cooperative decentralized settings, but are difficult to solve optimally (NEXP-Complete). As a new way of solving these problems, we introduce the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function. This approach makes use of the fact that planning can be accomplished in a centralized offline manner, while execution can still be distributed. This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be used for the first time. When the curse of dimensionality becomes too prohibitive, we refine this basic approach and present ways to combine heuristic search and compact representations that exploit the structure present in multi-agent domains, without losing the ability to eventually converge to an optimal solution. In particular, we introduce feature-based heuristic search that relies on feature-based compact representations, point-based updates and efficient action selection. A theoretical analysis demonstrates that our feature-based heuristic search algorithms terminate in finite time with an optimal solution. We include an extensive empirical analysis using well known benchmarks, thereby demonstrating our approach provides significant scalability improvements compared to the state of the art.

Posted Content
TL;DR: A general finite-horizon problem setting where the optimal value function is monotone is described, a convergence proof for Monotone-ADP is presented, and numerical results are shown for three application domains: optimal stopping, energy storage/allocation, and glycemic control for diabetes patients.
Abstract: Many sequential decision problems can be formulated as Markov Decision Processes (MDPs) where the optimal value function (or cost-to-go function) can be shown to satisfy a monotone structure in some or all of its dimensions. When the state space becomes large, traditional techniques, such as the backward dynamic programming algorithm (i.e., backward induction or value iteration), may no longer be effective in finding a solution within a reasonable time frame, and thus we are forced to consider other approaches, such as approximate dynamic programming (ADP). We propose a provably convergent ADP algorithm called Monotone-ADP that exploits the monotonicity of the value functions in order to increase the rate of convergence. In this paper, we describe a general finite-horizon problem setting where the optimal value function is monotone, present a convergence proof for Monotone-ADP under various technical assumptions, and show numerical results for three application domains: optimal stopping, energy storage/allocation, and glycemic control for diabetes patients. The empirical results indicate that by taking advantage of monotonicity, we can attain high quality solutions within a relatively small number of iterations, using up to two orders of magnitude less computation than is needed to compute the optimal solution exactly.
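The way monotonicity is exploited can be pictured, in a much-simplified one-dimensional form, as a projection applied after each stochastic update of the value estimate; the nondecreasing ordering, learning rate, and array layout below are illustrative assumptions rather than the paper's general setting.

```python
import numpy as np

# Simplified sketch of a Monotone-ADP-style update for a value function assumed
# nondecreasing in a one-dimensional state index (illustrative only).
def monotone_update(V, s_idx, observed_value, alpha=0.1):
    V = V.copy()
    # usual approximate dynamic programming smoothing at the visited state
    V[s_idx] = (1 - alpha) * V[s_idx] + alpha * observed_value
    # monotonicity projection: adjust neighbours so V stays nondecreasing
    V[:s_idx] = np.minimum(V[:s_idx], V[s_idx])
    V[s_idx + 1:] = np.maximum(V[s_idx + 1:], V[s_idx])
    return V
```

Each observation therefore informs not only the visited state but every state ordered above or below it, which is how the monotone structure increases the rate of convergence.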

Journal ArticleDOI
TL;DR: In this paper, the authors considered the problem of risk-sensitive control of continuous time Markov chains taking values in a discrete state space and developed a policy iteration algorithm for finding an optimal control.
Abstract: We study risk-sensitive control of continuous time Markov chains taking values in a discrete state space. We study both finite and infinite horizon problems. In the finite horizon problem we characterize the value function via the Hamilton–Jacobi–Bellman equation and obtain an optimal Markov control. We do the same for the infinite horizon discounted cost case. In the infinite horizon average cost case we establish the existence of an optimal stationary control under a certain Lyapunov condition. We also develop a policy iteration algorithm for finding an optimal control.

Journal ArticleDOI
TL;DR: In this paper, the authors studied the stochastic optimal control problem of fully coupled forward-backward stochastic differential equations (FBSDEs) and proved that the value functions are deterministic, satisfy the dynamic programming principle, and are viscosity solutions of the associated generalized Hamilton–Jacobi–Bellman equations.
Abstract: In this paper we study stochastic optimal control problems of fully coupled forward-backward stochastic differential equations (FBSDEs). The recursive cost functionals are defined by controlled fully coupled FBSDEs. We use a new method to prove that the value functions are deterministic, satisfy the dynamic programming principle, and are viscosity solutions to the associated generalized Hamilton--Jacobi--Bellman (HJB) equations. For this we generalize the notion of stochastic backward semigroup introduced by Peng [Topics on Stochastic Analysis, Science Press, Beijing, 1997, pp. 85--138]. We emphasize that when $\sigma$ depends on the second component of the solution $(Y, Z)$ of the BSDE it makes the stochastic control much more complicated and has as a consequence that the associated HJB equation is combined with an algebraic equation. We prove that the algebraic equation has a unique solution, and moreover, we also give the representation for this solution. On the other hand, we prove a new local existence...

Journal ArticleDOI
TL;DR: This paper proposes a particular form of the problem that exposes some useful properties of the gauge optimization framework (such as the variational properties of its value function), and yet maintains most of the generality of the abstract form of gauge optimization.
Abstract: Gauge functions significantly generalize the notion of a norm, and gauge optimization, as defined by [R. M. Freund, Math. Programming, 38 (1987), pp. 47--67], seeks the element of a convex set that is minimal with respect to a gauge function. This conceptually simple problem can be used to model a remarkable array of useful problems, including a special case of conic optimization, and related problems that arise in machine learning and signal processing. The gauge structure of these problems allows for a special kind of duality framework. This paper explores the duality framework proposed by Freund, and proposes a particular form of the problem that exposes some useful properties of the gauge optimization framework (such as the variational properties of its value function), and yet maintains most of the generality of the abstract form of gauge optimization.

Proceedings ArticleDOI
01 Dec 2014
TL;DR: In this paper, the authors combine the structure of the Hamilton-Jacobi-Bellman equation and its reduction to a linear partial differential equation (PDE) with methods based on low-rank tensor representations, known as separated representations, to address the curse of dimensionality.
Abstract: The Hamilton-Jacobi-Bellman (HJB) equation provides the globally optimal solution to large classes of control problems. Unfortunately, this generality comes at a price: the calculation of such solutions is typically intractable for systems with more than moderate state space size due to the curse of dimensionality. This work combines recent results on the structure of the HJB equation, and its reduction to a linear partial differential equation (PDE), with methods based on low-rank tensor representations, known as separated representations, to address the curse of dimensionality. The result is an algorithm to solve optimal control problems that scales linearly with the number of states in a system and is applicable to systems that are nonlinear with stochastic forcing in finite-horizon, average-cost, and first-exit settings. The method is demonstrated on inverted pendulum, VTOL aircraft, and quadcopter models, with system dimensions two, six, and twelve, respectively.
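The reduction to a linear PDE referred to above is typically obtained through an exponential (desirability) transform; the sketch below is for control-affine dynamics with quadratic control cost in the stationary/first-exit case, with notation that may differ from the paper's and with λ a scalar linking the control weight to the noise covariance Σ.

```latex
% Exponential transform linearizing the HJB (illustrative form): state cost q(x),
% drift f(x), noise covariance \Sigma(x), desirability \Psi = exp(-V / \lambda).
\Psi(x) = e^{-V(x)/\lambda}
\quad\Longrightarrow\quad
0 = -\frac{q(x)}{\lambda}\,\Psi(x) + f(x)^{\top}\nabla\Psi(x)
    + \tfrac{1}{2}\,\mathrm{tr}\!\big(\Sigma(x)\,\nabla^{2}\Psi(x)\big)
```

Because the transformed equation is linear in Ψ, its solution can be represented in separated (low-rank tensor) form, which is what allows the solver to scale linearly with the number of states in the system.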

Journal ArticleDOI
TL;DR: A numerical algorithm to compute high-order approximate solutions to Bellman’s dynamic programming equation that arises in the optimal stabilization of discrete-time nonlinear control systems using a patchy technique to build local Taylor polynomial approximations defined on small domains.
Abstract: In this paper, we present a numerical algorithm to compute high-order approximate solutions to Bellman’s dynamic programming equation that arises in the optimal stabilization of discrete-time nonlinear control systems. The method uses a patchy technique to build local Taylor polynomial approximations defined on small domains, which are then patched together to create a piecewise smooth approximation. The numerical domain is dynamically computed as the level sets of the value function are propagated in reverse time under the closed-loop dynamics. The patch domains are constructed such that their radial boundaries are contained in the level sets of the value function and their lateral boundaries are constructed as invariant sets of the closed-loop dynamics. To minimize the computational effort, an adaptive subdivision algorithm is used to determine the number of patches on each level set depending on the relative error in the dynamic programming equation. Numerical tests in 2D and 3D are given to illustrate the accuracy of the method.

Journal ArticleDOI
TL;DR: In this paper, a state-reduced, recursive dynamic programming implementation of the DICE-2007 model is presented, which simplifies the carbon cycle and the temperature delay equations and solves the infinite planning horizon problem in an arbitrary time step.
Abstract: We introduce a version of the DICE-2007 model designed for uncertainty analysis. DICE is a widespread deterministic integrated assessment model of climate change. Climate change, long-term economic development, and their interactions are highly uncertain. The quantitative analysis of optimal mitigation policy under uncertainty requires a recursive dynamic programming implementation of integrated assessment models. Such implementations are subject to the curse of dimensionality. Every increase in the dimension of the state space is paid for by a combination of (exponentially) increasing processor time, lower quality of the value or policy function approximations, and reductions of the uncertainty domain. The paper promotes a state-reduced, recursive dynamic programming implementation of the DICE-2007 model. We achieve the reduction by simplifying the carbon cycle and the temperature delay equations. We compare our model’s performance and that of the DICE model to the scientific AOGCM models emulated by MAGICC 6.0 and find that our simplified model performs as well as the original DICE model. Our implementation solves the infinite planning horizon problem in an arbitrary time step. The paper is the first to carefully analyze the quality of the value function approximation using two different types of basis functions and systematically varying the dimension of the basis. We present the closed form, continuous time approximation to the exogenous (discretely and inductively defined) processes in DICE, and we present a numerically more efficient re-normalized Bellman equation that, in addition, can disentangle risk attitude from the propensity to smooth consumption over time.

Journal ArticleDOI
TL;DR: Results indicate how the uncertainty in the target motion, the tracker capabilities, and the time since the last observation can affect the control law, and simulations illustrate that the control can be applied to other continuous, smooth trajectories with no need for additional computation.
Abstract: An optimal feedback control is developed for a fixed-speed, fixed-altitude unmanned aerial vehicle (UAV) to maintain a nominal distance from a ground target in a way that anticipates its unknown future trajectory. Stochasticity is introduced in the problem by assuming that the target motion can be modeled as Brownian motion, which accounts for possible realizations of the unknown target kinematics. Moreover, the possibility for the interruption of observations is included by assuming that the duration of observation times of the target is exponentially distributed, giving rise to two discrete states of operation. A Bellman equation based on an approximating Markov chain that is consistent with the stochastic kinematics is used to compute an optimal control policy that minimizes the expected value of a cost function based on a nominal UAV-target distance. Results indicate how the uncertainty in the target motion, the tracker capabilities, and the time since the last observation can affect the control law, and simulations illustrate that the control can further be applied to other continuous, smooth trajectories with no need for additional computation.

Journal ArticleDOI
TL;DR: A novel mean-field framework is proposed that offers a more efficient modeling tool and a more accurate solution scheme in tackling directly the issue of nonseparability and deriving the optimal policies analytically for the multi-period mean-variance-type portfolio selection problems.
Abstract: When a dynamic optimization problem is not decomposable by a stage-wise backward recursion, it is nonseparable in the sense of dynamic programming. The classical dynamic programming-based optimal stochastic control methods would fail in such nonseparable situations as the principle of optimality no longer applies. Among these notorious nonseparable problems, the dynamic mean-variance portfolio selection formulation had posed a great challenge to our research community until recently. Different from the existing literature that invokes embedding schemes and auxiliary parametric formulations to solve the dynamic mean-variance portfolio selection formulation, we propose in this paper a novel mean-field framework that offers a more efficient modeling tool and a more accurate solution scheme in tackling directly the issue of nonseparability and deriving the optimal policies analytically for the multi-period mean-variance-type portfolio selection problems.

Journal ArticleDOI
TL;DR: This work considers power allocation for an access-controlled transmitter with energy harvesting capability based on causal observations of the channel fading state and proposes power allocation algorithms for both the finite- and infinite-horizon cases whose computational complexity is significantly lower than that of the standard discrete MDP method but with improved performance.
Abstract: We consider power allocation for an access-controlled transmitter with energy harvesting capability based on causal observations of the channel fading state. We assume that the system operates in a time-slotted fashion and the channel gain in each slot is a random variable which is independent across slots. Further, we assume that the transmitter is solely powered by a renewable energy source and the energy harvesting process can practically be predicted. With the additional access control for the transmitter and the maximum power constraint, we formulate the stochastic optimization problem of maximizing the achievable rate as a Markov decision process (MDP) with continuous state. To efficiently solve the problem, we define an approximate value function based on a piecewise linear fit in terms of the battery state. We show that with the approximate value function, the update in each iteration consists of a group of convex problems with a continuous parameter. Moreover, we derive the optimal solution to these convex problems in closed form. Further, we propose power allocation algorithms for both the finite- and infinite-horizon cases, whose computational complexity is significantly lower than that of the standard discrete MDP method but with improved performance. Extension to the case of a general payoff function and imperfect energy prediction is also considered. Finally, simulation results demonstrate that the proposed algorithms closely approach the optimal performance.
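For intuition only, a brute-force version of the finite-horizon recursion might look like the sketch below; the rate function, harvest model, and grids are assumptions, and the paper's actual algorithm avoids this cost by fitting a piecewise-linear value function in the battery state and solving the per-slot convex subproblems in closed form.

```python
import numpy as np

# Toy finite-horizon backward recursion for power allocation with an energy
# harvesting battery (illustrative; not the paper's low-complexity algorithm).
def backward_recursion(T, b_grid, p_grid, h_samples, e_harvest, b_max):
    V = np.zeros((T + 1, len(b_grid)))           # V[T] = 0 is the terminal value
    for t in range(T - 1, -1, -1):
        for i, b in enumerate(b_grid):
            vals = []
            for h in h_samples:                  # channel gain observed causally
                best = 0.0
                for p in p_grid[p_grid <= b]:    # cannot spend more than stored
                    b_next = min(b - p + e_harvest, b_max)
                    v_next = np.interp(b_next, b_grid, V[t + 1])
                    best = max(best, np.log1p(h * p) + v_next)
                vals.append(best)
            V[t, i] = np.mean(vals)              # expectation over the fading state
    return V
```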

Journal ArticleDOI
TL;DR: In this paper, the authors established some elementary results on solutions to the Bellman equation without introducing any topological assumption, and applied these results to two optimal growth models: one with a discontinuous production function and the other with a roughly increasing return.
Abstract: We establish some elementary results on solutions to the Bellman equation without introducing any topological assumption. Under a small number of conditions, we show that the Bellman equation has a unique solution in a certain set, that this solution is the value function, and that the value function can be computed by value iteration with an appropriate initial condition. In addition, we show that the value function can be computed by the same procedure under alternative conditions. We apply our results to two optimal growth models: one with a discontinuous production function and the other with “roughly increasing” returns.