Showing papers on "Bellman equation published in 2006"


Proceedings Article
04 Dec 2006
TL;DR: A class of MDPs with discrete state spaces and continuous control spaces is introduced that greatly simplifies Reinforcement Learning and enables efficient approximations to traditional MDPs.
Abstract: We introduce a class of MDPs which greatly simplify Reinforcement Learning. They have discrete state spaces and continuous control spaces. The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. A control cost penalizing KL divergence between controlled and uncontrolled transition probabilities makes the minimization problem convex, and allows analytical computation of the optimal controls given the optimal value function. An exponential transformation of the optimal value function makes the minimized Bellman equation linear. Apart from their theoretical significance, the new MDPs enable efficient approximations to traditional MDPs. Shortest path problems are approximated to arbitrary precision with largest eigenvalue problems, yielding an O(n) algorithm. Accurate approximations to generic MDPs are obtained via continuous embedding reminiscent of LP relaxation in integer programming. Off-policy learning of the optimal value function is possible without need for state-action values; the new algorithm (Z-learning) outperforms Q-learning.
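
To make the exponential transformation concrete: writing z(s) = exp(-v(s)) turns the minimized Bellman equation into the linear relation z(s) = exp(-q(s)) * E_{s' ~ p(.|s)} z(s'), which can be estimated from samples of the uncontrolled chain. The sketch below is a minimal reading of Z-learning for a first-exit problem; the variable names and episode structure are our assumptions, not the paper's code.

```python
import numpy as np

def z_learning(P, q, terminal, n_episodes=5000, alpha=0.05, seed=0):
    """Z-learning sketch for a first-exit linearly solvable MDP.

    P        : (n, n) passive dynamics, P[s, s'] = p(s' | s)
    q        : (n,) nonnegative state costs
    terminal : (n,) boolean mask of absorbing goal states

    The desirability z = exp(-v) satisfies the *linear* Bellman equation
        z(s) = exp(-q(s)) * sum_{s'} p(s'|s) z(s')   for interior s,
        z(s) = exp(-q(s))                            for terminal s,
    so it can be learned off-policy from passive-dynamics samples alone,
    with no need for state-action values.
    """
    rng = np.random.default_rng(seed)
    n = len(q)
    z = np.ones(n)
    z[terminal] = np.exp(-q[terminal])
    for _ in range(n_episodes):
        s = int(rng.integers(n))
        while not terminal[s]:
            s_next = int(rng.choice(n, p=P[s]))
            # TD-style update toward exp(-q(s)) * z(s')
            z[s] += alpha * (np.exp(-q[s]) * z[s_next] - z[s])
            s = s_next
    return -np.log(z)  # recover the value function v = -log z
```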

430 citations


Journal ArticleDOI
TL;DR: It is demonstrated that the value function for continuous POMDPs is convex in the beliefs over continuous state spaces, and piecewise-linear convex for the particular case of discrete observations and actions but still continuous states.
Abstract: We propose a novel approach to optimize Partially Observable Markov Decision Processes (POMDPs) defined on continuous spaces. To date, most algorithms for model-based POMDPs are restricted to discrete states, actions, and observations, but many real-world problems, such as robot navigation, are naturally defined on continuous spaces. In this work, we demonstrate that the value function for continuous POMDPs is convex in the beliefs over continuous state spaces, and piecewise-linear convex for the particular case of discrete observations and actions but still continuous states. We also demonstrate that continuous Bellman backups are contracting and isotonic, ensuring the monotonic convergence of value-iteration algorithms. Relying on those properties, we extend the algorithm, originally developed for discrete POMDPs, to work in continuous state spaces by representing the observation, transition, and reward models using Gaussian mixtures, and the beliefs using Gaussian mixtures or particle sets. With these representations, the integrals that appear in the Bellman backup can be computed in closed form and, therefore, the algorithm is computationally feasible. Finally, we further extend the approach to deal with continuous action and observation sets by designing effective sampling approaches.
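
For orientation, the continuous-state Bellman backup whose contraction and isotonicity are established takes the standard form (our transcription; b denotes the belief density and b_{a,o} its Bayes update after taking action a and observing o):

```latex
V_{n+1}(b) \;=\; \max_{a \in A} \Big[ \int_S r(s,a)\, b(s)\, ds
  \;+\; \gamma \int_O p(o \mid b, a)\, V_n(b_{a,o})\, do \Big]
```

When the transition, observation, and reward models and the beliefs are all Gaussian mixtures, both integrals reduce to closed-form Gaussian computations, which is what makes the backup computationally feasible.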

277 citations


Journal ArticleDOI
TL;DR: This paper reviews the literature on deterministic and stochastic stepsize rules, derives formulas for optimal stepsizes that minimize estimation error, and proposes an approximation for the case where the required parameters are unknown.
Abstract: We address the problem of determining optimal stepsizes for estimating parameters in the context of approximate dynamic programming. The sufficient conditions for convergence of the stepsize rules have been known for 50 years, but practical computational work tends to use formulas with parameters that have to be tuned for specific applications. The problem is that in most applications in dynamic programming, observations for estimating a value function typically come from a data series that can be initially highly transient. The degree of transience affects the choice of stepsize parameters that produce the fastest convergence. In addition, the degree of initial transience can vary widely among the value function parameters for the same dynamic program. This paper reviews the literature on deterministic and stochastic stepsize rules, and derives formulas for optimal stepsizes for minimizing estimation error. These formulas assume certain parameters are known, and an approximation is proposed for the case where the parameters are unknown. Experimental work shows that the approximation provides faster convergence than other popular formulas.
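
To fix terminology: a stepsize rule supplies the gain alpha_n in the recursive smoothing estimate v_n = (1 - alpha_n) v_{n-1} + alpha_n y_n. The sketch below (our own illustration, not the paper's code) shows the update together with the generalized harmonic rule, whose tunable constant a is exactly the kind of application-specific parameter the paper's optimal-stepsize formula aims to replace:

```python
def harmonic_stepsize(n, a=10.0):
    """Generalized harmonic rule a / (a + n - 1). It satisfies the classical
    sufficient conditions for convergence (steps sum to infinity, squared
    steps are summable); a = 1 recovers 1/n, i.e., the sample mean. Larger
    a keeps stepsizes large longer, which helps on initially transient data."""
    return a / (a + n - 1)

def smooth(observations, stepsize=harmonic_stepsize):
    """Recursive estimate v_n = (1 - alpha_n) v_{n-1} + alpha_n y_n."""
    v = 0.0
    for n, y in enumerate(observations, start=1):
        v += stepsize(n) * (y - v)
    return v
```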

201 citations


Journal ArticleDOI
TL;DR: This paper proposes an iterative, adaptive dynamic-programming-based methodology that makes use of linear or nonlinear approximations of the value function and shows that the proposed method provides high-quality solutions and is computationally attractive for large problems.
Abstract: In this paper, we consider a stochastic and time-dependent version of the min-cost integer multicommodity-flow problem that arises in the dynamic resource allocation context. In this problem class, tasks arriving over time have to be covered by a set of indivisible and reusable resources of different types. The assignment of a resource to a task removes the task from the system, modifies the resource, and generates a profit. When serving a task, resources of different types can serve as substitutes of each other, possibly yielding different revenues. We propose an iterative, adaptive dynamic-programming-based methodology that makes use of linear or nonlinear approximations of the value function. Our numerical work shows that the proposed method provides high-quality solutions and is computationally attractive for large problems.

201 citations


Journal ArticleDOI
TL;DR: This paper discusses alternative decomposition methods in which the second-stage integer subproblems are solved using branch-and-cut methods, laying the foundation for such decomposition methods for two-stage stochastic mixed-integer programs.
Abstract: Decomposition has proved to be one of the more effective tools for the solution of large-scale problems, especially those arising in stochastic programming. A decomposition method with wide applicability is Benders' decomposition, which has been applied to both stochastic programming and integer programming problems. However, this method of decomposition relies on convexity of the value function of linear programming subproblems. This paper is devoted to a class of problems in which the second-stage subproblem(s) may impose integer restrictions on some variables. The value function of such integer subproblem(s) is not convex, and new approaches must be designed. In this paper, we discuss alternative decomposition methods in which the second-stage integer subproblems are solved using branch-and-cut methods. One of the main advantages of our decomposition scheme is that Stochastic Mixed-Integer Programming (SMIP) problems can be solved by dividing a large problem into smaller MIP subproblems that can be solved in parallel. This paper lays the foundation for such decomposition methods for two-stage stochastic mixed-integer programs.
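
Since the whole construction rests on Benders' decomposition, a minimal sketch of the classical continuous-recourse version may help (toy data, aggregated cuts, and all names are our own; the paper's contribution is precisely the harder case where the second stage is integer and the cuts come from branch-and-cut):

```python
import numpy as np
from scipy.optimize import linprog

# Toy two-stage stochastic LP solved by Benders (L-shaped) decomposition:
#   min  c'x + E_k[ Q(x, k) ],  Q(x, k) = min { q'y : W y >= h_k - T x, y >= 0 }.
# All data below are made-up illustration values.
c = np.array([1.0])
q = np.array([2.0, 3.0])
T = np.array([[1.0], [0.5]])
W = np.eye(2)
scenarios = [(0.5, np.array([2.0, 1.0])), (0.5, np.array([3.0, 2.0]))]
x_bounds = [(0.0, 10.0)]

def recourse_dual(x, h):
    """Dual of the recourse LP: max { (h - Tx)'pi : W'pi <= q, pi >= 0 }.
    Its optimizer pi yields the Benders optimality cut theta >= pi'(h - T x)."""
    res = linprog(-(h - T @ x), A_ub=W.T, b_ub=q,
                  bounds=[(0, None)] * len(h), method="highs")
    return res.x, -res.fun

cuts_A, cuts_b = [], []  # accumulated optimality cuts in A_ub / b_ub form
for it in range(50):
    # Master problem: min c'x + theta subject to all cuts collected so far.
    res = linprog(np.r_[c, 1.0],
                  A_ub=np.array(cuts_A) if cuts_A else None,
                  b_ub=np.array(cuts_b) if cuts_b else None,
                  bounds=x_bounds + [(-1e6, None)], method="highs")
    x, theta = res.x[:-1], res.x[-1]
    # Solve every scenario subproblem and build one aggregated cut:
    #   theta >= sum_k p_k * pi_k'(h_k - T x).
    grad, rhs, Q = np.zeros(len(c)), 0.0, 0.0
    for p, h in scenarios:
        pi, val = recourse_dual(x, h)
        grad += p * (T.T @ pi)
        rhs += p * (pi @ h)
        Q += p * val
    if Q <= theta + 1e-8:  # master's lower bound meets recourse cost: optimal
        break
    cuts_A.append(np.r_[-grad, -1.0])
    cuts_b.append(-rhs)

print("optimal x:", x, " total cost:", c @ x + Q)
```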

182 citations


Proceedings ArticleDOI
25 Jun 2006
TL;DR: This work proposes to use neighborhood component analysis (Goldberger et al., 2005), a dimensionality reduction technique created for supervised learning, in order to map a high-dimensional state space to a low-dimensional space, based on the Bellman error, or on the temporal difference (TD) error.
Abstract: We address the problem of automatically constructing basis functions for linear approximation of the value function of a Markov Decision Process (MDP). Our work builds on results by Bertsekas and Castanon (1989) who proposed a method for automatically aggregating states to speed up value iteration. We propose to use neighborhood component analysis (Goldberger et al., 2005), a dimensionality reduction technique created for supervised learning, in order to map a high-dimensional state space to a low-dimensional space, based on the Bellman error, or on the temporal difference (TD) error. We then place basis functions in the lower-dimensional space. These are added as new features for the linear function approximator. This approach is applied to a high-dimensional inventory control problem.
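
A schematic of the pipeline just described might look as follows (illustration only: we substitute an error-weighted PCA for neighborhood component analysis to keep the sketch self-contained, and every name is hypothetical):

```python
import numpy as np

def build_bellman_error_features(states, bellman_errors, n_dims=2,
                                 n_centers=10, width=1.0, seed=0):
    """Map states to a low-dimensional space emphasizing directions relevant
    to the Bellman/TD error, then place Gaussian RBF basis functions there.
    An error-weighted PCA stands in for NCA here; the paper itself uses
    neighborhood component analysis (Goldberger et al., 2005)."""
    X = states - states.mean(axis=0)
    # weight each state by the magnitude of its Bellman/TD error
    Xw = X * np.abs(bellman_errors)[:, None]
    _, _, Vt = np.linalg.svd(Xw, full_matrices=False)
    proj = Vt[:n_dims]                       # low-dimensional projection
    Z = X @ proj.T
    idx = np.random.default_rng(seed).choice(len(Z), n_centers, replace=False)
    centers = Z[idx]
    # new features for the linear value-function approximator
    phi = np.exp(-np.square(Z[:, None, :] - centers[None]).sum(-1)
                 / (2 * width**2))
    return phi, proj, centers
```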

177 citations


Journal ArticleDOI
TL;DR: In this article, it is shown that the solution of the level-set formulation of motion by mean curvature, a degenerate parabolic equation, can be interpreted as the value function of a deterministic two-person game.
Abstract: The level-set formulation of motion by mean curvature is a degenerate parabolic equation. We show that its solution can be interpreted as the value function of a deterministic two-person game. More precisely, we give a family of discrete-time, two-person games whose value functions converge in the continuous-time limit to the solution of the motion-by-curvature PDE. For a convex domain, the boundary's “first arrival time” solves a degenerate elliptic equation; this corresponds, in our game-theoretic setting, to a minimum-exit-time problem. For a nonconvex domain the two-person game still makes sense; we draw a connection between its minimum exit time and the evolution of curves with velocity equal to the “positive part of the curvature.” These results are unexpected, because the value function of a deterministic control problem is normally the solution of a first-order Hamilton-Jacobi equation. Our situation is different because the usual first-order calculation is singular. © 2005 Wiley Periodicals, Inc.

159 citations


Journal ArticleDOI
TL;DR: In this article, the authors studied the homogenization of some Hamilton-Jacobi-Bellman equations with a vanishing second-order term in a stationary ergodic random medium under the hyperbolic scaling of time and space.
Abstract: We study the homogenization of some Hamilton-Jacobi-Bellman equations with a vanishing second-order term in a stationary ergodic random medium under the hyperbolic scaling of time and space. Imposing certain convexity, growth, and regularity assumptions on the Hamiltonian, we show the locally uniform convergence of solutions of such equations to the solution of a deterministic “effective” first-order Hamilton-Jacobi equation. The effective Hamiltonian is obtained from the original stochastic Hamiltonian by a minimax formula. Our homogenization results have a large-deviations interpretation for a diffusion in a random environment. © 2006 Wiley Periodicals, Inc.
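
Schematically (our transcription of the standard setup, not the paper's exact hypotheses), the result concerns viscosity solutions of

```latex
u^\varepsilon_t + H\!\Big(\frac{x}{\varepsilon}, \omega, Du^\varepsilon\Big)
  \;=\; \varepsilon\, \operatorname{tr}\!\Big( a\Big(\frac{x}{\varepsilon}, \omega\Big) D^2 u^\varepsilon \Big),
\qquad
u^\varepsilon \;\longrightarrow\; \bar{u} \quad \text{locally uniformly},
```

where the limit solves the deterministic first-order equation $\bar{u}_t + \bar{H}(D\bar{u}) = 0$ and the effective Hamiltonian $\bar{H}$ is given by a minimax formula over the stationary ergodic medium.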

158 citations


Journal ArticleDOI
TL;DR: This note presents the optimal linear-quadratic (LQ) regulator for a linear system with multiple time delays in the control input and establishes a duality between the solutions of the optimal filtering problem for linear systems with multiple time delays in the observations and the optimal LQ control problem for linear systems with multiple time delays in the control input.
Abstract: This note presents the optimal linear-quadratic (LQ) regulator for a linear system with multiple time delays in the control input. Optimality of the solution is proved in two steps. First, a necessary optimality condition is derived from the maximum principle. Then, the sufficiency of this condition is established by verifying that it satisfies the Hamilton-Jacobi-Bellman equation. Using an illustrative example, the performance of the obtained optimal regulator is compared against the performance of the optimal LQ regulator for linear systems without delays and some other feasible feedback regulators that are linear in the state variables. Finally, the note establishes a duality between the solutions of the optimal filtering problem for linear systems with multiple time delays in the observations and the optimal LQ control problem for linear systems with multiple time delays in the control input.
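
For reference, in the delay-free benchmark against which the regulator is compared, verifying sufficiency through the Hamilton-Jacobi-Bellman equation reduces to the familiar Riccati computation (textbook material, not specific to this note):

```latex
-V_t \;=\; \min_{u} \big[\, x^\top Q x + u^\top R u + V_x^\top (A x + B u) \,\big],
\qquad V(x,t) = x^\top P(t)\, x,
```

which yields the Riccati equation $-\dot{P} = A^\top P + P A - P B R^{-1} B^\top P + Q$ and the optimal feedback $u^* = -R^{-1} B^\top P x$; the note carries the analogous verification through in the presence of input delays.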

144 citations


Journal ArticleDOI
TL;DR: For multiparametric convex nonlinear programming problems, a recursive algorithm for approximating, within a given suboptimality tolerance, the value function and an optimizer as functions of the parameters is proposed.
Abstract: For multiparametric convex nonlinear programming problems we propose a recursive algorithm for approximating, within a given suboptimality tolerance, the value function and an optimizer as functions of the parameters. The approximate solution is expressed as a piecewise affine function over a simplicial partition of a subset of the feasible parameters, and it is organized over a tree structure for efficiency of evaluation. Adaptations of the algorithm to deal with multiparametric semidefinite programming and multiparametric geometric programming are provided and exemplified. The approach is relevant for real-time implementation of several optimization-based feedback control strategies.

131 citations


Journal ArticleDOI
TL;DR: In this article, an optimal investment problem under incomplete information and power utility is studied; the optimal portfolio policy is identified, the value function is compared with that of the fully observable case, and the loss of utility due to incomplete information is quantified.

Journal ArticleDOI
TL;DR: The value function of a finite horizon stochastic control problem with unbounded controls is characterized as the unique viscosity solution of the corresponding dynamic programming equation.
Abstract: In this paper, we prove a comparison result between semicontinuous viscosity sub- and supersolutions, growing at most quadratically, of second-order degenerate parabolic Hamilton-Jacobi-Bellman and Isaacs equations. As an application, we characterize the value function of a finite horizon stochastic control problem with unbounded controls as the unique viscosity solution of the corresponding dynamic programming equation.

Journal ArticleDOI
TL;DR: In this paper, the optimal investment problem of a CRRA investor who faces proportional transaction costs and a finite time horizon is studied using a partial differential equation approach; the problem is shown to be equivalent to a parabolic double obstacle problem involving two free boundaries that correspond to the optimal buying and selling policies.
Abstract: This paper concerns the optimal investment problem of a CRRA investor who faces proportional transaction costs and a finite time horizon. Using a partial differential equation approach, we reveal that the problem is equivalent to a parabolic double obstacle problem involving two free boundaries that correspond to the optimal buying and selling policies. This enables us to make use of the well-developed theory of variational inequalities to study the problem. The $C^{2,1}$ regularity of the value function is proven and the optimal investment policies are completely characterized. Relying on the double obstacle problem, we extend the binomial method widely used in option pricing to determine the optimal investment policies. Numerical examples are presented as well.

Journal ArticleDOI
TL;DR: A decomposition-based branch-and-bound (DBAB) algorithm for solving two-stage stochastic programs having mixed-integer first- and second-stage variables that converges to a global optimal solution.
Abstract: In this paper, we propose a decomposition-based branch-and-bound (DBAB) algorithm for solving two-stage stochastic programs having mixed-integer first- and second-stage variables. A modified Benders' decomposition method is developed, where the Benders' subproblems define lower bounding second-stage value functions of the first-stage variables that are derived by constructing a certain partial convex hull representation of the two-stage solution space. This partial convex hull is sequentially generated using a convexification scheme such as the Reformulation-Linearization Technique (RLT) or lift-and-project process, which yields valid inequalities that are reusable in the subsequent subproblems by updating the values of the first-stage variables. A branch-and-bound algorithm is designed based on a hyperrectangular partitioning process, using the established property that any resulting lower bounding Benders' master problem defined over a hyperrectangle yields the same objective value as the original stochastic program over that region if the first-stage variable solution is an extreme point of the defining hyperrectangle or the second-stage solution satisfies the binary restrictions. We prove that this algorithm converges to a global optimal solution. Some numerical examples and computational results are presented to demonstrate the efficacy of this approach.

Journal ArticleDOI
TL;DR: In this article, a perturbation approach for performing sensitivity analysis of mathematical programming problems is presented, where the active constraints are not assumed to remain active if the problem data are perturbed, nor the partial derivatives are assumed to exist.
Abstract: This paper presents a perturbation approach for performing sensitivity analysis of mathematical programming problems. Contrary to standard methods, the active constraints are not assumed to remain active if the problem data are perturbed, nor the partial derivatives are assumed to exist. In other words, all the elements, variables, parameters, Karush–Kuhn–Tucker multipliers, and objective function values may vary provided that optimality is maintained and the general structure of a feasible perturbation (which is a polyhedral cone) is obtained. This allows determining: (a) the local sensitivities, (b) whether or not partial derivatives exist, and (c) if the directional derivative for a given direction exists. A method for the simultaneous obtention of the sensitivities of the objective function optimal value and the primal and dual variable values with respect to data is given. Three examples illustrate the concepts presented and the proposed methodology. Finally, some relevant conclusions are drawn.

Journal ArticleDOI
TL;DR: In this paper, a multiclass queueing system is considered, with heterogeneous service stations, each consisting of many servers with identical capabilities, and an optimal control problem is formulated, where the control corresponds to scheduling and routing, and the cost is a cumulative discounted functional of the system state.
Abstract: A multiclass queueing system is considered, with heterogeneous service stations, each consisting of many servers with identical capabilities. An optimal control problem is formulated, where the control corresponds to scheduling and routing, and the cost is a cumulative discounted functional of the system's state. We examine two versions of the problem: ``nonpreemptive,'' where service is uninterruptible, and ``preemptive,'' where service to a customer can be interrupted and then resumed, possibly at a different station. We study the problem in the asymptotic heavy traffic regime proposed by Halfin and Whitt, in which the arrival rates and the number of servers at each station grow without bound. The two versions of the problem are not, in general, asymptotically equivalent in this regime, with the preemptive version showing an asymptotic behavior that is, in a sense, much simpler. Under appropriate assumptions on the structure of the system we show: (i) The value function for the preemptive problem converges to $V$, the value of a related diffusion control problem. (ii) The two versions of the problem are asymptotically equivalent, and in particular nonpreemptive policies can be constructed that asymptotically achieve the value $V$. The construction of these policies is based on a Hamilton--Jacobi--Bellman equation associated with $V$.

Journal ArticleDOI
TL;DR: The objective is to design an alarm time which is adapted to the history of the arrival process and detects the disorder time as soon as possible; unlike previous solvable versions of the problem, the new arrival rate after the disorder is assumed to be a random variable.
Abstract: We study the quickest detection problem of a sudden change in the arrival rate of a Poisson process from a known value to an unknown and unobservable value at an unknown and unobservable disorder time. Our objective is to design an alarm time which is adapted to the history of the arrival process and detects the disorder time as soon as possible. In previous solvable versions of the Poisson disorder problem, the arrival rate after the disorder has been assumed a known constant. In reality, however, we may at most have some prior information about the likely values of the new arrival rate before the disorder actually happens, and insufficient estimates of the new rate after the disorder happens. Consequently, we assume in this paper that the new arrival rate after the disorder is a random variable. The detection problem is shown to admit a finite-dimensional Markovian sufficient statistic, if the new rate has a discrete distribution with finitely many atoms. Furthermore, the detection problem is cast as a discounted optimal stopping problem with running cost for a finite-dimensional piecewise-deterministic Markov process. This optimal stopping problem is studied in detail in the special case where the new arrival rate has Bernoulli distribution. This is a nontrivial optimal stopping problem for a two-dimensional piecewise-deterministic Markov process driven by the same point process. Using a suitable single-jump operator, we solve it fully, describe the analytic properties of the value function and the stopping region, and present methods for their numerical calculation. We provide a concrete example where the value function does not satisfy the smooth-fit principle on a proper subset of the connected, continuously differentiable optimal stopping boundary, whereas it does on the complement of this set.

Journal ArticleDOI
TL;DR: This paper first solves the Isaacs equation associated with the game to get an approximate value function and then uses it to reconstruct approximate optimal feedback controls and optimal trajectories; the schemes are tested on some classical pursuit-evasion games.
Abstract: In this paper we present some numerical methods for the solution of two-person zero-sum deterministic differential games. The methods are based on the dynamic programming approach. We first solve the Isaacs equation associated with the game to get an approximate value function, and then we use it to reconstruct approximate optimal feedback controls and optimal trajectories. The approximation schemes also have an interesting control interpretation, since the time-discrete scheme stems from a dynamic programming principle for the associated discrete-time dynamical system. The general framework for convergence results to the value function is the theory of viscosity solutions. Numerical experiments are presented, solving some classical pursuit-evasion games.
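
The dynamic programming principle underlying these schemes can be illustrated in a fully discrete toy setting (our own illustration, far simpler than the paper's schemes): value iteration on the discrete Isaacs equation V(x) = min_a max_b [ cost(x) + gamma * V(f(x, a, b)) ].

```python
import numpy as np

def game_value_iteration(n_states, step, cost, actions=(-1, 0, 1),
                         gamma=0.95, tol=1e-9):
    """Fixed-point iteration for the discrete Isaacs equation
        V(x) = min_a max_b [ cost(x) + gamma * V(step(x, a, b)) ],
    i.e., the lower value of a finite zero-sum dynamic game."""
    V = np.zeros(n_states)
    while True:
        V_new = np.array([
            min(max(cost(x) + gamma * V[step(x, a, b)] for b in actions)
                for a in actions)
            for x in range(n_states)
        ])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Toy pursuit-evasion on a line: x is the pursuer-evader gap; the pursuer
# (minimizer) and the evader (maximizer) each move one step per stage.
n = 21
step = lambda x, a, b: min(max(x + a + b, 0), n - 1)
cost = lambda x: 0.0 if x == 0 else 1.0   # pay until capture (gap 0)
print(game_value_iteration(n, step, cost))
```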

Journal ArticleDOI
TL;DR: This paper studies the joint machine maintenance and product quality control problem of a finite-horizon, discrete-time, Markovian deteriorating, state-unobservable batch production system; it formulates the system as a partially observable Markov decision process and derives some properties of the optimal value function.

Posted Content
TL;DR: In this article, the authors provide characterizations for singular trajectories of control-affine systems and show that, under generic assumptions, such trajectories share nice properties related to computational aspects; more precisely, they show that, for a generic system, all nontrivial singular trajectories are of minimal order and of corank one.
Abstract: When applying methods of optimal control to motion planning or stabilization problems, some theoretical or numerical difficulties may arise, due to the presence of specific trajectories, namely, singular minimizing trajectories of the underlying optimal control problem. In this article, we provide characterizations for singular trajectories of control-affine systems. We prove that, under generic assumptions, such trajectories share nice properties related to computational aspects; more precisely, we show that, for a generic system with respect to the Whitney topology, all nontrivial singular trajectories are of minimal order and of corank one. These results, established both for driftless and for control-affine systems, extend previous results. As a consequence, for generic systems having more than two vector fields, and for a fixed cost, there do not exist minimizing singular trajectories. We also prove that, given a control system satisfying the LARC, singular trajectories are strictly abnormal, generically with respect to the cost. We then show how these results can be used to derive regularity results for the value function and in the theory of Hamilton-Jacobi equations, which in turn have applications to stabilization and motion planning, from both the theoretical and the implementation points of view.

Journal ArticleDOI
TL;DR: In this article, a penalty term is added to the objective function in each minimization to discourage the optimizer from finding a solution in regions of the state space where the local data density is too low.

Journal ArticleDOI
TL;DR: In this article, Bellman equations of ergodic type related to risk-sensitive control are considered; it is proved that the equation in general has multiple solutions, and the solutions are classified by the global behavior of the diffusion process associated with each solution.
Abstract: Bellman equations of ergodic type related to risk-sensitive control are considered. We treat the case where the nonlinear term is a positive quadratic form in the first-order partial derivatives of the solution, which includes the linear exponential quadratic Gaussian control problem. In this paper we prove that the equation in general has multiple solutions. We specify the set of all classical solutions and classify them by the global behavior of the diffusion process associated with each solution. The solution associated with an ergodic diffusion process plays a particular role; we also prove the uniqueness of such a solution. Furthermore, the solution which gives us ergodicity is stable under perturbation of the coefficients. Finally, we give a representation result for the solution corresponding to the ergodic diffusion.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a hybrid factored Markov decision process (MDP) model that allows a compact representation of large decision problems with continuous and discrete variables, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solution.
Abstract: Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model that allows for a compact representation of these problems, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solutions. The central idea of HALP is to approximate the optimal value function by a linear combination of basis functions and optimize its weights by linear programming. We analyze both theoretical and computational aspects of this approach, and demonstrate its scale-up potential on several hybrid optimization problems.
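
In the purely discrete case (the hybrid version replaces the sums by integrals over the continuous components), the linear program at the heart of HALP is the standard approximate linear programming formulation (our transcription):

```latex
\min_{w}\; \sum_{s} \alpha(s) \sum_i w_i\, \phi_i(s)
\quad \text{s.t.} \quad
\sum_i w_i\, \phi_i(s) \;\ge\; R(s,a) + \gamma \sum_{s'} P(s' \mid s, a) \sum_i w_i\, \phi_i(s')
\quad \forall (s,a),
```

where the $\phi_i$ are the basis functions, $w$ their weights, and $\alpha$ a state-relevance weighting; the challenge addressed by HALP is evaluating these expressions efficiently when the state has continuous components.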

Journal ArticleDOI
TL;DR: Using the Hamilton-Jacobi-Bellman equation, this article derived both a Keynes-Ramsey rule and a closed form solution for an optimal consumption-investment problem with labor income.
Abstract: Using the Hamilton-Jacobi-Bellman equation, we derive both a Keynes-Ramsey rule and a closed-form solution for an optimal consumption-investment problem with labor income. The utility function is unbounded and uncertainty stems from a Poisson process. Our results build on the proofs presented in the accompanying paper by Sennewald (2006). Additional examples are given which highlight the correct use of the Hamilton-Jacobi-Bellman equation and the change-of-variables formula (sometimes referred to as ``Ito's Lemma'') under Poisson uncertainty.
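
To fix notation, in the deterministic benchmark case (no Poisson jumps) the derivation runs through the stationary HJB equation for wealth w, consumption c, interest rate r, labor income y, and discount rate rho (our transcription of textbook material; the paper's version adds a jump term for the Poisson uncertainty):

```latex
\rho V(w) = \max_{c}\big[\, u(c) + V'(w)\,(r w + y - c) \,\big],
\qquad u'(c^*) = V'(w),
```

and for CRRA utility $u(c) = c^{1-\theta}/(1-\theta)$ the first-order and envelope conditions combine into the Keynes-Ramsey rule $\dot{c}/c = (r - \rho)/\theta$.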

Journal ArticleDOI
TL;DR: A new lower bound is presented, namely the LP relaxation of an integer programming formulation based on Dantzig-Wolfe decomposition, and a column generation algorithm is proposed to solve the formulation.

Journal ArticleDOI
TL;DR: In this article, a dynamic programming approach is used to design control laws for systems subject to complex state constraints, where the problem of reachability under state constraints is formulated in terms of nonstandard minmax and maxmin cost functionals, and the corresponding value functions are given by Hamilton-Jacobi-Bellman (HJB) equations or variational inequalities.
Abstract: The design of control laws for systems subject to complex state constraints still presents a significant challenge. This paper explores a dynamic programming approach to a specific class of such problems, that of reachability under state constraints. The problems are formulated in terms of nonstandard minmax and maxmin cost functionals, and the corresponding value functions are given in terms of Hamilton-Jacobi-Bellman (HJB) equations or variational inequalities. The solution of these relations is complicated in general; however, for linear systems, the value functions may be described also in terms of duality relations of convex analysis and minmax theory. Consequently, solution techniques specific to systems with a linear structure may be designed independently of HJB theory. These techniques are illustrated through two examples.

01 Jan 2006
TL;DR: In this paper, the authors considered the optimal control of a multidimensional cash management system where the cash balances fluctuate as a homogeneous diffusion process in $\mathbb{R}^n$, and formulated the model as an impulse control problem on an unbounded domain with unbounded cost functions.
Abstract: We consider the optimal control of a multidimensional cash management system where the cash balances fluctuate as a homogeneous diffusion process in $\mathbb{R}^n$. We formulate the model as an impulse control problem on an unbounded domain with unbounded cost functions. Under general assumptions we characterize the value function as a weak solution of a quasi-variational inequality in a weighted Sobolev space and we show the existence of an optimal policy. Moreover, we prove the local uniform convergence of a finite element scheme for numerically computing the value function and the optimal cost. We compute the solution of the model in two dimensions with linear and distance cost functions, showing the shapes of the optimal policies in these two simple cases. Finally, our third numerical experiment computes the solution in the realistic case of the cash concentration of two bank accounts made by a centralized treasury.
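
The quasi-variational inequality in question has the familiar impulse-control structure (our transcription of the standard formulation, up to sign conventions; f is the running cost, r the discount rate, and the intervention operator M encodes a fixed cost K plus a transfer cost c):

```latex
\min\big( r V - \mathcal{L} V - f,\;\; \mathcal{M} V - V \big) = 0,
\qquad
\mathcal{M} V(x) = \inf_{\xi \neq 0} \big[ V(x + \xi) + K + c(\xi) \big],
```

where $\mathcal{L}$ is the generator of the uncontrolled diffusion; the no-transaction region is where $rV - \mathcal{L}V = f$ and $V < \mathcal{M}V$, and an impulse (cash transfer) is triggered where $V = \mathcal{M}V$.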

Journal ArticleDOI
TL;DR: In this article, the authors considered the scheduling control problem for a family of unitary networks under heavy traffic, with general interarrival and service times, probabilistic routing and infinite horizon discounted linear holding cost.
Abstract: We consider the scheduling control problem for a family of unitary networks under heavy traffic, with general interarrival and service times, probabilistic routing and infinite horizon discounted linear holding cost. A natural nonanticipativity condition for admissibility of control policies is introduced. The condition is seen to hold for a broad class of problems. Using this formulation of admissible controls and a time-transformation technique, we establish that the infimum of the cost for the network control problem over all admissible sequencing control policies is asymptotically bounded below by the value function of an associated diffusion control problem (the Brownian control problem). This result provides a useful bound on the best achievable performance for any admissible control policy for a wide class of networks.

Journal ArticleDOI
TL;DR: In this article, a martingale approach for continuous-time stochastic control with discretionary stopping is presented, and necessary and sufficient conditions for the optimality of a control strategy are provided.
Abstract: We develop a martingale approach for continuous-time stochastic control with discretionary stopping. The relevant Dynamic Programming Equation and Maximum Principle are presented. Necessary and sufficient conditions are provided for the optimality of a control strategy; these are analogues of the "equalization" and "thriftiness" conditions introduced by Dubins and Savage (1976) in a related, discrete-time context. The existence of a thrifty control strategy is established.