Showing papers on "Bellman equation published in 2006"


Proceedings Article
04 Dec 2006
TL;DR: A class of MDPs with discrete state spaces and continuous control spaces is introduced that greatly simplifies Reinforcement Learning and enables efficient approximations to traditional MDPs.
Abstract: We introduce a class of MDPs which greatly simplify Reinforcement Learning. They have discrete state spaces and continuous control spaces. The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. A control cost penalizing KL divergence between controlled and uncontrolled transition probabilities makes the minimization problem convex, and allows analytical computation of the optimal controls given the optimal value function. An exponential transformation of the optimal value function makes the minimized Bellman equation linear. Apart from their theoretical significance, the new MDPs enable efficient approximations to traditional MDPs. Shortest path problems are approximated to arbitrary precision with largest eigenvalue problems, yielding an O(n) algorithm. Accurate approximations to generic MDPs are obtained via continuous embedding reminiscent of LP relaxation in integer programming. Off-policy learning of the optimal value function is possible without need for state-action values; the new algorithm (Z-learning) outperforms Q-learning.
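
To make the exponential transformation concrete: writing z(s) = exp(-v(s)) turns the minimized Bellman equation into the linear relation z(s) = exp(-q(s)) * E_{s' ~ p(.|s)} z(s'), which can be estimated from samples of the uncontrolled chain. The sketch below is a minimal reading of Z-learning for a first-exit problem; the variable names and episode structure are our assumptions, not the paper's code.

```python
import numpy as np

def z_learning(P, q, terminal, n_episodes=5000, alpha=0.05, seed=0):
    """Z-learning sketch for a first-exit linearly solvable MDP.

    P        : (n, n) passive dynamics, P[s, s'] = p(s' | s)
    q        : (n,) nonnegative state costs
    terminal : (n,) boolean mask of absorbing goal states

    The desirability z = exp(-v) satisfies the *linear* Bellman equation
        z(s) = exp(-q(s)) * sum_{s'} p(s'|s) z(s')   for interior s,
        z(s) = exp(-q(s))                            for terminal s,
    so it can be learned off-policy from passive-dynamics samples alone,
    with no need for state-action values.
    """
    rng = np.random.default_rng(seed)
    n = len(q)
    z = np.ones(n)
    z[terminal] = np.exp(-q[terminal])
    for _ in range(n_episodes):
        s = int(rng.integers(n))
        while not terminal[s]:
            s_next = int(rng.choice(n, p=P[s]))
            # TD-style update toward exp(-q(s)) * z(s')
            z[s] += alpha * (np.exp(-q[s]) * z[s_next] - z[s])
            s = s_next
    return -np.log(z)  # recover the value function v = -log z
```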

430 citations


Journal ArticleDOI
TL;DR: It is demonstrated that the value function for continuous POMDPs is convex in the beliefs over continuous state spaces, and piecewise-linear convex for the particular case of discrete observations and actions but still continuous states.
Abstract: We propose a novel approach to optimize Partially Observable Markov Decision Processes (POMDPs) defined on continuous spaces. To date, most algorithms for model-based POMDPs are restricted to discrete states, actions, and observations, but many real-world problems, such as robot navigation, are naturally defined on continuous spaces. In this work, we demonstrate that the value function for continuous POMDPs is convex in the beliefs over continuous state spaces, and piecewise-linear convex for the particular case of discrete observations and actions but still continuous states. We also demonstrate that continuous Bellman backups are contracting and isotonic, ensuring the monotonic convergence of value-iteration algorithms. Relying on those properties, we extend the algorithm, originally developed for discrete POMDPs, to work in continuous state spaces by representing the observation, transition, and reward models using Gaussian mixtures, and the beliefs using Gaussian mixtures or particle sets. With these representations, the integrals that appear in the Bellman backup can be computed in closed form and, therefore, the algorithm is computationally feasible. Finally, we further extend the approach to deal with continuous action and observation sets by designing effective sampling approaches.
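
For orientation, the continuous-state Bellman backup whose contraction and isotonicity are established takes the standard form (our transcription; b denotes the belief density and b_{a,o} its Bayes update after taking action a and observing o):

```latex
V_{n+1}(b) \;=\; \max_{a \in A} \Big[ \int_S r(s,a)\, b(s)\, ds
  \;+\; \gamma \int_O p(o \mid b, a)\, V_n(b_{a,o})\, do \Big]
```

When the transition, observation, and reward models and the beliefs are all Gaussian mixtures, both integrals reduce to closed-form Gaussian computations, which is what makes the backup computationally feasible.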

277 citations


Journal ArticleDOI
TL;DR: This paper reviews the literature on deterministic and stochastic stepsize rules, derives formulas for optimal stepsizes that minimize estimation error, and proposes an approximation for the case where the required parameters are unknown.
Abstract: We address the problem of determining optimal stepsizes for estimating parameters in the context of approximate dynamic programming. The sufficient conditions for convergence of the stepsize rules have been known for 50 years, but practical computational work tends to use formulas with parameters that have to be tuned for specific applications. The problem is that in most applications in dynamic programming, observations for estimating a value function typically come from a data series that can be initially highly transient. The degree of transience affects the choice of stepsize parameters that produce the fastest convergence. In addition, the degree of initial transience can vary widely among the value function parameters for the same dynamic program. This paper reviews the literature on deterministic and stochastic stepsize rules, and derives formulas for optimal stepsizes for minimizing estimation error. These formulas assume certain parameters are known, and an approximation is proposed for the case where the parameters are unknown. Experimental work shows that the approximation provides faster convergence than other popular formulas.
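
To fix terminology: a stepsize rule supplies the gain alpha_n in the recursive smoothing estimate v_n = (1 - alpha_n) v_{n-1} + alpha_n y_n. The sketch below (our own illustration, not the paper's code) shows the update together with the generalized harmonic rule, whose tunable constant a is exactly the kind of application-specific parameter the paper's optimal-stepsize formula aims to replace:

```python
def harmonic_stepsize(n, a=10.0):
    """Generalized harmonic rule a / (a + n - 1). It satisfies the classical
    sufficient conditions for convergence (steps sum to infinity, squared
    steps are summable); a = 1 recovers 1/n, i.e., the sample mean. Larger
    a keeps stepsizes large longer, which helps on initially transient data."""
    return a / (a + n - 1)

def smooth(observations, stepsize=harmonic_stepsize):
    """Recursive estimate v_n = (1 - alpha_n) v_{n-1} + alpha_n y_n."""
    v = 0.0
    for n, y in enumerate(observations, start=1):
        v += stepsize(n) * (y - v)
    return v
```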

201 citations


Journal ArticleDOI
TL;DR: This paper proposes an iterative, adaptive dynamic-programming-based methodology that makes use of linear or nonlinear approximations of the value function and shows that the proposed method provides high-quality solutions and is computationally attractive for large problems.
Abstract: In this paper, we consider a stochastic and time-dependent version of the min-cost integer multicommodity-flow problem that arises in the dynamic resource allocation context. In this problem class, tasks arriving over time have to be covered by a set of indivisible and reusable resources of different types. The assignment of a resource to a task removes the task from the system, modifies the resource, and generates a profit. When serving a task, resources of different types can serve as substitutes of each other, possibly yielding different revenues. We propose an iterative, adaptive dynamic-programming-based methodology that makes use of linear or nonlinear approximations of the value function. Our numerical work shows that the proposed method provides high-quality solutions and is computationally attractive for large problems.

201 citations


Journal ArticleDOI
TL;DR: This paper discusses alternative decomposition methods in which the second-stage integer subproblems are solved using branch-and-cut methods, laying the foundation for such decomposition methods for two-stage stochastic mixed-integer programs.
Abstract: Decomposition has proved to be one of the more effective tools for the solution of large-scale problems, especially those arising in stochastic programming. A decomposition method with wide applicability is Benders' decomposition, which has been applied to both stochastic programming and integer programming problems. However, this method of decomposition relies on convexity of the value function of linear programming subproblems. This paper is devoted to a class of problems in which the second-stage subproblem(s) may impose integer restrictions on some variables. The value function of such integer subproblem(s) is not convex, and new approaches must be designed. In this paper, we discuss alternative decomposition methods in which the second-stage integer subproblems are solved using branch-and-cut methods. One of the main advantages of our decomposition scheme is that Stochastic Mixed-Integer Programming (SMIP) problems can be solved by dividing a large problem into smaller MIP subproblems that can be solved in parallel. This paper lays the foundation for such decomposition methods for two-stage stochastic mixed-integer programs.
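
Since the whole construction rests on Benders' decomposition, a minimal sketch of the classical continuous-recourse version may help (toy data, aggregated cuts, and all names are our own; the paper's contribution is precisely the harder case where the second stage is integer and the cuts come from branch-and-cut):

```python
import numpy as np
from scipy.optimize import linprog

# Toy two-stage stochastic LP solved by Benders (L-shaped) decomposition:
#   min  c'x + E_k[ Q(x, k) ],  Q(x, k) = min { q'y : W y >= h_k - T x, y >= 0 }.
# All data below are made-up illustration values.
c = np.array([1.0])
q = np.array([2.0, 3.0])
T = np.array([[1.0], [0.5]])
W = np.eye(2)
scenarios = [(0.5, np.array([2.0, 1.0])), (0.5, np.array([3.0, 2.0]))]
x_bounds = [(0.0, 10.0)]

def recourse_dual(x, h):
    """Dual of the recourse LP: max { (h - Tx)'pi : W'pi <= q, pi >= 0 }.
    Its optimizer pi yields the Benders optimality cut theta >= pi'(h - T x)."""
    res = linprog(-(h - T @ x), A_ub=W.T, b_ub=q,
                  bounds=[(0, None)] * len(h), method="highs")
    return res.x, -res.fun

cuts_A, cuts_b = [], []  # accumulated optimality cuts in A_ub / b_ub form
for it in range(50):
    # Master problem: min c'x + theta subject to all cuts collected so far.
    res = linprog(np.r_[c, 1.0],
                  A_ub=np.array(cuts_A) if cuts_A else None,
                  b_ub=np.array(cuts_b) if cuts_b else None,
                  bounds=x_bounds + [(-1e6, None)], method="highs")
    x, theta = res.x[:-1], res.x[-1]
    # Solve every scenario subproblem and build one aggregated cut:
    #   theta >= sum_k p_k * pi_k'(h_k - T x).
    grad, rhs, Q = np.zeros(len(c)), 0.0, 0.0
    for p, h in scenarios:
        pi, val = recourse_dual(x, h)
        grad += p * (T.T @ pi)
        rhs += p * (pi @ h)
        Q += p * val
    if Q <= theta + 1e-8:  # master's lower bound meets recourse cost: optimal
        break
    cuts_A.append(np.r_[-grad, -1.0])
    cuts_b.append(-rhs)

print("optimal x:", x, " total cost:", c @ x + Q)
```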

182 citations


Proceedings ArticleDOI
25 Jun 2006
TL;DR: This work proposes to use neighborhood component analysis (Goldberger et al., 2005), a dimensionality reduction technique created for supervised learning, in order to map a high-dimensional state space to a low-dimensional space, based on the Bellman error, or on the temporal difference (TD) error.
Abstract: We address the problem of automatically constructing basis functions for linear approximation of the value function of a Markov Decision Process (MDP). Our work builds on results by Bertsekas and Castanon (1989) who proposed a method for automatically aggregating states to speed up value iteration. We propose to use neighborhood component analysis (Goldberger et al., 2005), a dimensionality reduction technique created for supervised learning, in order to map a high-dimensional state space to a low-dimensional space, based on the Bellman error, or on the temporal difference (TD) error. We then place basis functions in the lower-dimensional space. These are added as new features for the linear function approximator. This approach is applied to a high-dimensional inventory control problem.
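
A schematic of the pipeline just described might look as follows (illustration only: we substitute an error-weighted PCA for neighborhood component analysis to keep the sketch self-contained, and every name is hypothetical):

```python
import numpy as np

def build_bellman_error_features(states, bellman_errors, n_dims=2,
                                 n_centers=10, width=1.0, seed=0):
    """Map states to a low-dimensional space emphasizing directions relevant
    to the Bellman/TD error, then place Gaussian RBF basis functions there.
    An error-weighted PCA stands in for NCA here; the paper itself uses
    neighborhood component analysis (Goldberger et al., 2005)."""
    X = states - states.mean(axis=0)
    # weight each state by the magnitude of its Bellman/TD error
    Xw = X * np.abs(bellman_errors)[:, None]
    _, _, Vt = np.linalg.svd(Xw, full_matrices=False)
    proj = Vt[:n_dims]                       # low-dimensional projection
    Z = X @ proj.T
    idx = np.random.default_rng(seed).choice(len(Z), n_centers, replace=False)
    centers = Z[idx]
    # new features for the linear value-function approximator
    phi = np.exp(-np.square(Z[:, None, :] - centers[None]).sum(-1)
                 / (2 * width**2))
    return phi, proj, centers
```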

177 citations


Journal ArticleDOI
TL;DR: In this article, it is shown that the solution of the level-set formulation of motion by mean curvature, a degenerate parabolic equation, can be interpreted as the value function of a deterministic two-person game.
Abstract: The level-set formulation of motion by mean curvature is a degenerate parabolic equation. We show that its solution can be interpreted as the value function of a deterministic two-person game. More precisely, we give a family of discrete-time, two-person games whose value functions converge in the continuous-time limit to the solution of the motion-by-curvature PDE. For a convex domain, the boundary's “first arrival time” solves a degenerate elliptic equation; this corresponds, in our game-theoretic setting, to a minimum-exit-time problem. For a nonconvex domain the two-person game still makes sense; we draw a connection between its minimum exit time and the evolution of curves with velocity equal to the “positive part of the curvature.” These results are unexpected, because the value function of a deterministic control problem is normally the solution of a first-order Hamilton-Jacobi equation. Our situation is different because the usual first-order calculation is singular. © 2005 Wiley Periodicals, Inc.

159 citations


Journal ArticleDOI
TL;DR: In this article, the authors studied the homogenization of some Hamilton-Jacobi-Bellman equations with a vanishing second-order term in a stationary ergodic random medium under the hyperbolic scaling of time and space.
Abstract: We study the homogenization of some Hamilton-Jacobi-Bellman equations with a vanishing second-order term in a stationary ergodic random medium under the hyperbolic scaling of time and space. Imposing certain convexity, growth, and regularity assumptions on the Hamiltonian, we show the locally uniform convergence of solutions of such equations to the solution of a deterministic “effective” first-order Hamilton-Jacobi equation. The effective Hamiltonian is obtained from the original stochastic Hamiltonian by a minimax formula. Our homogenization results have a large-deviations interpretation for a diffusion in a random environment. © 2006 Wiley Periodicals, Inc.
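
Schematically (our transcription of the standard setup, not the paper's exact hypotheses), the result concerns viscosity solutions of

```latex
u^\varepsilon_t + H\!\Big(\frac{x}{\varepsilon}, \omega, Du^\varepsilon\Big)
  \;=\; \varepsilon\, \operatorname{tr}\!\Big( a\Big(\frac{x}{\varepsilon}, \omega\Big) D^2 u^\varepsilon \Big),
\qquad
u^\varepsilon \;\longrightarrow\; \bar{u} \quad \text{locally uniformly},
```

where the limit solves the deterministic first-order equation $\bar{u}_t + \bar{H}(D\bar{u}) = 0$ and the effective Hamiltonian $\bar{H}$ is given by a minimax formula over the stationary ergodic medium.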

158 citations


Journal ArticleDOI
TL;DR: This note presents the optimal linear-quadratic (LQ) regulator for a linear system with multiple time delays in the control input and establishes a duality between the solutions of the optimal filtering problem for linear systems with multiple time delays in the observations and the optimal LQ control problem for linear systems with multiple time delays in the control input.
Abstract: This note presents the optimal linear-quadratic (LQ) regulator for a linear system with multiple time delays in the control input. Optimality of the solution is proved in two steps. First, a necessary optimality condition is derived from the maximum principle. Then, the sufficiency of this condition is established by verifying that it satisfies the Hamilton-Jacobi-Bellman equation. Using an illustrative example, the performance of the obtained optimal regulator is compared against the performance of the optimal LQ regulator for linear systems without delays and some other feasible feedback regulators that are linear in the state variables. Finally, the note establishes a duality between the solutions of the optimal filtering problem for linear systems with multiple time delays in the observations and the optimal LQ control problem for linear systems with multiple time delays in the control input.
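
For reference, in the delay-free benchmark against which the regulator is compared, verifying sufficiency through the Hamilton-Jacobi-Bellman equation reduces to the familiar Riccati computation (textbook material, not specific to this note):

```latex
-V_t \;=\; \min_{u} \big[\, x^\top Q x + u^\top R u + V_x^\top (A x + B u) \,\big],
\qquad V(x,t) = x^\top P(t)\, x,
```

which yields the Riccati equation $-\dot{P} = A^\top P + P A - P B R^{-1} B^\top P + Q$ and the optimal feedback $u^* = -R^{-1} B^\top P x$; the note carries the analogous verification through in the presence of input delays.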

144 citations


Journal ArticleDOI
TL;DR: For multiparametric convex nonlinear programming problems, a recursive algorithm for approximating, within a given suboptimality tolerance, the value function and an optimizer as functions of the parameters is proposed.
Abstract: For multiparametric convex nonlinear programming problems we propose a recursive algorithm for approximating, within a given suboptimality tolerance, the value function and an optimizer as functions of the parameters. The approximate solution is expressed as a piecewise affine function over a simplicial partition of a subset of the feasible parameters, and it is organized over a tree structure for efficiency of evaluation. Adaptations of the algorithm to deal with multiparametric semidefinite programming and multiparametric geometric programming are provided and exemplified. The approach is relevant for real-time implementation of several optimization-based feedback control strategies.

131 citations


Journal ArticleDOI
TL;DR: In this article, an optimal investment problem under incomplete information and power utility is studied; the optimal portfolio policy is identified, the value function is compared with that of the fully observable case, and the loss of utility due to incomplete information is quantified.

Journal ArticleDOI
TL;DR: The value function of a finite horizon stochastic control problem with unbounded controls is characterized as the unique viscosity solution of the corresponding dynamic programming equation.
Abstract: In this paper, we prove a comparison result between semicontinuous viscosity sub- and supersolutions, growing at most quadratically, of second-order degenerate parabolic Hamilton-Jacobi-Bellman and Isaacs equations. As an application, we characterize the value function of a finite horizon stochastic control problem with unbounded controls as the unique viscosity solution of the corresponding dynamic programming equation.

Journal ArticleDOI
TL;DR: In this paper, the optimal investment problem of a CRRA investor who faces proportional transaction costs and a finite time horizon is studied using a partial differential equation approach; the problem is shown to be equivalent to a parabolic double obstacle problem involving two free boundaries that correspond to the optimal buying and selling policies.
Abstract: This paper concerns the optimal investment problem of a CRRA investor who faces proportional transaction costs and a finite time horizon. Using a partial differential equation approach, we reveal that the problem is equivalent to a parabolic double obstacle problem involving two free boundaries that correspond to the optimal buying and selling policies. This enables us to make use of the well-developed theory of variational inequalities to study the problem. The $C^{2,1}$ regularity of the value function is proven and the optimal investment policies are completely characterized. Relying on the double obstacle problem, we extend the binomial method widely used in option pricing to determine the optimal investment policies. Numerical examples are presented as well.

Journal ArticleDOI
TL;DR: A decomposition-based branch-and-bound (DBAB) algorithm for solving two-stage stochastic programs having mixed-integer first- and second-stage variables that converges to a global optimal solution.
Abstract: In this paper, we propose a decomposition-based branch-and-bound (DBAB) algorithm for solving two-stage stochastic programs having mixed-integer first- and second-stage variables. A modified Benders' decomposition method is developed, where the Benders' subproblems define lower bounding second-stage value functions of the first-stage variables that are derived by constructing a certain partial convex hull representation of the two-stage solution space. This partial convex hull is sequentially generated using a convexification scheme such as the Reformulation-Linearization Technique (RLT) or lift-and-project process, which yields valid inequalities that are reusable in the subsequent subproblems by updating the values of the first-stage variables. A branch-and-bound algorithm is designed based on a hyperrectangular partitioning process, using the established property that any resulting lower bounding Benders' master problem defined over a hyperrectangle yields the same objective value as the original stochastic program over that region if the first-stage variable solution is an extreme point of the defining hyperrectangle or the second-stage solution satisfies the binary restrictions. We prove that this algorithm converges to a global optimal solution. Some numerical examples and computational results are presented to demonstrate the efficacy of this approach.

Journal ArticleDOI
TL;DR: In this article, a perturbation approach for performing sensitivity analysis of mathematical programming problems is presented, where the active constraints are not assumed to remain active if the problem data are perturbed, nor the partial derivatives are assumed to exist.
Abstract: This paper presents a perturbation approach for performing sensitivity analysis of mathematical programming problems. Contrary to standard methods, the active constraints are not assumed to remain active if the problem data are perturbed, nor the partial derivatives are assumed to exist. In other words, all the elements, variables, parameters, Karush–Kuhn–Tucker multipliers, and objective function values may vary provided that optimality is maintained and the general structure of a feasible perturbation (which is a polyhedral cone) is obtained. This allows determining: (a) the local sensitivities, (b) whether or not partial derivatives exist, and (c) if the directional derivative for a given direction exists. A method for the simultaneous obtention of the sensitivities of the objective function optimal value and the primal and dual variable values with respect to data is given. Three examples illustrate the concepts presented and the proposed methodology. Finally, some relevant conclusions are drawn.

Journal ArticleDOI
TL;DR: In this paper, a multiclass queueing system is considered, with heterogeneous service stations, each consisting of many servers with identical capabilities, and an optimal control problem is formulated, where the control corresponds to scheduling and routing, and the cost is a cumulative discounted functional of the system state.
Abstract: A multiclass queueing system is considered, with heterogeneous service stations, each consisting of many servers with identical capabilities. An optimal control problem is formulated, where the control corresponds to scheduling and routing, and the cost is a cumulative discounted functional of the system's state. We examine two versions of the problem: ``nonpreemptive,'' where service is uninterruptible, and ``preemptive,'' where service to a customer can be interrupted and then resumed, possibly at a different station. We study the problem in the asymptotic heavy traffic regime proposed by Halfin and Whitt, in which the arrival rates and the number of servers at each station grow without bound. The two versions of the problem are not, in general, asymptotically equivalent in this regime, with the preemptive version showing an asymptotic behavior that is, in a sense, much simpler. Under appropriate assumptions on the structure of the system we show: (i) The value function for the preemptive problem converges to $V$, the value of a related diffusion control problem. (ii) The two versions of the problem are asymptotically equivalent, and in particular nonpreemptive policies can be constructed that asymptotically achieve the value $V$. The construction of these policies is based on a Hamilton--Jacobi--Bellman equation associated with $V$.

Journal ArticleDOI
TL;DR: The objective is to design an alarm time which is adapted to the history of the arrival process and detects the disorder time as soon as possible; unlike previous solvable versions of the problem, the new arrival rate after the disorder is assumed to be a random variable.
Abstract: We study the quickest detection problem of a sudden change in the arrival rate of a Poisson process from a known value to an unknown and unobservable value at an unknown and unobservable disorder time. Our objective is to design an alarm time which is adapted to the history of the arrival process and detects the disorder time as soon as possible. In previous solvable versions of the Poisson disorder problem, the arrival rate after the disorder has been assumed a known constant. In reality, however, we may at most have some prior information about the likely values of the new arrival rate before the disorder actually happens, and insufficient estimates of the new rate after the disorder happens. Consequently, we assume in this paper that the new arrival rate after the disorder is a random variable. The detection problem is shown to admit a finite-dimensional Markovian sufficient statistic, if the new rate has a discrete distribution with finitely many atoms. Furthermore, the detection problem is cast as a discounted optimal stopping problem with running cost for a finite-dimensional piecewise-deterministic Markov process. This optimal stopping problem is studied in detail in the special case where the new arrival rate has Bernoulli distribution. This is a nontrivial optimal stopping problem for a two-dimensional piecewise-deterministic Markov process driven by the same point process. Using a suitable single-jump operator, we solve it fully, describe the analytic properties of the value function and the stopping region, and present methods for their numerical calculation. We provide a concrete example where the value function does not satisfy the smooth-fit principle on a proper subset of the connected, continuously differentiable optimal stopping boundary, whereas it does on the complement of this set.

Journal ArticleDOI
TL;DR: This paper first solves the Isaacs equation associated with the game to get an approximate value function and then uses it to reconstruct approximate optimal feedback controls and optimal trajectories; the schemes are tested on some classical pursuit-evasion games.
Abstract: In this paper we present some numerical methods for the solution of two-person zero-sum deterministic differential games. The methods are based on the dynamic programming approach. We first solve the Isaacs equation associated with the game to get an approximate value function, and then we use it to reconstruct approximate optimal feedback controls and optimal trajectories. The approximation schemes also have an interesting control interpretation, since the time-discrete scheme stems from a dynamic programming principle for the associated discrete-time dynamical system. The general framework for convergence results to the value function is the theory of viscosity solutions. Numerical experiments are presented, solving some classical pursuit-evasion games.
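
The dynamic programming principle underlying these schemes can be illustrated in a fully discrete toy setting (our own illustration, far simpler than the paper's schemes): value iteration on the discrete Isaacs equation V(x) = min_a max_b [ cost(x) + gamma * V(f(x, a, b)) ].

```python
import numpy as np

def game_value_iteration(n_states, step, cost, actions=(-1, 0, 1),
                         gamma=0.95, tol=1e-9):
    """Fixed-point iteration for the discrete Isaacs equation
        V(x) = min_a max_b [ cost(x) + gamma * V(step(x, a, b)) ],
    i.e., the lower value of a finite zero-sum dynamic game."""
    V = np.zeros(n_states)
    while True:
        V_new = np.array([
            min(max(cost(x) + gamma * V[step(x, a, b)] for b in actions)
                for a in actions)
            for x in range(n_states)
        ])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Toy pursuit-evasion on a line: x is the pursuer-evader gap; the pursuer
# (minimizer) and the evader (maximizer) each move one step per stage.
n = 21
step = lambda x, a, b: min(max(x + a + b, 0), n - 1)
cost = lambda x: 0.0 if x == 0 else 1.0   # pay until capture (gap 0)
print(game_value_iteration(n, step, cost))
```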

Journal ArticleDOI
TL;DR: This paper studies the joint machine maintenance and product quality control problem of a finite-horizon, discrete-time, Markovian deteriorating, state-unobservable batch production system; it formulates the system as a partially observable Markov decision process and derives some properties of the optimal value function.

Posted Content
TL;DR: In this article, the authors provide characterizations for singular trajectories of control-affine systems and show that, under generic assumptions, such trajectories share nice properties related to computational aspects; more precisely, they show that, for a generic system, all nontrivial singular trajectories are of minimal order and of corank one.
Abstract: When applying methods of optimal control to motion planning or stabilization problems, some theoretical or numerical difficulties may arise, due to the presence of specific trajectories, namely, singular minimizing trajectories of the underlying optimal control problem. In this article, we provide characterizations for singular trajectories of control-affine systems. We prove that, under generic assumptions, such trajectories share nice properties related to computational aspects; more precisely, we show that, for a generic system with respect to the Whitney topology, all nontrivial singular trajectories are of minimal order and of corank one. These results, established both for driftless and for control-affine systems, extend previous results. As a consequence, for generic systems having more than two vector fields, and for a fixed cost, there do not exist minimizing singular trajectories. We also prove that, given a control system satisfying the LARC, singular trajectories are strictly abnormal, generically with respect to the cost. We then show how these results can be used to derive regularity results for the value function and in the theory of Hamilton-Jacobi equations, which in turn have applications to stabilization and motion planning, from both the theoretical and the implementation points of view.

Journal ArticleDOI
TL;DR: In this article, a penalty term is added to the objective function in each minimization to discourage the optimizer from finding a solution in regions of the state space where the local data density is too low.

Journal ArticleDOI
TL;DR: In this article, Bellman equations of ergodic type related to risk-sensitive control are considered; it is proved that the equation in general has multiple solutions, and the solutions are classified by the global behavior of the diffusion process associated with each solution.
Abstract: Bellman equations of ergodic type related to risk-sensitive control are considered. We treat the case where the nonlinear term is a positive quadratic form in the first-order partial derivatives of the solution, which includes the linear exponential quadratic Gaussian control problem. In this paper we prove that the equation in general has multiple solutions. We specify the set of all classical solutions and classify them by the global behavior of the diffusion process associated with each solution. The solution associated with an ergodic diffusion process plays a particular role; we also prove the uniqueness of such a solution. Furthermore, the solution which gives us ergodicity is stable under perturbation of the coefficients. Finally, we give a representation result for the solution corresponding to the ergodic diffusion.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a hybrid factored Markov decision process (MDP) model that allows a compact representation of large decision problems with continuous and discrete variables, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solution.
Abstract: Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model that allows for a compact representation of these problems, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solutions. The central idea of HALP is to approximate the optimal value function by a linear combination of basis functions and optimize its weights by linear programming. We analyze both theoretical and computational aspects of this approach, and demonstrate its scale-up potential on several hybrid optimization problems.
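
In the purely discrete case (the hybrid version replaces the sums by integrals over the continuous components), the linear program at the heart of HALP is the standard approximate linear programming formulation (our transcription):

```latex
\min_{w}\; \sum_{s} \alpha(s) \sum_i w_i\, \phi_i(s)
\quad \text{s.t.} \quad
\sum_i w_i\, \phi_i(s) \;\ge\; R(s,a) + \gamma \sum_{s'} P(s' \mid s, a) \sum_i w_i\, \phi_i(s')
\quad \forall (s,a),
```

where the $\phi_i$ are the basis functions, $w$ their weights, and $\alpha$ a state-relevance weighting; the challenge addressed by HALP is evaluating these expressions efficiently when the state has continuous components.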

Journal ArticleDOI
TL;DR: Using the Hamilton-Jacobi-Bellman equation, this article derived both a Keynes-Ramsey rule and a closed form solution for an optimal consumption-investment problem with labor income.
Abstract: Using the Hamilton-Jacobi-Bellman equation, we derive both a Keynes-Ramsey rule and a closed-form solution for an optimal consumption-investment problem with labor income. The utility function is unbounded and uncertainty stems from a Poisson process. Our results build on the proofs presented in the accompanying paper by Sennewald (2006). Additional examples are given which highlight the correct use of the Hamilton-Jacobi-Bellman equation and the change-of-variables formula (sometimes referred to as ``Ito's Lemma'') under Poisson uncertainty.
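
To fix notation, in the deterministic benchmark case (no Poisson jumps) the derivation runs through the stationary HJB equation for wealth w, consumption c, interest rate r, labor income y, and discount rate rho (our transcription of textbook material; the paper's version adds a jump term for the Poisson uncertainty):

```latex
\rho V(w) = \max_{c}\big[\, u(c) + V'(w)\,(r w + y - c) \,\big],
\qquad u'(c^*) = V'(w),
```

and for CRRA utility $u(c) = c^{1-\theta}/(1-\theta)$ the first-order and envelope conditions combine into the Keynes-Ramsey rule $\dot{c}/c = (r - \rho)/\theta$.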

Journal ArticleDOI
TL;DR: A new lower bound is presented, namely the LP relaxation of an integer programming formulation based on Dantzig-Wolfe decomposition, and a column generation algorithm is proposed to solve the formulation.

Journal ArticleDOI
TL;DR: In this article, a dynamic programming approach is used to design control laws for systems subject to complex state constraints, where the problem of reachability under state constraints is formulated in terms of nonstandard minmax and maxmin cost functionals, and the corresponding value functions are given by Hamilton-Jacobi-Bellman (HJB) equations or variational inequalities.
Abstract: The design of control laws for systems subject to complex state constraints still presents a significant challenge. This paper explores a dynamic programming approach to a specific class of such problems, that of reachability under state constraints. The problems are formulated in terms of nonstandard minmax and maxmin cost functionals, and the corresponding value functions are given in terms of Hamilton-Jacobi-Bellman (HJB) equations or variational inequalities. The solution of these relations is complicated in general; however, for linear systems, the value functions may be described also in terms of duality relations of convex analysis and minmax theory. Consequently, solution techniques specific to systems with a linear structure may be designed independently of HJB theory. These techniques are illustrated through two examples.

01 Jan 2006
TL;DR: In this paper, the authors considered the optimal control of a multidimensional cash management system where the cash balances fluctuate as a homogeneous diffusion process in $\mathbb{R}^n$, and formulated the model as an impulse control problem on an unbounded domain with unbounded cost functions.
Abstract: We consider the optimal control of a multidimensional cash management system where the cash balances fluctuate as a homogeneous diffusion process in $\mathbb{R}^n$. We formulate the model as an impulse control problem on an unbounded domain with unbounded cost functions. Under general assumptions we characterize the value function as a weak solution of a quasi-variational inequality in a weighted Sobolev space and we show the existence of an optimal policy. Moreover, we prove the local uniform convergence of a finite element scheme for numerically computing the value function and the optimal cost. We compute the solution of the model in two dimensions with linear and distance cost functions, showing the shapes of the optimal policies in these two simple cases. Finally, our third numerical experiment computes the solution in the realistic case of the cash concentration of two bank accounts made by a centralized treasury.
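
The quasi-variational inequality in question has the familiar impulse-control structure (our transcription of the standard formulation, up to sign conventions; f is the running cost, r the discount rate, and the intervention operator M encodes a fixed cost K plus a transfer cost c):

```latex
\min\big( r V - \mathcal{L} V - f,\;\; \mathcal{M} V - V \big) = 0,
\qquad
\mathcal{M} V(x) = \inf_{\xi \neq 0} \big[ V(x + \xi) + K + c(\xi) \big],
```

where $\mathcal{L}$ is the generator of the uncontrolled diffusion; the no-transaction region is where $rV - \mathcal{L}V = f$ and $V < \mathcal{M}V$, and an impulse (cash transfer) is triggered where $V = \mathcal{M}V$.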

Journal ArticleDOI
TL;DR: In this article, the authors considered the scheduling control problem for a family of unitary networks under heavy traffic, with general interarrival and service times, probabilistic routing and infinite horizon discounted linear holding cost.
Abstract: We consider the scheduling control problem for a family of unitary networks under heavy traffic, with general interarrival and service times, probabilistic routing and infinite horizon discounted linear holding cost. A natural nonanticipativity condition for admissibility of control policies is introduced. The condition is seen to hold for a broad class of problems. Using this formulation of admissible controls and a time-transformation technique, we establish that the infimum of the cost for the network control problem over all admissible sequencing control policies is asymptotically bounded below by the value function of an associated diffusion control problem (the Brownian control problem). This result provides a useful bound on the best achievable performance for any admissible control policy for a wide class of networks.

Journal ArticleDOI
TL;DR: In this article, a martingale approach for continuous-time stochastic control with discretionary stopping is presented, and necessary and sufficient conditions for the optimality of a control strategy are provided.
Abstract: We develop a martingale approach for continuous-time stochastic control with discretionary stopping. The relevant Dynamic Programming Equation and Maximum Principle are presented. Necessary and sufficient conditions are provided for the optimality of a control strategy; these are analogues of the "equalization" and "thriftiness" conditions introduced by Dubins and Savage (1976) in a related, discrete-time context. The existence of a thrifty control strategy is established.