Topic

Bellman equation

About: Bellman equation is a research topic. Over its lifetime, 5,884 publications have been published within this topic, receiving 135,589 citations.


Papers
Posted Content
TL;DR: In a newsvendor problem with partially observed Markovian demand, the optimal order is set to exceed the myopic optimal order, and a near-optimal solution is characterized by establishing that the value function is piecewise linear.
Abstract: We consider a newsvendor problem with partially observed Markovian demand. Demand is observed if it is less than the inventory; otherwise, only the event that it is greater than or equal to the inventory is observed. These observations are used to update the demand distribution from one period to the next. The state of the resulting dynamic programming equation is the current demand distribution, which is generally infinite dimensional. We use unnormalized probabilities to convert the nonlinear state transition equation into a linear one, which helps in proving the existence of an optimal feedback ordering policy. To learn more about the demand, the optimal order is set to exceed the myopic optimal order. The optimal cost decreases as the demand distribution decreases in the hazard rate order. In a special case with finitely many demand values, we characterize a near-optimal solution by establishing that the value function is piecewise linear.
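The unnormalized-probability trick in this abstract is easy to illustrate. Below is a minimal sketch (the finite demand grid, the transition matrix P, and all names are assumptions for illustration, not the paper's code) of how censored demand observations update an unnormalized belief linearly.

```python
import numpy as np

def update_unnormalized(q, P, y, d_obs):
    """One-period update of the unnormalized belief q over demand values.

    q:     length-n unnormalized belief over current demand (values 0..n-1)
    P:     n-by-n Markov transition matrix for the demand process
    y:     inventory level; demand d is observed exactly only when d < y
    d_obs: the observed demand if d < y, otherwise None (only {d >= y} known)
    """
    d = np.arange(len(q))
    # Likelihood of the observation: a point mass when demand was seen,
    # an indicator of the censored event {d >= y} when it was not.
    lik = (d == d_obs).astype(float) if d_obs is not None else (d >= y).astype(float)
    # Multiply by the likelihood and propagate one step WITHOUT normalizing;
    # skipping the normalization is what keeps this map linear in q.
    return (q * lik) @ P
```

Dividing q by q.sum() at any period recovers the usual (nonlinear) Bayesian posterior over demand.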

80 citations

Posted Content
TL;DR: A generalized version of the Bellman equation is proposed to learn a single parametric representation for optimal policies over the space of all possible preferences in MORL, with the goal of enabling few-shot adaptation to new tasks.
Abstract: We introduce a new algorithm for multi-objective reinforcement learning (MORL) with linear preferences, with the goal of enabling few-shot adaptation to new tasks. In MORL, the aim is to learn policies over multiple competing objectives whose relative importance (preferences) is unknown to the agent. While this alleviates dependence on scalar reward design, the expected return of a policy can change significantly with varying preferences, making it challenging to learn a single model to produce optimal policies under different preference conditions. We propose a generalized version of the Bellman equation to learn a single parametric representation for optimal policies over the space of all possible preferences. After an initial learning phase, our agent can execute the optimal policy under any given preference, or automatically infer an underlying preference with very few samples. Experiments across four different domains demonstrate the effectiveness of our approach.
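The generalized Bellman backup can be sketched concretely. The following is a hypothetical numpy illustration (shapes, the sampled preference grid, and all names are assumptions, not the authors' implementation): Q-values are vectors over the K objectives, indexed by state, action, and a grid of J sampled preference vectors, and the backup maximizes over both actions and sampled preferences before propagating the full vector value.

```python
import numpy as np

def envelope_target(Q, s_next, r_vec, w, gamma=0.99):
    """Generalized Bellman target for a query preference vector w.

    Q:      array [S, A, J, K] of vector Q-values at J sampled preferences
    s_next: index of the next state
    r_vec:  K-dim vector reward observed at this transition
    w:      K-dim linear preference used to scalarize returns
    """
    # Scalarize every (action, sampled preference) pair with the query
    # preference w, then maximize over both (an envelope over preferences).
    scal = np.einsum('k,ajk->aj', w, Q[s_next])
    a_star, j_star = np.unravel_index(np.argmax(scal), scal.shape)
    # Back up the maximizer's full K-dim vector value, not just its scalar,
    # so one parametric model can cover the whole preference space.
    return r_vec + gamma * Q[s_next, a_star, j_star]
```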

80 citations

Journal ArticleDOI
TL;DR: An antialiasing trajectory optimization method based on Bellman's principle of optimality is developed; it is extremely simple to implement, and optimal feedback controls are obtained without recourse to the complexities of Hamilton–Jacobi theory.
Abstract: [Low-Thrust, High-Accuracy Trajectory Optimization. I. Michael Ross, Qi Gong, and Pooya Sekhavat, Naval Postgraduate School, Monterey, California 93943. Journal of Guidance, Control, and Dynamics, Vol. 30, No. 4, July–August 2007. DOI: 10.2514/1.23181]

Multirevolution, very low-thrust trajectory optimization problems have long been considered difficult problems due to their large time scales and high-frequency responses. By relating this difficulty to the well-known problem of aliasing in information theory, an antialiasing trajectory optimization method is developed. The method is based on Bellman's principle of optimality and is extremely simple to implement. Appropriate technical conditions are derived for generating candidate optimal solutions to a high accuracy. The proposed method is capable of detecting suboptimality by way of three simple tests. These tests are used for verifying the optimality of a candidate solution without the need for computing costates or other covectors that are necessary in the Pontryagin framework. The tests are universal in the sense that they can be used in conjunction with any numerical method, whether or not antialiasing is sought. Several low-thrust example problems are solved to illustrate the proposed ideas. It is shown that the antialiased solutions are, in fact, closed-loop solutions; hence, optimal feedback controls are obtained without recourse to the complexities of the Hamilton–Jacobi theory. Because the proposed method is easy to implement, it can be coded on an onboard computer for practical space guidance.

I. Introduction. Continuous-thrust trajectory optimization problems have served as one of the motivating problems for optimal control theory since its inception [1–4]. The classic problem posed by Moyer and Pinkham [2] is widely discussed in textbooks [1,3,4] and research articles [5–7]. When the continuity of thrust is removed from such problems, the results can be quite dramatic, as illustrated in Fig. 1. This trajectory was obtained using recent advances in optimal control techniques and is extensively discussed in [8]. In canonical units, the problem illustrated in Fig. 1 corresponds to doubling the semimajor axis (a_0 = 1, a_f = 2), doubling the eccentricity (e_0 = 0.1, e_f = 0.2), and rotating the line of apsides by 1 rad. Note that the extremal thrust steering program for minimizing fuel is not tangential over a significant portion of the trajectory. Furthermore, the last burn is a singular control, as demonstrated in Fig. 2 by the vanishing of the switching function. Although such finite-thrust problems can be solved quite readily nowadays, it has long been recognized [9–11] that new problems emerge as the thrust authority is reduced. These well-known challenges chiefly arise from the long flight times measured in terms of the number of orbital revolutions. Consequently, such problems are distinguished from finite-thrust problems as low-thrust problems, although the boundary between finite thrust and low thrust is not altogether sharp.

Although ad hoc techniques may circumvent some of the low-thrust challenges, it is not quite clear whether the solutions generated by such methods are verifiably optimal. As detailed in [8], the engineering feasibility of a space mission is dictated not by trajectory generation but by optimality. This is because fuel in space is extraordinarily expensive: the cost of a propellant is driven by the routine of space operations, or the lack of it, and not by the chemical composition of the fuel. In an effort to circumvent ad hoc techniques and efficiently solve emerging problems in finite- and low-thrust trajectory optimization, NASA brought together leading experts in the field to exchange ideas over several workshops. These workshops, held over 2003–2006, further clarified the scope of the problems, and ongoing efforts to address them are described in [12]. From a practical point of view, the goal is to quickly obtain verifiably optimal or near-optimal solutions to finite- and low-thrust problems so that alternative mission concepts can be analyzed.

[Fig. 1: A benchmark minimum-fuel finite-thrust orbit transfer problem, showing the initial orbit, transfer trajectory, and final orbit.]
[Fig. 2: Extremal thrust acceleration (control) program t ↦ u and the corresponding switching function t ↦ s for the trajectory shown in Fig. 1; the final burn is a singular control, where s vanishes.]
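Bellman's principle also suggests why such suboptimality tests can avoid costates: an optimal trajectory must remain optimal when re-solved from any interior point. A minimal sketch of a check in that spirit follows; `solve_ocp`, `interp_state`, and `cost_to_go` are hypothetical stand-ins for a trajectory optimizer and accessors on a candidate solution, not the authors' code.

```python
def bellman_optimality_test(candidate, solve_ocp, t_mid, rel_tol=1e-6):
    """Flag suboptimality by comparing the candidate's tail cost from t_mid
    against an independently re-solved subproblem started on the candidate."""
    x_mid = candidate.interp_state(t_mid)     # state on the candidate at t_mid
    tail = solve_ocp(t0=t_mid, x0=x_mid)      # re-solve the tail subproblem
    gap = candidate.cost_to_go(t_mid) - tail.cost
    # By Bellman's principle the gap must be ~0 on an optimal trajectory;
    # a clearly positive gap exposes suboptimality without computing costates.
    return gap <= rel_tol * max(1.0, abs(tail.cost))
```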

80 citations

Proceedings Article
11 Jun 2011
TL;DR: A new heuristic-search-based family of algorithms, FRET (Find, Revise, Eliminate Traps), is presented, and a preliminary empirical evaluation shows that FRET solves GSSPs much more efficiently than Value Iteration.
Abstract: Research in efficient methods for solving infinite-horizon MDPs has so far concentrated primarily on discounted MDPs and the more general stochastic shortest path problems (SSPs). These are MDPs with 1) an optimal value function V* that is the unique solution of the Bellman equation and 2) optimal policies that are the greedy policies w.r.t. V*. This paper's main contribution is the description of a new class of MDPs that have well-defined optimal solutions yet comply with neither 1 nor 2 above. We call this new class Generalized Stochastic Shortest Path (GSSP) problems. GSSP allows a more general reward structure than SSP and subsumes several established MDP types, including SSP, positive-bounded, negative, and discounted-reward models. While existing efficient heuristic search algorithms like LAO* and LRTDP are not guaranteed to converge to the optimal value function for GSSPs, we present a new heuristic-search-based family of algorithms, FRET (Find, Revise, Eliminate Traps). A preliminary empirical evaluation shows that FRET solves GSSPs much more efficiently than Value Iteration.
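For contrast with FRET, here is the Value Iteration baseline mentioned in the evaluation, written for a standard SSP where the Bellman equation does have V* as its unique solution; the list-of-matrices encoding and all names are illustrative assumptions.

```python
import numpy as np

def ssp_value_iteration(P, c, goal, tol=1e-8, max_iters=100_000):
    """P[a]: n-by-n transition matrix for action a; c[a]: length-n expected
    cost vector for action a; goal: iterable of absorbing, cost-free states."""
    n = P[0].shape[0]
    V = np.zeros(n)
    for _ in range(max_iters):
        # Bellman backup: V(s) = min_a [ c(s, a) + sum_s' P(s' | s, a) V(s') ]
        Q = np.stack([c[a] + P[a] @ V for a in range(len(P))])
        V_new = Q.min(axis=0)
        V_new[list(goal)] = 0.0               # goal states keep zero cost-to-go
        if np.max(np.abs(V_new - V)) < tol:   # converged to the fixed point
            return V_new
        V = V_new
    return V
```

In a GSSP, the more general reward structure can break both the uniqueness of this fixed point and the optimality of greedy policies, which is exactly the gap FRET addresses.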

79 citations

Journal ArticleDOI
TL;DR: In this paper, the authors consider the stochastic optimal control problem of McKean-Vlasov stochastic differential equations, where the coefficients may depend upon the joint law of the state and control.
Abstract: We consider the stochastic optimal control problem of McKean-Vlasov stochastic differential equations, where the coefficients may depend upon the joint law of the state and control. By using feedback controls, we reformulate the problem into a deterministic control problem with only the marginal distribution of the process as the controlled state variable, and prove that the dynamic programming principle holds in its general form. Then, relying on the notion of differentiability with respect to probability measures recently introduced by P.L. Lions in [32], and a special Itô formula for flows of probability measures, we derive the (dynamic programming) Bellman equation for the mean-field stochastic control problem and prove a verification theorem in our McKean-Vlasov framework. We give explicit solutions to the Bellman equation for the linear-quadratic mean-field control problem, with applications to mean-variance portfolio selection and a systemic risk model. We also consider a notion of lifted viscosity solutions for the Bellman equation, and show the viscosity property and uniqueness of the value function of the McKean-Vlasov control problem. Finally, we consider the case of the McKean-Vlasov control problem with open-loop controls and discuss the associated dynamic programming equation, which we compare with the case of closed-loop controls.
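For orientation, a schematic form of the Bellman equation on the space of probability measures studied in this line of work is shown below; the notation (drift b, diffusion σ, running cost f, terminal cost g, and the Lions derivative ∂_μ v) follows common mean-field control conventions and is an assumption, not a quotation from the paper.

```latex
% Schematic dynamic programming (Bellman) equation for a McKean-Vlasov
% control problem with value function v(t, \mu) on [0,T] x P_2(R^d):
\partial_t v(t,\mu)
  + \inf_{\alpha}\int_{\mathbb{R}^d}\Big[
      f\big(x,\mu,\alpha(x)\big)
      + \partial_\mu v(t,\mu)(x)\cdot b\big(x,\mu,\alpha(x)\big)
      + \tfrac{1}{2}\,\operatorname{tr}\!\Big(
          \partial_x\partial_\mu v(t,\mu)(x)\,
          \sigma\sigma^{\top}\big(x,\mu,\alpha(x)\big)\Big)
    \Big]\,\mu(dx) = 0,
\qquad
v(T,\mu) = \int_{\mathbb{R}^d} g(x,\mu)\,\mu(dx).
```

In the linear-quadratic case referenced above, this equation admits explicit solutions, which is what makes the mean-variance and systemic-risk applications tractable.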

79 citations


Network Information
Related Topics (5)

Topic | Papers | Citations | Relatedness
Optimal control | 68K | 1.2M | 87% related
Bounded function | 77.2K | 1.3M | 85% related
Markov chain | 51.9K | 1.3M | 85% related
Linear system | 59.5K | 1.4M | 84% related
Optimization problem | 96.4K | 2.1M | 83% related
Performance
Metrics
No. of papers in the topic in previous years

Year | Papers
2023 | 261
2022 | 537
2021 | 369
2020 | 411
2019 | 348
2018 | 353