scispace - formally typeset

Bellman equation

About: Bellman equation is a research topic. Over its lifetime, 5,884 publications have been published within this topic, receiving 135,589 citations.
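The Bellman equation characterizes the optimal value function as a fixed point, V(s) = max_a [R(s,a) + γ Σ_s' P(s'|s,a) V(s')], which can be solved by value iteration. A minimal sketch on a toy two-state MDP (states, rewards, and transition probabilities are invented for illustration):

```python
# Value iteration on a toy 2-state, 2-action MDP.
# All numbers here are invented purely to illustrate the Bellman update.
import numpy as np

gamma = 0.9  # discount factor

# P[a][s, s'] = transition probability under action a; R[s, a] = immediate reward
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(2)
for _ in range(500):
    # Bellman optimality update: V(s) = max_a [R(s,a) + gamma * sum_s' P(s'|s,a) V(s')]
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new
```

Because the Bellman operator is a γ-contraction, the iteration converges to the unique fixed point regardless of the initial guess.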


Papers
Journal ArticleDOI
TL;DR: In this article, the authors consider a control problem where the state variable is a solution of a stochastic differential equation (SDE) in which the control enters both the drift and the diffusion coefficient.
Abstract: We consider a control problem where the state variable is a solution of a stochastic differential equation (SDE) in which the control enters both the drift and the diffusion coefficient. We study the relaxed problem for which admissible controls are measure-valued processes and the state variable is governed by an SDE driven by an orthogonal martingale measure. Under some mild conditions on the coefficients and pathwise uniqueness, we prove that every diffusion process associated to a relaxed control is a strong limit of a sequence of diffusion processes associated to strict controls. As a consequence, we show that the strict and the relaxed control problems have the same value function and that an optimal relaxed control exists. Moreover we derive a maximum principle of the Pontriagin type, extending the well-known Peng stochastic maximum principle to the class of measure-valued controls.
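The strict and relaxed dynamics described in the abstract can be written schematically as follows (the notation b, σ, U, q_t, and M is standard usage assumed here, not taken verbatim from the paper):

```latex
% Strict control: u_t takes values in the action space U
dX_t = b(t, X_t, u_t)\,dt + \sigma(t, X_t, u_t)\,dW_t
% Relaxed control: q_t is a measure-valued process on U, and M is an
% orthogonal martingale measure with intensity q_t(da)\,dt
dX_t = \int_U b(t, X_t, a)\, q_t(da)\,dt + \int_U \sigma(t, X_t, a)\, M(da, dt)
```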

46 citations

Journal ArticleDOI
TL;DR: The computational complexity and convergence of the hybrid-ADP approach are analyzed, and the method is validated numerically showing that the optimal controller and value function can be learned iteratively online from state observations.
Abstract: This paper presents a hybrid adaptive dynamic programming (hybrid-ADP) approach for determining the optimal continuous and discrete control laws of a switched system online, solely from state observations. The new hybrid-ADP recurrence relationships presented are applicable to model-free control of switched hybrid systems that are possibly nonlinear. The computational complexity and convergence of the hybrid-ADP approach are analyzed, and the method is validated numerically showing that the optimal controller and value function can be learned iteratively online from state observations.
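For a switched system, the Bellman recursion minimizes over both the discrete mode and the continuous control. A hedged sketch of this generic recursion on a discretized scalar state (the dynamics, costs, and grids below are invented; this is not the paper's hybrid-ADP algorithm, which learns online from state observations):

```python
# Generic Bellman recursion for a switched system: at each state we
# minimize over discrete modes AND continuous controls.
# All dynamics and cost parameters are invented for illustration.
import numpy as np

xs = np.linspace(-2, 2, 81)          # discretized state grid
us = np.linspace(-1, 1, 21)          # candidate continuous controls
modes = [(0.9, 1.0), (1.1, 0.5)]     # (a, b) per discrete mode: x' = a*x + b*u
gamma = 0.95                         # discount factor

V = np.zeros_like(xs)
for _ in range(200):
    V_new = np.full_like(V, np.inf)
    for a, b in modes:                       # minimize over discrete mode
        for u in us:                         # ... and continuous control
            x_next = np.clip(a * xs + b * u, xs[0], xs[-1])
            Vn = np.interp(x_next, xs, V)    # interpolate V at successor states
            cost = xs**2 + 0.1 * u**2 + gamma * Vn
            V_new = np.minimum(V_new, cost)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new
```

The minimum over modes is what distinguishes the switched-system recursion from the standard Bellman update; the actual hybrid-ADP method replaces this model-based sweep with iterative online learning.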

46 citations

Journal ArticleDOI
TL;DR: The relationship between the Pontryagin maximum principle and dynamic programming, expressed in terms of the generalized gradient of V, is established for a large class of nonsmooth problems.
Abstract: The dynamic programming approach to optimal control theory attempts to characterize the value function V as a solution to the Hamilton-Jacobi-Bellman equation. Heuristic arguments have long been advanced relating the Pontryagin maximum principle and dynamic programming according to the equation (H(t, x*(t), u*(t), p(t)), −p(t)) = ∇V(t, x*(t)), where (x*, u*) is the optimal control process under consideration, p(t) is the coextremal, and H is the Hamiltonian. The relationship has previously been verified under only very restrictive hypotheses. We prove new results establishing the relationship, now expressed in terms of the generalized gradient of V, for a large class of nonsmooth problems.
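The relation in the abstract can be sketched as follows, with the standard convention H(t,x,u,p) = ⟨p, f(t,x,u)⟩ − L(t,x,u) assumed here (smooth case first, then the nonsmooth generalization via the generalized gradient ∂V):

```latex
% HJB equation for the value function V (smooth case, minimizing running cost L)
\partial_t V(t,x) + \min_{u} \bigl\{ \langle \nabla_x V(t,x),\, f(t,x,u) \rangle + L(t,x,u) \bigr\} = 0
% Along an optimal pair (x^*, u^*) with coextremal p(t) = -\nabla_x V(t, x^*(t)):
\bigl( H(t, x^*(t), u^*(t), p(t)),\, -p(t) \bigr) = \nabla V(t, x^*(t))
% Nonsmooth generalization proved in the paper (generalized gradient):
\bigl( H(t, x^*(t), u^*(t), p(t)),\, -p(t) \bigr) \in \partial V(t, x^*(t))
```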

46 citations

Journal ArticleDOI
TL;DR: The Caratheodory-π trajectories discussed by the authors are based on the fundamental notion that a closed loop does not necessarily imply a closed-form solution; optimality is verified by applying Pontryagin's principle as a necessary condition.
Abstract: Recent advances in optimal control theory and computation are used to develop minimum-time solutions for the rest-to-rest reorientation of a generic rigid body. It is shown that the differences in the geometry of the inertia ellipsoid and the control space lead to counterintuitive noneigenaxis maneuvers. The optimality of the open-loop solutions is verified by an application of Pontryagin's principle as a necessary condition and not as a problem-solving tool. It is shown that the nonsmoothness of the lower Hamiltonian compounds the curse of dimensionality associated with solving the Hamilton-Jacobi-Bellman equations for feedback solutions. These difficulties are circumvented by generating Caratheodory-π trajectories, which are based on the fundamental notion that a closed loop does not necessarily imply closed-form solutions. While demonstrating the successful implementation of the proposed method for practical applications, these closed-loop results reveal yet another counterintuitive phenomenon: a suggestion that parameter uncertainties may aid the optimality of the maneuver rather than hinder it.

46 citations

Proceedings Article
01 Jan 2019
TL;DR: The authors propose a generalized version of the Bellman equation to learn a single parametric representation for optimal policies over the space of all possible preferences, enabling policies to be learned over multiple competing objectives whose relative importance (preferences) is unknown to the agent.
Abstract: We introduce a new algorithm for multi-objective reinforcement learning (MORL) with linear preferences, with the goal of enabling few-shot adaptation to new tasks. In MORL, the aim is to learn policies over multiple competing objectives whose relative importance (preferences) is unknown to the agent. While this alleviates dependence on scalar reward design, the expected return of a policy can change significantly with varying preferences, making it challenging to learn a single model to produce optimal policies under different preference conditions. We propose a generalized version of the Bellman equation to learn a single parametric representation for optimal policies over the space of all possible preferences. After an initial learning phase, our agent can execute the optimal policy under any given preference, or automatically infer an underlying preference with very few samples. Experiments across four different domains demonstrate the effectiveness of our approach.
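Under linear preferences, a vector reward r is scalarized by a preference vector w via r·w, and the value function becomes preference-conditioned. A hedged sketch of the tabular case for a single fixed preference (the MDP below is invented, and this illustrates the general scalarization idea, not the paper's generalized Bellman operator, which learns across all preferences at once):

```python
# Scalarized multi-objective value iteration for one linear preference w.
# States, vector rewards, and transitions are randomly invented for illustration.
import numpy as np

n_states, n_actions, n_objectives = 3, 2, 2
rng = np.random.default_rng(0)
R = rng.random((n_states, n_actions, n_objectives))   # vector-valued rewards
P = rng.random((n_states, n_actions, n_states))       # transition kernel
P /= P.sum(axis=2, keepdims=True)                     # normalize rows
gamma = 0.9

def solve_for_preference(w, iters=500):
    """Value iteration on the scalarized reward r(s,a).w for one preference w."""
    r = R @ w                                  # scalarize: shape (states, actions)
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        V = Q.max(axis=1)
        Q = r + gamma * P @ V                  # Bellman update on scalarized reward
    return Q

w = np.array([0.7, 0.3])                       # preference weights over objectives
Q = solve_for_preference(w)
policy = Q.argmax(axis=1)
```

Solving a fresh problem per preference, as above, is exactly the inefficiency the paper's single parametric representation over the preference space is designed to avoid.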

46 citations


Network Information
Related Topics (5)
Optimal control: 68K papers, 1.2M citations (87% related)
Bounded function: 77.2K papers, 1.3M citations (85% related)
Markov chain: 51.9K papers, 1.3M citations (85% related)
Linear system: 59.5K papers, 1.4M citations (84% related)
Optimization problem: 96.4K papers, 2.1M citations (83% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353