
Bellman equation

About: Bellman equation is a research topic. Over the lifetime of the topic, 5884 publications have been published, receiving 135589 citations.


Papers
Journal ArticleDOI
TL;DR: The implicit learning capabilities of the RISE control structure are used to learn the dynamics asymptotically, and it is shown that the system converges to a state-space system with a quadratic performance index that has been optimized by an additional control element.

44 citations
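
For reference, the quadratic performance index mentioned in the TL;DR above can be written in standard optimal-control notation as follows; this is a generic sketch for a system \dot{x} = f(x) + g(x)u, not the paper's exact formulation:

```latex
% Generic infinite-horizon quadratic performance index (illustrative only):
J(x_0) = \int_{0}^{\infty} \left( x^{\top} Q x + u^{\top} R u \right) \mathrm{d}t,
\qquad Q \succeq 0, \; R \succ 0.
% Associated Hamilton-Jacobi-Bellman condition on the value function V:
0 = \min_{u} \left[ x^{\top} Q x + u^{\top} R u
    + \nabla V(x)^{\top} \big( f(x) + g(x)u \big) \right].
```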

Journal ArticleDOI
TL;DR: In this paper, the authors considered the problem of finding good deterministic policies whose risk is smaller than some user-specified threshold, and formalized it as a constrained MDP with two criteria.
Abstract: In this paper, we consider Markov Decision Processes (MDPs) with error states. Error states are states that are undesirable or dangerous to enter. We define the risk with respect to a policy as the probability of entering such a state when the policy is pursued. We consider the problem of finding good policies whose risk is smaller than some user-specified threshold, and formalize it as a constrained MDP with two criteria. The first criterion corresponds to the value function originally given. We will show that the risk can be formulated as a second criterion function based on a cumulative return, whose definition is independent of the original value function. We present a model-free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies. It is based on weighting the original value function and the risk. The weight parameter is adapted in order to find a feasible solution for the constrained problem that has a good performance with respect to the value function. The algorithm was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column. This control task was originally formulated as an optimal control problem with chance constraints, and it was solved under certain assumptions on the model to obtain an optimal solution. The power of our learning algorithm is that it can be used even when some of these restrictive assumptions are relaxed.

44 citations
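
The abstract above describes weighting the original value function against the risk of entering an error state. Below is a minimal tabular sketch of that weighting idea, assuming a Q-learning-style setting; the names (Q_val, Q_risk, xi, error_states) are hypothetical, and this is an illustration rather than the paper's exact algorithm:

```python
import numpy as np

def select_action(Q_val, Q_risk, s, xi):
    # Greedy action with respect to the weighted criterion: value - xi * risk.
    return int(np.argmax(Q_val[s] - xi * Q_risk[s]))

def risk_weighted_update(Q_val, Q_risk, s, a, r, s_next, error_states, xi,
                         alpha=0.1, gamma=0.99):
    """One temporal-difference update for both criteria: the original return
    and the probability of entering an error state (the risk)."""
    done = s_next in error_states
    a_next = select_action(Q_val, Q_risk, s_next, xi)
    # Value criterion: ordinary TD target on the given reward signal.
    target_val = r + (0.0 if done else gamma * Q_val[s_next, a_next])
    Q_val[s, a] += alpha * (target_val - Q_val[s, a])
    # Risk criterion: undiscounted probability of eventually reaching an
    # error state under the weighted-greedy policy.
    target_risk = 1.0 if done else Q_risk[s_next, a_next]
    Q_risk[s, a] += alpha * (target_risk - Q_risk[s, a])
```

The weight xi would then be adapted between episodes, for example increased while the estimated risk of the resulting policy exceeds the user-specified threshold, mirroring the constrained formulation described in the abstract.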

Journal ArticleDOI
TL;DR: In this paper, the authors provide rigorous proofs of optimality in all cases, by applying simple concepts from optimal control theory, including Bellman equations and verification theorems, for rapid purification of qubits, optimized with respect to various goals.
Abstract: Recently two papers [K. Jacobs, Phys. Rev. A 67, 030301(R) (2003); H. M. Wiseman and J. F. Ralph, New J. Physics 8, 90 (2006)] have derived a number of control strategies for rapid purification of qubits, optimized with respect to various goals. In the former paper the proof of optimality was not mathematically rigorous, while the latter gave only heuristic arguments for optimality. In this paper we provide rigorous proofs of optimality in all cases, by applying simple concepts from optimal control theory, including Bellman equations and verification theorems.

44 citations
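
For orientation, the Bellman equation and verification-theorem machinery referred to above can be stated schematically as follows; this is the generic continuous-time dynamic-programming form, not the specific qubit-purification equations derived in the paper:

```latex
% Generic HJB (Bellman) equation for a controlled stochastic process with
% generator \mathcal{L}^{u} and terminal reward g (illustrative only):
\partial_t V(t,x) + \sup_{u \in U} \mathcal{L}^{u} V(t,x) = 0,
\qquad V(T,x) = g(x).
% Verification theorem (informally): if a sufficiently smooth V solves this
% equation and u^{*}(t,x) attains the supremum, then V is the value function
% and u^{*} is an optimal feedback control.
```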

Journal ArticleDOI
TL;DR: In this article, unmanned aerial vehicles (UAVs) serve as carriers of wireless power chargers (WPCs) to charge energy-constrained devices (ECDs), and a novel multiple-stage dynamic matching algorithm is proposed to solve the resulting charging problem.
Abstract: In the emerging Internet-of-Things (IoT) paradigm, the lifetime of energy-constrained devices (ECDs) cannot be ensured due to the limited battery capacity. In this article, unmanned aerial vehicles (UAVs) serve as carriers of wireless power chargers (WPCs) to charge the ECDs. Aiming at maximizing the total amount of charging energy under the constraints of the UAVs and WPCs, a multiple-period charging process problem is formulated. To address this problem, bipartite matching with one-sided preferences is introduced to model the charging relationship between the ECDs and UAVs. Nevertheless, the traditional one-shot static matching is not suitable for this dynamic scenario, and thus the problem is further solved by a novel multiple-stage dynamic matching. Besides, the wireless charging process is history dependent, since the current matching result will influence the future initial charging status; consequently, the Markov decision process (MDP) and Bellman equation are leveraged. Then, by combining the MDP and the random serial dictatorship (RSD) matching algorithm, a four-step algorithm is proposed. In our proposed algorithm, the local MDPs for the ECDs are set up first. Next, using the RSD algorithm, all possible actions are enumerated according to the current state. Then, the joint MDP is built based on the local MDPs and all the possible matching results. Finally, the Bellman equation is utilized to select the optimal branch. Simulation results demonstrate the effectiveness of our proposed algorithm.

44 citations
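
The abstract above combines random serial dictatorship (RSD) matching with a Bellman backup over the resulting branches. A minimal sketch of those two ingredients is given below; the function and variable names (preferences, charge_gain, transition, value_fn) are hypothetical, and this is an illustration rather than the paper's four-step algorithm:

```python
import random

def rsd_matching(uavs, ecds, preferences, rng):
    """One random-serial-dictatorship draw: UAVs pick, in random order,
    their most preferred still-unmatched ECD."""
    order = list(uavs)
    rng.shuffle(order)
    free = set(ecds)
    matching = {}
    for uav in order:
        choices = [e for e in preferences[uav] if e in free]
        if choices:
            matching[uav] = choices[0]
            free.discard(choices[0])
    return matching

def bellman_select(state, uavs, ecds, preferences, charge_gain, transition,
                   value_fn, gamma=0.9, n_branches=10, seed=0):
    """Enumerate candidate matchings via RSD and keep the branch that
    maximizes immediate charging energy plus the discounted value of the
    next charging state (a one-step Bellman backup)."""
    rng = random.Random(seed)
    best_matching, best_value = None, float("-inf")
    for _ in range(n_branches):
        matching = rsd_matching(uavs, ecds, preferences, rng)
        reward = sum(charge_gain(state, u, e) for u, e in matching.items())
        q = reward + gamma * value_fn(transition(state, matching))
        if q > best_value:
            best_matching, best_value = matching, q
    return best_matching, best_value
```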

Journal ArticleDOI
TL;DR: In this paper, the authors characterize the highest return relative to the market that can be achieved using non-anticipative investment rules over a given time horizon, and under any admissible configuration of model parameters that might materialize.
Abstract: In an equity market model with "Knightian" uncertainty regarding the relative risk and covariance structure of its assets, we characterize in several ways the highest return relative to the market that can be achieved using nonanticipative investment rules over a given time horizon, and under any admissible configuration of model parameters that might materialize. One characterization is in terms of the smallest positive supersolution to a fully nonlinear parabolic partial differential equation of the Hamilton--Jacobi--Bellman type. Under appropriate conditions, this smallest supersolution is the value function of an associated stochastic control problem, namely, the maximal probability with which an auxiliary multidimensional diffusion process, controlled in a manner which affects both its drift and covariance structures, stays in the interior of the positive orthant through the end of the time-horizon. This value function is also characterized in terms of a stochastic game, and can be used to generate an investment rule that realizes such best possible outperformance of the market.

44 citations
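
The Hamilton-Jacobi-Bellman-type equation mentioned above has, schematically, the following fully nonlinear parabolic form for a controlled diffusion with drift b and volatility sigma; the paper's exact operator depends on its market model and is not reproduced here:

```latex
% Schematic fully nonlinear parabolic HJB-type equation (illustrative only):
\partial_t U(t,x) + \sup_{a \in A} \Big\{ b(x,a)^{\top} D U(t,x)
  + \tfrac{1}{2}\,\mathrm{tr}\!\big( \sigma(x,a)\,\sigma(x,a)^{\top} D^{2} U(t,x) \big) \Big\} = 0,
\qquad U(T,x) = g(x).
% The characterization discussed above concerns the smallest positive
% supersolution of such an equation rather than a classical solution.
```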


Network Information
Related Topics (5)
Optimal control: 68K papers, 1.2M citations, 87% related
Bounded function: 77.2K papers, 1.3M citations, 85% related
Markov chain: 51.9K papers, 1.3M citations, 85% related
Linear system: 59.5K papers, 1.4M citations, 84% related
Optimization problem: 96.4K papers, 2.1M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353