
Bellman equation

About: Bellman equation is a research topic. Over the lifetime of the topic, 5884 publications have been published, receiving 135589 citations.


Papers
Journal ArticleDOI
TL;DR: The implicit learning capabilities of the RISE control structure are used to learn the dynamics asymptotically, and it is shown that the system converges to a state-space system with a quadratic performance index that has been optimized by an additional control element.

44 citations
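
For reference, the quadratic performance index mentioned in the TL;DR above can be written in standard optimal-control notation as follows; this is a generic sketch for a system \dot{x} = f(x) + g(x)u, not the paper's exact formulation:

```latex
% Generic infinite-horizon quadratic performance index (illustrative only):
J(x_0) = \int_{0}^{\infty} \left( x^{\top} Q x + u^{\top} R u \right) \mathrm{d}t,
\qquad Q \succeq 0, \; R \succ 0.
% Associated Hamilton-Jacobi-Bellman condition on the value function V:
0 = \min_{u} \left[ x^{\top} Q x + u^{\top} R u
    + \nabla V(x)^{\top} \big( f(x) + g(x)u \big) \right].
```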

Journal ArticleDOI
TL;DR: In this paper, the authors considered the problem of finding good deterministic policies whose risk is smaller than some user-specified threshold, and formalized it as a constrained MDP with two criteria.
Abstract: In this paper, we consider Markov Decision Processes (MDPs) with error states. Error states are states that are undesirable or dangerous to enter. We define the risk with respect to a policy as the probability of entering such a state when the policy is pursued. We consider the problem of finding good policies whose risk is smaller than some user-specified threshold, and formalize it as a constrained MDP with two criteria. The first criterion corresponds to the value function originally given. We will show that the risk can be formulated as a second criterion function based on a cumulative return, whose definition is independent of the original value function. We present a model-free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies. It is based on weighting the original value function and the risk. The weight parameter is adapted in order to find a feasible solution for the constrained problem that has a good performance with respect to the value function. The algorithm was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column. This control task was originally formulated as an optimal control problem with chance constraints, and it was solved under certain assumptions on the model to obtain an optimal solution. The power of our learning algorithm is that it can be used even when some of these restrictive assumptions are relaxed.

44 citations
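
The abstract above describes weighting the original value function against the risk of entering an error state. Below is a minimal tabular sketch of that weighting idea, assuming a Q-learning-style setting; the names (Q_val, Q_risk, xi, error_states) are hypothetical, and this is an illustration rather than the paper's exact algorithm:

```python
import numpy as np

def select_action(Q_val, Q_risk, s, xi):
    # Greedy action with respect to the weighted criterion: value - xi * risk.
    return int(np.argmax(Q_val[s] - xi * Q_risk[s]))

def risk_weighted_update(Q_val, Q_risk, s, a, r, s_next, error_states, xi,
                         alpha=0.1, gamma=0.99):
    """One temporal-difference update for both criteria: the original return
    and the probability of entering an error state (the risk)."""
    done = s_next in error_states
    a_next = select_action(Q_val, Q_risk, s_next, xi)
    # Value criterion: ordinary TD target on the given reward signal.
    target_val = r + (0.0 if done else gamma * Q_val[s_next, a_next])
    Q_val[s, a] += alpha * (target_val - Q_val[s, a])
    # Risk criterion: undiscounted probability of eventually reaching an
    # error state under the weighted-greedy policy.
    target_risk = 1.0 if done else Q_risk[s_next, a_next]
    Q_risk[s, a] += alpha * (target_risk - Q_risk[s, a])
```

The weight xi would then be adapted between episodes, for example increased while the estimated risk of the resulting policy exceeds the user-specified threshold, mirroring the constrained formulation described in the abstract.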

Journal ArticleDOI
TL;DR: In this paper, the authors provide rigorous proofs of optimality in all cases, by applying simple concepts from optimal control theory, including Bellman equations and verification theorems, for rapid purification of qubits, optimized with respect to various goals.
Abstract: Recently two papers [K. Jacobs, Phys. Rev. A 67, 030301(R) (2003); H. M. Wiseman and J. F. Ralph, New J. Physics 8, 90 (2006)] have derived a number of control strategies for rapid purification of qubits, optimized with respect to various goals. In the former paper the proof of optimality was not mathematically rigorous, while the latter gave only heuristic arguments for optimality. In this paper we provide rigorous proofs of optimality in all cases, by applying simple concepts from optimal control theory, including Bellman equations and verification theorems.

44 citations
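
For orientation, the Bellman equation and verification-theorem machinery referred to above can be stated schematically as follows; this is the generic continuous-time dynamic-programming form, not the specific qubit-purification equations derived in the paper:

```latex
% Generic HJB (Bellman) equation for a controlled stochastic process with
% generator \mathcal{L}^{u} and terminal reward g (illustrative only):
\partial_t V(t,x) + \sup_{u \in U} \mathcal{L}^{u} V(t,x) = 0,
\qquad V(T,x) = g(x).
% Verification theorem (informally): if a sufficiently smooth V solves this
% equation and u^{*}(t,x) attains the supremum, then V is the value function
% and u^{*} is an optimal feedback control.
```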

Journal ArticleDOI
TL;DR: In this article, unmanned aerial vehicles (UAVs) serve as carriers of wireless power chargers (WPCs) to charge energy-constrained devices (ECDs), and a novel multiple-stage dynamic matching algorithm is proposed to solve the resulting charging problem.
Abstract: In the emerging Internet-of-Things (IoT) paradigm, the lifetime of energy-constrained devices (ECDs) cannot be ensured due to the limited battery capacity. In this article, unmanned aerial vehicles (UAVs) serve as carriers of wireless power chargers (WPCs) to charge the ECDs. Aiming at maximizing the total amount of charging energy under the constraints of the UAVs and WPCs, a multiple-period charging process problem is formulated. To address this problem, bipartite matching with one-sided preferences is introduced to model the charging relationship between the ECDs and UAVs. Nevertheless, the traditional one-shot static matching is not suitable for this dynamic scenario, and thus the problem is further solved by a novel multiple-stage dynamic matching. Besides, the wireless charging process is history dependent, since the current matching result will influence the future initial charging status; consequently, the Markov decision process (MDP) and Bellman equation are leveraged. Then, by combining the MDP and the random serial dictatorship (RSD) matching algorithm, a four-step algorithm is proposed. In our proposed algorithm, the local MDPs for the ECDs are set up first. Next, using the RSD algorithm, all possible actions are enumerated according to the current state. Then, the joint MDP is built based on the local MDPs and all the possible matching results. Finally, the Bellman equation is utilized to select the optimal branch. Simulation results demonstrate the effectiveness of our proposed algorithm.

44 citations
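
The abstract above combines random serial dictatorship (RSD) matching with a Bellman backup over the resulting branches. A minimal sketch of those two ingredients is given below; the function and variable names (preferences, charge_gain, transition, value_fn) are hypothetical, and this is an illustration rather than the paper's four-step algorithm:

```python
import random

def rsd_matching(uavs, ecds, preferences, rng):
    """One random-serial-dictatorship draw: UAVs pick, in random order,
    their most preferred still-unmatched ECD."""
    order = list(uavs)
    rng.shuffle(order)
    free = set(ecds)
    matching = {}
    for uav in order:
        choices = [e for e in preferences[uav] if e in free]
        if choices:
            matching[uav] = choices[0]
            free.discard(choices[0])
    return matching

def bellman_select(state, uavs, ecds, preferences, charge_gain, transition,
                   value_fn, gamma=0.9, n_branches=10, seed=0):
    """Enumerate candidate matchings via RSD and keep the branch that
    maximizes immediate charging energy plus the discounted value of the
    next charging state (a one-step Bellman backup)."""
    rng = random.Random(seed)
    best_matching, best_value = None, float("-inf")
    for _ in range(n_branches):
        matching = rsd_matching(uavs, ecds, preferences, rng)
        reward = sum(charge_gain(state, u, e) for u, e in matching.items())
        q = reward + gamma * value_fn(transition(state, matching))
        if q > best_value:
            best_matching, best_value = matching, q
    return best_matching, best_value
```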

Journal ArticleDOI
TL;DR: In this paper, the authors characterize the highest return relative to the market that can be achieved using non-anticipative investment rules over a given time horizon, and under any admissible configuration of model parameters that might materialize.
Abstract: In an equity market model with "Knightian" uncertainty regarding the relative risk and covariance structure of its assets, we characterize in several ways the highest return relative to the market that can be achieved using nonanticipative investment rules over a given time horizon, and under any admissible configuration of model parameters that might materialize. One characterization is in terms of the smallest positive supersolution to a fully nonlinear parabolic partial differential equation of the Hamilton--Jacobi--Bellman type. Under appropriate conditions, this smallest supersolution is the value function of an associated stochastic control problem, namely, the maximal probability with which an auxiliary multidimensional diffusion process, controlled in a manner which affects both its drift and covariance structures, stays in the interior of the positive orthant through the end of the time-horizon. This value function is also characterized in terms of a stochastic game, and can be used to generate an investment rule that realizes such best possible outperformance of the market.

44 citations
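
The Hamilton-Jacobi-Bellman-type equation mentioned above has, schematically, the following fully nonlinear parabolic form for a controlled diffusion with drift b and volatility sigma; the paper's exact operator depends on its market model and is not reproduced here:

```latex
% Schematic fully nonlinear parabolic HJB-type equation (illustrative only):
\partial_t U(t,x) + \sup_{a \in A} \Big\{ b(x,a)^{\top} D U(t,x)
  + \tfrac{1}{2}\,\mathrm{tr}\!\big( \sigma(x,a)\,\sigma(x,a)^{\top} D^{2} U(t,x) \big) \Big\} = 0,
\qquad U(T,x) = g(x).
% The characterization discussed above concerns the smallest positive
% supersolution of such an equation rather than a classical solution.
```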


Network Information
Related Topics (5)
Optimal control: 68K papers, 1.2M citations, 87% related
Bounded function: 77.2K papers, 1.3M citations, 85% related
Markov chain: 51.9K papers, 1.3M citations, 85% related
Linear system: 59.5K papers, 1.4M citations, 84% related
Optimization problem: 96.4K papers, 2.1M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353