Topic

Bellman equation

About: Bellman equation is a research topic. Over its lifetime, 5,884 publications have been published within this topic, receiving 135,589 citations.


Papers
Journal Article
TL;DR: This paper presents an experimental test of the Principle of Optimality in dynamic decision problems. The Principle states that the decision-maker should always choose the optimal decision at each stage of the decision problem, conditional on behaving optimally thereafter.
Abstract: This paper reports on an experimental test of the Principle of Optimality in dynamic decision problems. This Principle, which states that the decision-maker should always choose the optimal decision at each stage of the decision problem, conditional on behaving optimally thereafter, underlies many theories of optimal dynamic decision making, but is normally difficult to test empirically without knowledge of the decision-maker's preference function. In the experiment reported here we use a new experimental procedure to get round this difficulty, which also enables us to shed some light on the decision process that the decision-maker is using if he or she is not using the Principle of Optimality - which appears to be the case in our experiments.
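To make the Principle concrete, here is a minimal sketch of backward induction on a toy finite-horizon problem. It is not taken from the paper: the function `backward_induction`, the three-state example, and the `reward` and `transition` functions are all hypothetical and chosen only for illustration. At each stage the chosen action is the best one given that all later stages are played optimally.

```python
from typing import Callable, Dict, List, Tuple


def backward_induction(
    horizon: int,
    states: List[int],
    actions: List[int],
    reward: Callable[[int, int, int], float],
    transition: Callable[[int, int, int], int],
) -> Tuple[List[Dict[int, float]], List[Dict[int, int]]]:
    """Solve a deterministic finite-horizon problem by backward induction.

    value[t][s] is the best total reward obtainable from stage t onward
    when starting in state s; policy[t][s] is an action attaining it.
    """
    value: List[Dict[int, float]] = [dict() for _ in range(horizon + 1)]
    policy: List[Dict[int, int]] = [dict() for _ in range(horizon)]
    for s in states:
        value[horizon][s] = 0.0  # nothing left to earn after the last stage
    for t in range(horizon - 1, -1, -1):
        for s in states:
            best_a, best_v = actions[0], float("-inf")
            for a in actions:
                # Immediate reward plus the optimal continuation value.
                v = reward(t, s, a) + value[t + 1][transition(t, s, a)]
                if v > best_v:
                    best_a, best_v = a, v
            value[t][s] = best_v
            policy[t][s] = best_a
    return value, policy


# Hypothetical toy problem: states 0..2, action 0 = stay, action 1 = move up.
# Staying in state s pays s; moving up pays nothing now but raises the state.
def reward(t: int, s: int, a: int) -> float:
    return float(s) if a == 0 else 0.0


def transition(t: int, s: int, a: int) -> int:
    return s if a == 0 else min(s + 1, 2)


value, policy = backward_induction(3, [0, 1, 2], [0, 1], reward, transition)
print(value[0][0], policy[0][0])
```

In this toy problem the optimal first move from state 0 is to give up the immediate payoff and move up, which only makes sense because the later stages are assumed to be played optimally.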

40 citations

Journal Article
TL;DR: In this paper, the value function of distributed parameter control problems is shown to be the unique viscosity solution of the corresponding Hamilton-Jacobi-Bellman equation, and the main assumption is the existence of an increasing sequence of compact invariant subsets of the state space.
Abstract: This paper is concerned with a certain class of distributed parameter control problems. The value function of these problems is shown to be the unique viscosity solution of the corresponding Hamilton-Jacobi-Bellman equation. The main assumption is the existence of an increasing sequence of compact invariant subsets of the state space. In particular, this assumption is satisfied by a class of controlled delay equations.
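For orientation, the equation has the following standard form in the simpler finite-dimensional, deterministic, finite-horizon setting. This is an illustrative assumption: the abstract does not state the paper's own formulation, which concerns distributed parameter systems, and the symbols f (dynamics), ℓ (running cost), g (terminal cost), and U (control set) are introduced here only for the example.

```latex
% Value function of a finite-horizon problem with dynamics \dot{x} = f(x, u)
V(t, x) = \inf_{u(\cdot)} \left\{ \int_t^T \ell\bigl(x(s), u(s)\bigr)\, ds + g\bigl(x(T)\bigr) \right\},
\qquad \dot{x}(s) = f\bigl(x(s), u(s)\bigr), \quad x(t) = x.

% Hamilton-Jacobi-Bellman equation with its terminal condition
-\partial_t V(t, x) = \inf_{u \in U} \Bigl\{ \ell(x, u) + \nabla_x V(t, x) \cdot f(x, u) \Bigr\},
\qquad V(T, x) = g(x).
```

The value function is often not differentiable, which is why results of this kind are stated in terms of viscosity solutions rather than classical ones.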

40 citations

Proceedings Article
01 May 2019
TL;DR: LVIS is introduced, which circumvents the issue of local minima through global mixed-integer optimization and the issue of non-uniqueness through learning the optimal value function rather than the optimal policy, and is applied to a fundamentally hard problem in feedback control: control through contact.
Abstract: Guided policy search is a popular approach for training controllers for high-dimensional systems, but it has a number of pitfalls. Non-convex trajectory optimization has local minima, and non-uniqueness in the optimal policy itself can mean that independently-optimized samples do not describe a coherent policy from which to train. We introduce LVIS, which circumvents the issue of local minima through global mixed-integer optimization and the issue of non-uniqueness through learning the optimal value function rather than the optimal policy. To avoid the expense of solving the mixed-integer programs to full global optimality, we instead solve them only partially, extracting intervals containing the true cost-to-go from early termination of the branch-and-bound algorithm. These interval samples are used to weakly supervise the training of a neural net which approximates the true cost-to-go. Online, we use that learned cost-to-go as the terminal cost of a one-step model-predictive controller, which we solve via a small mixed-integer optimization. We demonstrate LVIS on piecewise affine models of a cart-pole system with walls and a planar humanoid robot and show that it can be applied to a fundamentally hard problem in feedback control–control through contact.
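One way to picture training on interval samples is a loss that is zero whenever the predicted cost-to-go lies inside the interval and grows with the distance outside it. The sketch below is an assumption made for illustration, not the loss used by the LVIS authors, which the abstract does not specify; the names `interval_loss` and `mean_interval_loss` are hypothetical.

```python
from typing import Sequence, Tuple


def interval_loss(prediction: float, lower: float, upper: float) -> float:
    """Squared distance from prediction to [lower, upper]; zero inside it."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    below = max(lower - prediction, 0.0)  # shortfall under the lower bound
    above = max(prediction - upper, 0.0)  # excess over the upper bound
    return below * below + above * above


def mean_interval_loss(
    predictions: Sequence[float],
    intervals: Sequence[Tuple[float, float]],
) -> float:
    """Average interval loss over a batch of (lower, upper) samples."""
    if len(predictions) != len(intervals) or not predictions:
        raise ValueError("need equally sized, non-empty inputs")
    total = sum(
        interval_loss(p, lo, hi) for p, (lo, hi) in zip(predictions, intervals)
    )
    return total / len(predictions)


# Hypothetical batch: the first prediction is inside its interval,
# the other two fall outside theirs.
batch_loss = mean_interval_loss(
    [5.0, 3.0, 8.0], [(4.0, 6.0), (4.0, 6.0), (4.0, 6.0)]
)
print(batch_loss)
```

A loss of this shape never pushes a prediction that is already consistent with the bounds, which matches the idea of weak supervision: the interval constrains the target without pinning it to a single value.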

40 citations

Journal Article
TL;DR: The proposed QSI-aware simultaneous iterative water-filling algorithm (QSIWFA) is shown to achieve the Nash equilibrium (NE) and it is shown that the proposed distributed online learning algorithm converges almost surely.
Abstract: In this paper, we propose a two-timescale delay-optimal dynamic clustering and power allocation design for downlink network MIMO systems. The dynamic clustering control is adaptive to the global queue state information (GQSI) only and computed at the base station controller (BSC) over a longer time scale. On the other hand, the power allocations of all the BSs in each cluster are adaptive to both intracluster channel state information (CCSI) and intracluster queue state information (CQSI), and computed at each cluster manager (CM) over a shorter time scale. We show that the two-timescale delay-optimal control can be formulated as an infinite-horizon average cost constrained partially observed Markov decision process (CPOMDP). By exploiting the special problem structure, we derive an equivalent Bellman equation in terms of pattern selection Q-factor to solve the CPOMDP. To address the distributed requirement and computational complexity, we approximate the pattern selection Q-factor by the sum of per-cluster potential functions and propose a novel distributed online learning algorithm to estimate them distributedly. We show that the proposed distributed online learning algorithm converges almost surely. By exploiting the birth-death structure of the queue dynamics, we further decompose the per-cluster potential function into the sum of per-cluster per-user potential functions and formulate the instantaneous power allocation as a per-stage QSI-aware interference game played among all the CMs. The proposed QSI-aware simultaneous iterative water-filling algorithm (QSIWFA) is shown to achieve the Nash equilibrium (NE).
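For reference, the textbook Bellman equation for an unconstrained, fully observed, finite-state average-cost MDP is shown below in value and Q-factor form. It is a simplified stand-in, not the paper's equivalent equation in terms of the pattern selection Q-factor, which the abstract does not reproduce. The symbols are assumptions for this example: c(s, a) is the per-stage cost, P(s' | s, a) the transition kernel, θ the optimal average cost, and V and Q the relative value and Q-factor. A solution exists under suitable conditions, such as a unichain structure.

```latex
% Average-cost Bellman equation (value form)
\theta + V(s) = \min_{a} \Bigl[ c(s, a) + \sum_{s'} P(s' \mid s, a)\, V(s') \Bigr]

% Equivalent Q-factor form, with V(s) = \min_{a} Q(s, a)
\theta + Q(s, a) = c(s, a) + \sum_{s'} P(s' \mid s, a)\, \min_{a'} Q(s', a')
```

Approximating the Q-factor by a sum of per-cluster terms, as the abstract describes, replaces the single table Q(s, a) with smaller functions that can each be estimated locally.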

40 citations

Journal Article
TL;DR: In this paper, various mathematical tools are applied to the dynamic optimization of power-maximizing paths, with special attention paid to nonlinear systems; convergence of discrete algorithms to viscosity solutions of HJB equations, discrete approximations, and the role of the Lagrange multiplier λ associated with the duration constraint are considered.

40 citations


Network Information
Related Topics (5)
Optimal control: 68K papers, 1.2M citations, 87% related
Bounded function: 77.2K papers, 1.3M citations, 85% related
Markov chain: 51.9K papers, 1.3M citations, 85% related
Linear system: 59.5K papers, 1.4M citations, 84% related
Optimization problem: 96.4K papers, 2.1M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353