Topic
Bellman equation
About: The Bellman equation is a research topic. Over its lifetime, 5884 publications have been published on this topic, receiving 135589 citations.
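The Bellman optimality equation, V(s) = max_a [ r(s,a) + γ Σ_s' P(s'|s,a) V(s') ], underlies most of the papers below. A minimal value-iteration sketch on a hypothetical two-state MDP (all numbers invented for illustration):

```python
# Hypothetical 2-state, 2-action MDP, used only to illustrate solving the
# Bellman optimality equation by value iteration.
P = {  # P[s][a] = list of (next_state, probability)
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 0.8), (0, 0.2)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}  # R[s][a] = immediate reward
gamma = 0.9

def value_iteration(P, R, gamma, tol=1e-8):
    V = {s: 0.0 for s in P}
    while True:
        # One Bellman backup: V(s) <- max_a [ r(s,a) + gamma * E[V(s')] ]
        V_new = {
            s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                   for a in P[s])
            for s in P
        }
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new

V = value_iteration(P, R, gamma)
```

At the fixed point, each V(s) satisfies the Bellman equation exactly (up to the tolerance), e.g. here V(0) = 1 + 0.9 V(1).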
Papers published on a yearly basis
Papers
TL;DR: In this paper, an experimental test of the Principle of Optimality in dynamic decision problems is presented; the Principle states that the decision-maker should always choose the optimal decision at each stage of the decision problem, conditional on behaving optimally thereafter.
Abstract: This paper reports on an experimental test of the Principle of Optimality in dynamic decision problems. This Principle, which states that the decision-maker should always choose the optimal decision at each stage of the decision problem, conditional on behaving optimally thereafter, underlies many theories of optimal dynamic decision making, but is normally difficult to test empirically without knowledge of the decision-maker's preference function. In the experiment reported here we use a new experimental procedure to get round this difficulty, which also enables us to shed some light on the decision process that the decision-maker is using if he or she is not using the Principle of Optimality, which appears to be the case in our experiments.
40 citations
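The Principle of Optimality tested above is what makes backward induction valid: the value at stage t equals the best immediate payoff plus the value of behaving optimally from stage t+1 on. A toy sketch (the payoff table is invented and is not the paper's experimental design):

```python
# Illustrative backward induction on a hypothetical 3-stage problem where the
# payoff of each of two actions depends only on the stage. The recursion
# V_t = max_a [ payoff(t, a) + V_{t+1} ] is the Principle of Optimality.
payoff = {  # payoff[stage][action], hypothetical numbers
    0: {0: 1.0, 1: 0.5},
    1: {0: 0.0, 1: 2.0},
    2: {0: 1.5, 1: 1.0},
}

def solve_backward(payoff, horizon=3):
    V = 0.0            # value after the final stage
    plan = []
    for t in reversed(range(horizon)):
        best_a = max(payoff[t], key=lambda a: payoff[t][a] + V)
        V = payoff[t][best_a] + V
        plan.append(best_a)
    return V, list(reversed(plan))

V, plan = solve_backward(payoff)  # V = 1.0 + 2.0 + 1.5 = 4.5, plan = [0, 1, 0]
```

A decision-maker violating the Principle would, at some stage, pick an action inconsistent with this stage-by-stage recursion, which is what the paper's procedure is designed to detect.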
TL;DR: In this paper, the value function of distributed parameter control problems is shown to be the unique viscosity solution of the corresponding Hamilton-Jacobi-Bellman equation; the main assumption is the existence of an increasing sequence of compact invariant subsets of the state space.
Abstract: This paper is concerned with a certain class of distributed parameter control problems. The value function of these problems is shown to be the unique viscosity solution of the corresponding Hamilton-Jacobi-Bellman equation. The main assumption is the existence of an increasing sequence of compact invariant subsets of the state space. In particular, this assumption is satisfied by a class of controlled delay equations.
40 citations
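For reference, the Hamilton-Jacobi-Bellman equation referred to here is, in its standard finite-horizon form for dynamics $\dot{x} = f(x,u)$, running cost $\ell(x,u)$, and terminal cost $g(x)$ (this is the generic textbook form, not the paper's distributed-parameter setting):

```latex
-\frac{\partial V}{\partial t}(x,t)
  = \min_{u \in U}\Bigl\{ \ell(x,u) + \nabla_x V(x,t) \cdot f(x,u) \Bigr\},
\qquad V(x,T) = g(x).
```

Because $V$ is typically not differentiable everywhere, the equation is interpreted in the viscosity sense, which is the solution concept under which the paper proves uniqueness.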
01 May 2019
TL;DR: LVIS is introduced, which circumvents the issue of local minima through global mixed-integer optimization and the issue of non-uniqueness through learning the optimal value function rather than the optimal policy, and is applied to a fundamentally hard problem in feedback control: control through contact.
Abstract: Guided policy search is a popular approach for training controllers for high-dimensional systems, but it has a number of pitfalls. Non-convex trajectory optimization has local minima, and non-uniqueness in the optimal policy itself can mean that independently-optimized samples do not describe a coherent policy from which to train. We introduce LVIS, which circumvents the issue of local minima through global mixed-integer optimization and the issue of non-uniqueness through learning the optimal value function rather than the optimal policy. To avoid the expense of solving the mixed-integer programs to full global optimality, we instead solve them only partially, extracting intervals containing the true cost-to-go from early termination of the branch-and-bound algorithm. These interval samples are used to weakly supervise the training of a neural net which approximates the true cost-to-go. Online, we use that learned cost-to-go as the terminal cost of a one-step model-predictive controller, which we solve via a small mixed-integer optimization. We demonstrate LVIS on piecewise affine models of a cart-pole system with walls and a planar humanoid robot, and show that it can be applied to a fundamentally hard problem in feedback control: control through contact.
40 citations
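The online controller described above, a one-step lookahead that minimizes stage cost plus a learned cost-to-go, can be sketched as follows. All names and numbers here are hypothetical stand-ins, not from the LVIS code: the "learned" cost-to-go is a hand-written quadratic instead of a trained network, and the action set is discretized instead of solved as a mixed-integer program.

```python
# Hedged sketch: one-step MPC on a double integrator, using a quadratic
# stand-in for the learned cost-to-go network. LVIS itself solves a small
# mixed-integer program over piecewise affine dynamics; this only shows the
# "stage cost + terminal cost-to-go" structure of the online step.
def learned_cost_to_go(x, v):
    # Hand-written LQR-like quadratic, standing in for the trained net.
    return 10.0 * x * x + 4.0 * x * v + v * v

def one_step_mpc(x, v, dt=0.05, actions=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    best_u, best_cost = None, float("inf")
    for u in actions:
        x_next = x + dt * v          # simple double-integrator model
        v_next = v + dt * u
        cost = 0.1 * u * u + learned_cost_to_go(x_next, v_next)
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u

u = one_step_mpc(1.0, 0.0)  # from x = 1 at rest, decelerate toward the origin
```

The design point is that all long-horizon reasoning is baked into the cost-to-go offline, so the online problem stays small enough to solve exactly at each step.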
TL;DR: The proposed QSI-aware simultaneous iterative water-filling algorithm (QSIWFA) is shown to achieve the Nash equilibrium (NE), and the proposed distributed online learning algorithm is shown to converge almost surely.
Abstract: In this paper, we propose a two-timescale delay-optimal dynamic clustering and power allocation design for downlink network MIMO systems. The dynamic clustering control is adaptive to the global queue state information (GQSI) only and computed at the base station controller (BSC) over a longer time scale. On the other hand, the power allocations of all the BSs in each cluster are adaptive to both intracluster channel state information (CCSI) and intracluster queue state information (CQSI), and computed at each cluster manager (CM) over a shorter time scale. We show that the two-timescale delay-optimal control can be formulated as an infinite-horizon average cost constrained partially observed Markov decision process (CPOMDP). By exploiting the special problem structure, we derive an equivalent Bellman equation in terms of pattern selection Q-factor to solve the CPOMDP. To address the distributed requirement and computational complexity, we approximate the pattern selection Q-factor by the sum of per-cluster potential functions and propose a novel distributed online learning algorithm to estimate them distributedly. We show that the proposed distributed online learning algorithm converges almost surely. By exploiting the birth-death structure of the queue dynamics, we further decompose the per-cluster potential function into the sum of per-cluster per-user potential functions and formulate the instantaneous power allocation as a per-stage QSI-aware interference game played among all the CMs. The proposed QSI-aware simultaneous iterative water-filling algorithm (QSIWFA) is shown to achieve the Nash equilibrium (NE).
40 citations
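QSIWFA is a QSI-aware, game-theoretic variant of classical water-filling. For context, the basic water-filling primitive allocates power p_i = max(0, μ − 1/g_i) across channels with gains g_i, choosing the water level μ so the power budget is met. A minimal sketch (function name and numbers are illustrative, not from the paper):

```python
# Classical water-filling over channel gains, with the water level mu found by
# bisection. This is only the basic primitive underlying QSIWFA; the paper's
# algorithm additionally accounts for queue state information and interference.
def water_filling(gains, budget, iters=100):
    lo, hi = 0.0, budget + max(1.0 / g for g in gains)
    for _ in range(iters):
        mu = (lo + hi) / 2.0
        used = sum(max(0.0, mu - 1.0 / g) for g in gains)
        if used > budget:
            hi = mu
        else:
            lo = mu
    return [max(0.0, mu - 1.0 / g) for g in gains]

p = water_filling([2.0, 1.0, 0.5], budget=3.0)  # stronger channels get more power
```

In the simultaneous-iterative variant, each cluster manager runs such an update against the interference produced by the others, and the fixed point of those coupled updates is the Nash equilibrium the paper establishes.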
TL;DR: In this paper, various mathematical tools are applied to the dynamic optimization of power-maximizing paths, with special attention to nonlinear systems; convergence of discrete algorithms to viscosity solutions of HJB equations, discrete approximations, and the role of the Lagrange multiplier λ associated with the duration constraint are considered.
40 citations