Topic
Bellman equation
About: The Bellman equation is a research topic. Over its lifetime, 5884 publications have been published on this topic, receiving 135589 citations.
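The Bellman optimality equation, V(s) = max_a [ r(s,a) + γ Σ_s' P(s'|s,a) V(s') ], underlies most of the papers below. A minimal value-iteration sketch on a hypothetical two-state MDP (all numbers invented for illustration):

```python
# Hypothetical 2-state, 2-action MDP, used only to illustrate solving the
# Bellman optimality equation by value iteration.
P = {  # P[s][a] = list of (next_state, probability)
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 0.8), (0, 0.2)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}  # R[s][a] = immediate reward
gamma = 0.9

def value_iteration(P, R, gamma, tol=1e-8):
    V = {s: 0.0 for s in P}
    while True:
        # One Bellman backup: V(s) <- max_a [ r(s,a) + gamma * E[V(s')] ]
        V_new = {
            s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                   for a in P[s])
            for s in P
        }
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new

V = value_iteration(P, R, gamma)
```

At the fixed point, each V(s) satisfies the Bellman equation exactly (up to the tolerance), e.g. here V(0) = 1 + 0.9 V(1).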
Papers published on a yearly basis
Papers
TL;DR: In this paper, an experimental test of the Principle of Optimality in dynamic decision problems is presented; the Principle states that the decision-maker should always choose the optimal decision at each stage of the decision problem, conditional on behaving optimally thereafter.
Abstract: This paper reports on an experimental test of the Principle of Optimality in dynamic decision problems. This Principle, which states that the decision-maker should always choose the optimal decision at each stage of the decision problem, conditional on behaving optimally thereafter, underlies many theories of optimal dynamic decision making, but is normally difficult to test empirically without knowledge of the decision-maker's preference function. In the experiment reported here we use a new experimental procedure to get round this difficulty, which also enables us to shed some light on the decision process that the decision-maker is using if he or she is not using the Principle of Optimality, which appears to be the case in our experiments.
40 citations
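The Principle of Optimality tested above is what makes backward induction valid: the value at stage t equals the best immediate payoff plus the value of behaving optimally from stage t+1 on. A toy sketch (the payoff table is invented and is not the paper's experimental design):

```python
# Illustrative backward induction on a hypothetical 3-stage problem where the
# payoff of each of two actions depends only on the stage. The recursion
# V_t = max_a [ payoff(t, a) + V_{t+1} ] is the Principle of Optimality.
payoff = {  # payoff[stage][action], hypothetical numbers
    0: {0: 1.0, 1: 0.5},
    1: {0: 0.0, 1: 2.0},
    2: {0: 1.5, 1: 1.0},
}

def solve_backward(payoff, horizon=3):
    V = 0.0            # value after the final stage
    plan = []
    for t in reversed(range(horizon)):
        best_a = max(payoff[t], key=lambda a: payoff[t][a] + V)
        V = payoff[t][best_a] + V
        plan.append(best_a)
    return V, list(reversed(plan))

V, plan = solve_backward(payoff)  # V = 1.0 + 2.0 + 1.5 = 4.5, plan = [0, 1, 0]
```

A decision-maker violating the Principle would, at some stage, pick an action inconsistent with this stage-by-stage recursion, which is what the paper's procedure is designed to detect.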
TL;DR: In this paper, the value function of distributed parameter control problems is shown to be the unique viscosity solution of the corresponding Hamilton-Jacobi-Bellman equation; the main assumption is the existence of an increasing sequence of compact invariant subsets of the state space.
Abstract: This paper is concerned with a certain class of distributed parameter control problems. The value function of these problems is shown to be the unique viscosity solution of the corresponding Hamilton-Jacobi-Bellman equation. The main assumption is the existence of an increasing sequence of compact invariant subsets of the state space. In particular, this assumption is satisfied by a class of controlled delay equations.
40 citations
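For reference, the Hamilton-Jacobi-Bellman equation referred to here is, in its standard finite-horizon form for dynamics $\dot{x} = f(x,u)$, running cost $\ell(x,u)$, and terminal cost $g(x)$ (this is the generic textbook form, not the paper's distributed-parameter setting):

```latex
-\frac{\partial V}{\partial t}(x,t)
  = \min_{u \in U}\Bigl\{ \ell(x,u) + \nabla_x V(x,t) \cdot f(x,u) \Bigr\},
\qquad V(x,T) = g(x).
```

Because $V$ is typically not differentiable everywhere, the equation is interpreted in the viscosity sense, which is the solution concept under which the paper proves uniqueness.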
01 May 2019
TL;DR: LVIS is introduced, which circumvents the issue of local minima through global mixed-integer optimization and the issue of non-uniqueness through learning the optimal value function rather than the optimal policy, and is applied to a fundamentally hard problem in feedback control: control through contact.
Abstract: Guided policy search is a popular approach for training controllers for high-dimensional systems, but it has a number of pitfalls. Non-convex trajectory optimization has local minima, and non-uniqueness in the optimal policy itself can mean that independently-optimized samples do not describe a coherent policy from which to train. We introduce LVIS, which circumvents the issue of local minima through global mixed-integer optimization and the issue of non-uniqueness through learning the optimal value function rather than the optimal policy. To avoid the expense of solving the mixed-integer programs to full global optimality, we instead solve them only partially, extracting intervals containing the true cost-to-go from early termination of the branch-and-bound algorithm. These interval samples are used to weakly supervise the training of a neural net which approximates the true cost-to-go. Online, we use that learned cost-to-go as the terminal cost of a one-step model-predictive controller, which we solve via a small mixed-integer optimization. We demonstrate LVIS on piecewise affine models of a cart-pole system with walls and a planar humanoid robot, and show that it can be applied to a fundamentally hard problem in feedback control: control through contact.
40 citations
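The online controller described above, a one-step lookahead that minimizes stage cost plus a learned cost-to-go, can be sketched as follows. All names and numbers here are hypothetical stand-ins, not from the LVIS code: the "learned" cost-to-go is a hand-written quadratic instead of a trained network, and the action set is discretized instead of solved as a mixed-integer program.

```python
# Hedged sketch: one-step MPC on a double integrator, using a quadratic
# stand-in for the learned cost-to-go network. LVIS itself solves a small
# mixed-integer program over piecewise affine dynamics; this only shows the
# "stage cost + terminal cost-to-go" structure of the online step.
def learned_cost_to_go(x, v):
    # Hand-written LQR-like quadratic, standing in for the trained net.
    return 10.0 * x * x + 4.0 * x * v + v * v

def one_step_mpc(x, v, dt=0.05, actions=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    best_u, best_cost = None, float("inf")
    for u in actions:
        x_next = x + dt * v          # simple double-integrator model
        v_next = v + dt * u
        cost = 0.1 * u * u + learned_cost_to_go(x_next, v_next)
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u

u = one_step_mpc(1.0, 0.0)  # from x = 1 at rest, decelerate toward the origin
```

The design point is that all long-horizon reasoning is baked into the cost-to-go offline, so the online problem stays small enough to solve exactly at each step.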
TL;DR: The proposed QSI-aware simultaneous iterative water-filling algorithm (QSIWFA) is shown to achieve the Nash equilibrium (NE), and the proposed distributed online learning algorithm is shown to converge almost surely.
Abstract: In this paper, we propose a two-timescale delay-optimal dynamic clustering and power allocation design for downlink network MIMO systems. The dynamic clustering control is adaptive to the global queue state information (GQSI) only and computed at the base station controller (BSC) over a longer time scale. On the other hand, the power allocations of all the BSs in each cluster are adaptive to both intracluster channel state information (CCSI) and intracluster queue state information (CQSI), and computed at each cluster manager (CM) over a shorter time scale. We show that the two-timescale delay-optimal control can be formulated as an infinite-horizon average cost constrained partially observed Markov decision process (CPOMDP). By exploiting the special problem structure, we derive an equivalent Bellman equation in terms of pattern selection Q-factor to solve the CPOMDP. To address the distributed requirement and computational complexity, we approximate the pattern selection Q-factor by the sum of per-cluster potential functions and propose a novel distributed online learning algorithm to estimate them distributedly. We show that the proposed distributed online learning algorithm converges almost surely. By exploiting the birth-death structure of the queue dynamics, we further decompose the per-cluster potential function into the sum of per-cluster per-user potential functions and formulate the instantaneous power allocation as a per-stage QSI-aware interference game played among all the CMs. The proposed QSI-aware simultaneous iterative water-filling algorithm (QSIWFA) is shown to achieve the Nash equilibrium (NE).
40 citations
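QSIWFA is a QSI-aware, game-theoretic variant of classical water-filling. For context, the basic water-filling primitive allocates power p_i = max(0, μ − 1/g_i) across channels with gains g_i, choosing the water level μ so the power budget is met. A minimal sketch (function name and numbers are illustrative, not from the paper):

```python
# Classical water-filling over channel gains, with the water level mu found by
# bisection. This is only the basic primitive underlying QSIWFA; the paper's
# algorithm additionally accounts for queue state information and interference.
def water_filling(gains, budget, iters=100):
    lo, hi = 0.0, budget + max(1.0 / g for g in gains)
    for _ in range(iters):
        mu = (lo + hi) / 2.0
        used = sum(max(0.0, mu - 1.0 / g) for g in gains)
        if used > budget:
            hi = mu
        else:
            lo = mu
    return [max(0.0, mu - 1.0 / g) for g in gains]

p = water_filling([2.0, 1.0, 0.5], budget=3.0)  # stronger channels get more power
```

In the simultaneous-iterative variant, each cluster manager runs such an update against the interference produced by the others, and the fixed point of those coupled updates is the Nash equilibrium the paper establishes.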
TL;DR: In this paper, various mathematical tools are applied to the dynamic optimization of power-maximizing paths, with special attention to nonlinear systems; convergence of discrete algorithms to viscosity solutions of HJB equations, discrete approximations, and the role of the Lagrange multiplier λ associated with the duration constraint are considered.
40 citations