Topic

Bellman equation

About: Bellman equation is a research topic. Over the lifetime, 5884 publications have been published within this topic receiving 135589 citations.

...read moreread less

Papers published on a yearly basis

1 / 2

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Multitime scale Markov decision processes

[...]

Hyeong Soo Chang¹, P.J. Fard², S.I. Marcus², Mark A. Shayman²•Institutions (2)

Sogang University¹, University of Maryland, College Park²

20 Jun 2003-IEEE Transactions on Automatic Control

TL;DR: This paper proposes a simple analytical model called M time scale Markov decision process (MMDPs) for hierarchically structured sequential decision making processes, where decisions in each level in the M-level hierarchy are made in M different discrete time scales.

...read moreread less

Abstract: This paper proposes a simple analytical model called M time scale Markov decision process (MMDPs) for hierarchically structured sequential decision making processes, where decisions in each level in the M-level hierarchy are made in M different discrete time scales. In this model, the state-space and the control-space of each level in the hierarchy are nonoverlapping with those of the other levels, respectively, and the hierarchy is structured in a "pyramid" sense such that a decision made at level m (slower time scale) state and/or the state will affect the evolutionary decision making process of the lower level m+1 (faster time scale) until a new decision is made at the higher level but the lower level decisions themselves do not affect the transition dynamics of higher levels. The performance produced by the lower level decisions will affect the higher level decisions. A hierarchical objective function is defined such that the finite-horizon value of following a (nonstationary) policy at level m+1 over a decision epoch of level m plus an immediate reward at level m is the single-step reward for the decision making process at level m. From this we define "multi-level optimal value function" and derive "multi-level optimality equation." We discuss how to solve MMDPs exactly and study some approximation methods, along with heuristic sampling-based schemes, to solve MMDPs.

...read moreread less

76 citations

Book Chapter•DOI•

Bellman function in stochastic control and harmonic analysis

[...]

Fedor Nazarov¹, Sergei Treil², Alexander Volberg¹•Institutions (2)

Michigan State University¹, Brown University²

01 Jan 2001

TL;DR: This work shows how the homonym function in harmonic analysis is (and how it is not) the same stochastic optimal control Bellman function, and presents several creatures from Bellman’s Zoo.

...read moreread less

Abstract: The stochastic optimal control uses the differential equation of Bell-man and its solution—the Bellman function. We show how the homonym function in harmonic analysis is (and how it is not) the same stochastic optimal control Bellman function. Then we present several creatures from Bellman’s Zoo: a function that proves the inverse Holder inequality, as well as several other harmonic analysis Bellman functions and their corresponding Bellman PDE’s. Finally we translate the approach of Burkholder to the language of “our” Bellman function.

...read moreread less

75 citations

Journal Article•DOI•

A Multiperiod Newsvendor Problem with Partially Observed Demand

[...]

Alain Bensoussan¹, Metin Çakanyildirim¹, Suresh Sethi¹•Institutions (1)

University of Texas at Dallas¹

01 May 2007-Mathematics of Operations Research

TL;DR: In this article, the authors considered a news vendor problem with partially observed Markovian demand and showed that the value function is piecewise linear and provided a near-optimal solution.

...read moreread less

Abstract: We consider a newsvendor problem with partially observed Markovian demand. Demand is observed if it is less than the inventory. Otherwise, only the event that it is larger than or equal to the inventory is observed. These observations are used to update the demand distribution from one period to the next. The state of the resulting dynamic programming equation is the current demand distribution, which is generally infinite dimensional. We use unnormalized probabilities to convert the nonlinear state transition equation to a linear one. This helps in proving the existence of an optimal feedback ordering policy. So as to learn more about the demand, the optimal order is set to exceed the myopic optimal order. The optimal cost decreases as the demand distribution decreases in the hazard rate order. In a special case with finitely many demand values, we characterize a near-optimal solution by establishing that the value function is piecewise linear.

...read moreread less

75 citations

Proceedings Article•

Batch Value Function Approximation via Support Vectors

[...]

Thomas G. Dietterich¹, Xin Wang¹•Institutions (1)

Oregon State University¹

03 Jan 2001

TL;DR: Three ways of combining linear programming with the kernel trick to find value function approximations for reinforcement learning are presented, one based on SVM regression; the second is based on the Bellman equation; and the third seeks only to ensure that good moves have an advantage over bad moves.

...read moreread less

Abstract: We present three ways of combining linear programming with the kernel trick to find value function approximations for reinforcement learning. One formulation is based on SVM regression; the second is based on the Bellman equation; and the third seeks only to ensure that good moves have an advantage over bad moves. All formulations attempt to minimize the number of support vectors while fitting the data. Experiments in a difficult, synthetic maze problem show that all three formulations give excellent performance, but the advantage formulation is much easier to train. Unlike policy gradient methods, the kernel methods described here can easily adjust the complexity of the function approximator to fit the complexity of the value function.

...read moreread less

75 citations

Proceedings Article•

Symbolic dynamic programming for first-order POMDPs

[...]

Scott Sanner¹, Kristian Kersting•Institutions (1)

NICTA¹

11 Jul 2010

TL;DR: This work shows that it is also possible to exploit the full expressive power of first-order quantification to achieve state, action, and observation abstraction in a dynamic programming solution to relationally specified POMDPs.

...read moreread less

Abstract: Partially-observable Markov decision processes (POMDPs) provide a powerful model for sequential decision-making problems with partially-observed state and are known to have (approximately) optimal dynamic programming solutions. Much work in recent years has focused on improving the efficiency of these dynamic programming algorithms by exploiting symmetries and factored or relational representations. In this work, we show that it is also possible to exploit the full expressive power of first-order quantification to achieve state, action, and observation abstraction in a dynamic programming solution to relationally specified POMDPs. Among the advantages of this approach are the ability to maintain compact value function representations, abstract over the space of potentially optimal actions, and automatically derive compact conditional policy trees that minimally partition relational observation spaces according to distinctions that have an impact on policy values. This is the first lifted relational POMDP solution that can optimally accommodate actions with a potentially infinite relational space of observation outcomes.

...read moreread less

75 citations

Collapse

Network Information

Performance

Metrics

6,698

Papers

155,793

Citations

No. of papers in the topic in previous years
Year	Papers
2023	261
2022	537
2021	369
2020	411
2019	348
2018	353

Bellman equation

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics