Topic

Bellman equation

About: The Bellman equation is a research topic. Over its lifetime, 5,884 publications have been published within this topic, receiving 135,589 citations.


Papers
Proceedings Article
03 Jul 2018
TL;DR: The authors reformulate the Bellman optimality equation as a primal-dual optimization problem using Nesterov's smoothing technique and the Legendre-Fenchel transformation, and develop a new algorithm, called Smoothed Bellman Error Embedding, to solve it.
Abstract: When function approximation is used, solving the Bellman optimality equation with stability guarantees has remained a major open problem in reinforcement learning for decades. The fundamental difficulty is that the Bellman operator may become an expansion in general, resulting in oscillating and even divergent behavior of popular algorithms like Q-learning. In this paper, we revisit the Bellman equation, and reformulate it into a novel primal-dual optimization problem using Nesterov’s smoothing technique and the Legendre-Fenchel transformation. We then develop a new algorithm, called Smoothed Bellman Error Embedding, to solve this optimization problem where any differentiable function class may be used. We provide what we believe to be the first convergence guarantee for general nonlinear function approximation, and analyze the algorithm’s sample complexity. Empirically, our algorithm compares favorably to state-of-the-art baselines in several benchmark control problems.

224 citations
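
For context on the abstract above, here is a minimal sketch of the Bellman optimality backup in the tabular setting, where no function approximation is involved and the operator is a contraction in the sup norm. It is not the Smoothed Bellman Error Embedding algorithm from the paper. The two-state MDP, its rewards, the discount factor and all function names are hypothetical values chosen for illustration only.

```python
# Hypothetical 2-state, 2-action deterministic MDP (illustrative assumption).
GAMMA = 0.9  # assumed discount factor

# P[s][a] is a list of (next_state, probability); R[s][a] is the immediate reward.
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}


def bellman_backup(v):
    """Apply the Bellman optimality operator once to a state-value table."""
    return {
        s: max(
            R[s][a] + GAMMA * sum(p * v[s2] for s2, p in P[s][a])
            for a in P[s]
        )
        for s in P
    }


def sup_dist(u, w):
    """Sup-norm distance between two state-value tables."""
    return max(abs(u[s] - w[s]) for s in u)


def value_iteration(tol=1e-10, max_iters=10_000):
    """Iterate the backup until successive tables differ by less than tol."""
    v = {s: 0.0 for s in P}
    for _ in range(max_iters):
        v_new = bellman_backup(v)
        if sup_dist(v_new, v) < tol:
            return v_new
        v = v_new
    return v


v_star = value_iteration()
```

In this toy problem the fixed point is v(1) = 2 / (1 - 0.9) = 20 and v(0) = 1 + 0.9 * 20 = 19, and `sup_dist(bellman_backup(u), bellman_backup(w))` never exceeds `GAMMA * sup_dist(u, w)` for any two tables `u` and `w`. The abstract's point is that this guarantee can fail once the table is replaced by a function approximator.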

Journal Article
TL;DR: This work develops a column generation algorithm to solve the problem for a multinomial logit choice model with disjoint consideration sets (MNLD), and derives a bound as a by-product of a decomposition heuristic.
Abstract: We consider a network revenue management problem where customers choose among open fare products according to some prespecified choice model. Starting with a Markov decision process (MDP) formulation, we approximate the value function with an affine function of the state vector. We show that the resulting problem provides a tighter bound for the MDP value than the choice-based linear program. We develop a column generation algorithm to solve the problem for a multinomial logit choice model with disjoint consideration sets (MNLD). We also derive a bound as a by-product of a decomposition heuristic. Our numerical study shows the policies from our solution approach can significantly outperform heuristics from the choice-based linear program.

223 citations
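
As a sketch of what "an affine function of the state vector" means in the abstract above: if x = (x_1, ..., x_m) denotes the state, an affine approximation of the value function has the generic form below. The symbols \theta_t and v_{t,i}, the time index t, and the dimension m are notation assumed here for illustration, not taken from the paper.

```latex
V_t(x) \;\approx\; \theta_t + \sum_{i=1}^{m} v_{t,i}\, x_i
```

Under this form, the search over value functions reduces to a search over the coefficients \theta_t and v_{t,i}.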

Journal Article
TL;DR: This article introduces Gaussian process dynamic programming (GPDP), an approximate value function-based RL algorithm, and proposes to learn probabilistic models of the a priori unknown transition dynamics and the value functions on the fly.

222 citations

Journal Article
TL;DR: In this article, the authors provide conditions on the primitives of a continuous-time economy under which there exist equilibria obeying the Consumption-Based Capital Asset Pricing Model (CCAPM).
Abstract: The paper provides conditions on the primitives of a continuous-time economy under which there exist equilibria obeying the Consumption-Based Capital Asset Pricing Model (CCAPM). The paper also extends the equilibrium characterization of interest rates of Cox, Ingersoll, and Ross (1985) to multi-agent economies. We do not use a Markovian state assumption. This work provides sufficient conditions on agents' primitives for the validity of the Consumption-Based Capital Asset Pricing Model (CCAPM) of Breeden (1979). As a necessary condition, Breeden showed that in a continuous-time equilibrium satisfying certain regularity conditions, one can characterize returns on securities as follows. The expected "instantaneous" rate of return on any security in excess of the riskless interest rate (the security's expected excess rate of return) is a multiple common to all securities of the "instantaneous covariance" of this excess return with aggregate consumption increments. This common multiple is the Arrow-Pratt measure of risk aversion of a representative agent. (Rubinstein (1976) published a discrete-time precursor of this result.) The existence of equilibria satisfying Breeden's regularity conditions had been an open issue. We also show that the validity of the CCAPM does not depend on Breeden's assumption of Markov state information, and present a general asset pricing model extending the results of Cox, Ingersoll, and Ross (1985) as well as the discrete-time results of Rubinstein (1976) and Lucas (1978) to a multi-agent environment. Since the CCAPM was first proposed, much effort has been directed at finding sufficient conditions on the model primitives: the given assets, the agents' preferences, the agents' consumption endowments, and (in a production economy) the feasible production sets.
Conditions sufficient for the existence of continuous-time equilibria were shown in Duffie (1986), but the equilibria demonstrated were not shown to satisfy the additional regularity required for the CCAPM. In particular, Breeden assumed that all agents choose pointwise interior consumption rates, in order to characterize asset prices via the first order conditions of the Bellman equation. Interiority was also assumed by Huang (1987) in demonstrating a representative agent characterization of equilibrium, an approach exploited here. The use of dynamic programming and the Bellman equation, aside from the difficulty it imposes in verifying the existence of interior ...
Footnote 1: Financial support from the National Science Foundation is gratefully acknowledged. We thank ...

215 citations


Network Information
Related Topics (5)
Optimal control: 68K papers, 1.2M citations, 87% related
Bounded function: 77.2K papers, 1.3M citations, 85% related
Markov chain: 51.9K papers, 1.3M citations, 85% related
Linear system: 59.5K papers, 1.4M citations, 84% related
Optimization problem: 96.4K papers, 2.1M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353