Bellman equation

About: The Bellman equation is a research topic. Over its lifetime, 5,884 publications have been published within this topic, receiving 135,589 citations.


Papers published on a yearly basis (chart; years shown: 1969 to 2023)

Papers

Optimal control of diffusion processes and Hamilton-Jacobi-Bellman equations, Part I


P. L. Lions

01 Jan 1983

227 citations

Proceedings Article

SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation


Bo Dai¹, Albert Shaw¹, Lihong Li, Lin Xiao², Niao He³,⁴, Zhen Liu¹, Jianshu Chen, Le Song¹

Georgia Institute of Technology¹, Microsoft², National Center for Supercomputing Applications³, University of Illinois at Urbana–Champaign⁴

03 Jul 2018

TL;DR: The authors reformulate the Bellman optimality equation into a primal-dual optimization problem using Nesterov smoothing technique and the Legendre-Fenchel transformation, and develop a new algorithm, called Smoothed Bellman Error Embedding, to solve this optimization problem.


Abstract: When function approximation is used, solving the Bellman optimality equation with stability guarantees has remained a major open problem in reinforcement learning for decades. The fundamental difficulty is that the Bellman operator may become an expansion in general, resulting in oscillating and even divergent behavior of popular algorithms like Q-learning. In this paper, we revisit the Bellman equation, and reformulate it into a novel primal-dual optimization problem using Nesterov’s smoothing technique and the Legendre-Fenchel transformation. We then develop a new algorithm, called Smoothed Bellman Error Embedding, to solve this optimization problem where any differentiable function class may be used. We provide what we believe to be the first convergence guarantee for general nonlinear function approximation, and analyze the algorithm’s sample complexity. Empirically, our algorithm compares favorably to state-of-the-art baselines in several benchmark control problems.


224 citations
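
The smoothing idea in the abstract above can be illustrated on a small tabular problem. The sketch below is not the SBEED algorithm: it only shows, under stated assumptions, how replacing the hard max in the Bellman optimality backup with an entropy-smoothed log-sum-exp gives a differentiable operator whose fixed point stays close to the unsmoothed one. The two-state MDP, the function names, and the temperature `lam` are all hypothetical choices made for illustration.

```python
import numpy as np

def smoothed_bellman_backup(V, P, R, gamma, lam):
    """Apply an entropy-smoothed Bellman backup once.

    P has shape (A, S, S) with P[a, s, t] = Pr(t | s, a); R has shape (S, A).
    The hard max over actions is replaced by lam * log(sum(exp(Q / lam))).
    """
    Q = R + gamma * np.einsum("ast,t->sa", P, V)  # action values, shape (S, A)
    m = Q.max(axis=1)                              # subtract the max for numerical stability
    return m + lam * np.log(np.exp((Q - m[:, None]) / lam).sum(axis=1))

def smoothed_value_iteration(P, R, gamma=0.9, lam=0.1, tol=1e-10, max_iter=10_000):
    """Iterate the smoothed backup until successive iterates differ by less than tol."""
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        V_new = smoothed_bellman_backup(V, P, R, gamma, lam)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Hypothetical toy MDP: action 0 stays in place, action 1 switches state;
# being in state 1 pays reward 1 regardless of the action taken.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0],
              [1.0, 1.0]])

V_smooth = smoothed_value_iteration(P, R, gamma=0.9, lam=0.1)
V_near_hard = smoothed_value_iteration(P, R, gamma=0.9, lam=1e-3)
print(V_smooth, V_near_hard)
```

In this tabular setting the smoothed operator remains a contraction, and its fixed point exceeds the unsmoothed optimal values by at most `lam * log(A) / (1 - gamma)`, where `A` is the number of actions, so shrinking `lam` recovers the usual optimal values.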

Journal Article

An Approximate Dynamic Programming Approach to Network Revenue Management with Customer Choice


Dan Zhang¹, Daniel Adelman²

Desautels Faculty of Management¹, University of Chicago²

01 Aug 2009-Transportation Science

TL;DR: This work develops a column generation algorithm to solve the problem for a multinomial logit choice model with disjoint consideration sets (MNLD), and derives a bound as a by-product of a decomposition heuristic.


Abstract: We consider a network revenue management problem where customers choose among open fare products according to some prespecified choice model. Starting with a Markov decision process (MDP) formulation, we approximate the value function with an affine function of the state vector. We show that the resulting problem provides a tighter bound for the MDP value than the choice-based linear program. We develop a column generation algorithm to solve the problem for a multinomial logit choice model with disjoint consideration sets (MNLD). We also derive a bound as a by-product of a decomposition heuristic. Our numerical study shows the policies from our solution approach can significantly outperform heuristics from the choice-based linear program.


223 citations
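
As a reading aid for the abstract above, an affine approximation of a value function can be written generically as follows. The notation is an assumption made here for illustration and is not taken from the paper: $x$ denotes the state vector with components $x_i$, $t$ a time index, and $\theta_t$, $v_{t,i}$ the intercept and slope coefficients to be chosen.

```latex
V_t(x) \;\approx\; \theta_t + \sum_{i} v_{t,i}\, x_i
```

Restricting the value function to this form reduces the search from one value per state to a small set of coefficients, which is what makes a linear-programming treatment with column generation tractable in principle.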

Journal Article

Gaussian process dynamic programming


Marc Peter Deisenroth¹, Carl Edward Rasmussen¹, Jan Peters²

University of Cambridge¹, University of Southern California²

01 Mar 2009-Neurocomputing

TL;DR: This article introduces Gaussian process dynamic programming (GPDP), an approximate value function-based RL algorithm, and proposes to learn probabilistic models of the a priori unknown transition dynamics and the value functions on the fly.


222 citations

Journal Article

The consumption-based capital asset pricing model


Darrell Duffie, William R. Zame

01 Nov 1989-Econometrica

TL;DR: In this article, the authors provide conditions on the primitives of a continuous-time economy under which there exist equilibria obeying the Consumption-Based Capital Asset Pricing Model (CCAPM).


Abstract: The paper provides conditions on the primitives of a continuous-time economy under which there exist equilibria obeying the Consumption-Based Capital Asset Pricing Model (CCAPM). The paper also extends the equilibrium characterization of interest rates of Cox, Ingersoll, and Ross (1985) to multi-agent economies. We do not use a Markovian state assumption.

This work provides sufficient conditions on agents' primitives for the validity of the Consumption-Based Capital Asset Pricing Model (CCAPM) of Breeden (1979). As a necessary condition, Breeden showed that in a continuous-time equilibrium satisfying certain regularity conditions, one can characterize returns on securities as follows. The expected "instantaneous" rate of return on any security in excess of the riskless interest rate (the security's expected excess rate of return) is a multiple common to all securities of the "instantaneous covariance" of this excess return with aggregate consumption increments. This common multiple is the Arrow-Pratt measure of risk aversion of a representative agent. (Rubinstein (1976) published a discrete-time precursor of this result.) The existence of equilibria satisfying Breeden's regularity conditions had been an open issue. We also show that the validity of the CCAPM does not depend on Breeden's assumption of Markov state information, and present a general asset pricing model extending the results of Cox, Ingersoll, and Ross (1985) as well as the discrete-time results of Rubinstein (1976) and Lucas (1978) to a multi-agent environment.

Since the CCAPM was first proposed, much effort has been directed at finding sufficient conditions on the model primitives: the given assets, the agents' preferences, the agents' consumption endowments, and (in a production economy) the feasible production sets. Conditions sufficient for the existence of continuous-time equilibria were shown in Duffie (1986), but the equilibria demonstrated were not shown to satisfy the additional regularity required for the CCAPM. In particular, Breeden assumed that all agents choose pointwise interior consumption rates, in order to characterize asset prices via the first order conditions of the Bellman equation. Interiority was also assumed by Huang (1987) in demonstrating a representative agent characterization of equilibrium, an approach exploited here. The use of dynamic programming and the Bellman equation, aside from the difficulty it imposes in verifying the existence of interior ...

Footnote 1: Financial support from the National Science Foundation is gratefully acknowledged. We thank ...


215 citations

Network Information

Performance

Metrics

Papers: 6,698
Citations: 155,793

No. of papers in the topic in previous years
Year	Papers
2023	261
2022	537
2021	369
2020	411
2019	348
2018	353
