Topic

Bellman equation

About: Bellman equation is a research topic. Over the lifetime, 5884 publications have been published within this topic receiving 135589 citations.


Papers
Journal ArticleDOI
TL;DR: It is demonstrated how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs.
Abstract: We present methods for optimizing portfolios, asset allocations, and trading systems based on direct reinforcement (DR). In this approach, investment decision-making is viewed as a stochastic control problem, and strategies are discovered directly. We present an adaptive algorithm called recurrent reinforcement learning (RRL) for discovering investment policies. The need to build forecasting models is eliminated, and better trading performance is obtained. The direct reinforcement approach differs from dynamic programming and reinforcement algorithms such as TD-learning and Q-learning, which attempt to estimate a value function for the control problem. We find that the RRL direct reinforcement framework enables a simpler problem representation, avoids Bellman's curse of dimensionality and offers compelling advantages in efficiency. We demonstrate how direct reinforcement can be used to optimize risk-adjusted investment returns (including the differential Sharpe ratio), while accounting for the effects of transaction costs. In extensive simulation work using real financial data, we find that our approach based on RRL produces better trading strategies than systems utilizing Q-learning (a value function method). Real-world applications include an intra-daily currency trader and a monthly asset allocation system for the S&P 500 Stock Index and T-Bills.

396 citations
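
The differential Sharpe ratio mentioned above is the quantity this line of work maximizes online; it measures the marginal effect of the latest trading return on an exponentially weighted Sharpe ratio. Below is a minimal Python sketch of that incremental update in the standard Moody-Saffell form; the class name, the adaptation rate eta, and the example return stream are illustrative choices, not details from the paper.

```python
# Illustrative sketch: online differential Sharpe ratio (Moody-Saffell style).
# A and B are exponential moving estimates of the first and second moments of
# the per-period trading return R; eta is a hypothetical adaptation rate.

class DifferentialSharpe:
    def __init__(self, eta=0.01):
        self.eta = eta
        self.A = 0.0  # running estimate of E[R]
        self.B = 0.0  # running estimate of E[R^2]

    def update(self, R):
        dA = R - self.A
        dB = R * R - self.B
        denom = (self.B - self.A ** 2) ** 1.5
        # Differential Sharpe ratio: first-order sensitivity of the Sharpe
        # ratio to the latest return (zero until the variance estimate is positive).
        D = 0.0 if denom <= 0 else (self.B * dA - 0.5 * self.A * dB) / denom
        self.A += self.eta * dA
        self.B += self.eta * dB
        return D

# Example: feed a stream of per-period trading returns (made-up numbers).
dsr = DifferentialSharpe(eta=0.01)
for R in [0.002, -0.001, 0.003, 0.0005]:
    print(dsr.update(R))
```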

Proceedings Article
Nicolas Heess, Greg Wayne, David Silver, Timothy P. Lillicrap, Yuval Tassa, Tom Erez
07 Dec 2015
TL;DR: In this article, a unified framework for learning continuous control policies using backpropagation is presented, yielding a spectrum of general policy gradient algorithms that range from model-free methods with value functions to model-based methods without value functions.
Abstract: We present a unified framework for learning continuous control policies using backpropagation. It supports stochastic control by treating stochasticity in the Bellman equation as a deterministic function of exogenous noise. The product is a spectrum of general policy gradient algorithms that range from model-free methods with value functions to model-based methods without value functions. We use learned models but only require observations from the environment instead of observations from model-predicted trajectories, minimizing the impact of compounded model errors. We apply these algorithms first to a toy stochastic control problem and then to several physics-based control problems in simulation. One of these variants, SVG(1), shows the effectiveness of learning models, value functions, and policies simultaneously in continuous domains.

387 citations
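
The device of treating stochasticity in the Bellman equation as a deterministic function of exogenous noise is the reparameterization (pathwise derivative) trick. The toy sketch below, assuming a hypothetical 1-D linear-Gaussian transition, linear policy, and quadratic next-state value (none of which come from the paper), shows how a Monte Carlo gradient obtained by differentiating through the sampled noise matches the closed-form answer.

```python
# Illustrative sketch of reparameterized (pathwise) gradients on a toy problem:
# policy a = theta * s, transition s' = s + a + sigma * eps with eps ~ N(0, 1),
# and next-state value V(s') = -s'**2.
import random

def pathwise_gradient(theta, s, sigma, n_samples=100_000):
    """Monte Carlo estimate of d/dtheta E[V(s')] by differentiating through
    the sampled noise (chain rule: dV/ds' * ds'/dtheta, with ds'/dtheta = s)."""
    total = 0.0
    for _ in range(n_samples):
        eps = random.gauss(0.0, 1.0)          # exogenous noise, sampled once
        s_next = s + theta * s + sigma * eps  # deterministic in (s, theta, eps)
        total += -2.0 * s_next * s            # dV/dtheta along this sample path
    return total / n_samples

theta, s, sigma = 0.5, 1.0, 0.2
estimate = pathwise_gradient(theta, s, sigma)
closed_form = -2.0 * (1.0 + theta) * s**2     # d/dtheta of -((1+theta)^2 s^2 + sigma^2)
print(estimate, closed_form)                  # the two should roughly agree
```

Because the noise is drawn outside the computation, gradients flow through the transition exactly as through any deterministic function, which is what lets the framework combine learned models, value functions, and policies in a single backpropagation pass.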

Journal ArticleDOI
TL;DR: In this paper, a linear superposition of M basis functions is proposed to fit the value function of a Markovian decision process, reducing the problem dimensionality from the number of states down to M.

385 citations
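
Fitting the value function with a linear superposition of M basis functions turns the |S|-dimensional Bellman equation into an M-dimensional linear system for the weights. The sketch below, assuming a made-up fixed-policy MDP and random features rather than anything from the paper, solves that projected Bellman (LSTD-style) fixed point and compares it with the exact value function.

```python
# Illustrative sketch: approximate the value function of a fixed policy as a
# linear superposition of M basis functions, V ~= Phi @ w, by solving the
# projected Bellman fixed point in the M-dimensional weight space.
import numpy as np

rng = np.random.default_rng(0)
n_states, M, gamma = 50, 5, 0.95

# Hypothetical fixed-policy MDP: row-stochastic transitions P and rewards r.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n_states)

Phi = rng.random((n_states, M))  # one column per basis function

# Projected Bellman equation: Phi w = Proj(r + gamma * P @ Phi @ w).
# With a least-squares projection this reduces to an M x M linear system.
A = Phi.T @ (Phi - gamma * P @ Phi)
b = Phi.T @ r
w = np.linalg.solve(A, b)

V_exact = np.linalg.solve(np.eye(n_states) - gamma * P, r)
print(np.max(np.abs(Phi @ w - V_exact)))  # error of the M-dimensional fit
```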

Journal ArticleDOI
TL;DR: The aim of the paper is to give basic theoretical results on the structure of the optimal state-feedback solution and of the value function, and to describe how the state-feedback optimal control law can be constructed by combining multiparametric programming and dynamic programming.

372 citations
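
In the unconstrained linear-quadratic special case, the dynamic-programming part of this construction is the familiar Riccati backward recursion: the value function stays quadratic at every stage and the optimal control is linear state feedback. The sketch below shows only that backbone with arbitrary example matrices; the constrained problems treated in the paper additionally require multiparametric programming and yield piecewise-affine feedback laws, which this sketch does not cover.

```python
# Illustrative sketch: backward dynamic programming for a finite-horizon,
# unconstrained LQR problem. At each stage the value function is quadratic,
# x' P_k x, and the optimal control is state feedback, u_k = -K_k x_k.
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # hypothetical double-integrator dynamics
B = np.array([[0.5], [1.0]])
Q = np.eye(2)                             # state cost
R = np.array([[0.1]])                     # input cost
N = 20                                    # horizon

P = Q.copy()                              # terminal value function x' Q x
gains = []
for _ in range(N):
    # Minimize u'Ru + (Ax+Bu)' P (Ax+Bu) over u -> linear feedback u = -K x.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)         # Riccati update of the value function
    gains.append(K)

gains.reverse()                           # gains[k] is the feedback law at stage k
print(gains[0])
```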

Journal ArticleDOI
TL;DR: For the first time, GFHMs are used to approximate the solutions (value functions) of the coupled HJ equations based on a policy iteration algorithm, and the approximate solution is utilized to obtain the optimal coordination control.
Abstract: In this paper, a new online scheme is presented to design the optimal coordination control for the consensus problem of multiagent differential games by fuzzy adaptive dynamic programming, which brings together game theory, the generalized fuzzy hyperbolic model (GFHM), and adaptive dynamic programming. In general, the optimal coordination control for multiagent differential games is the solution of the coupled Hamilton-Jacobi (HJ) equations. Here, for the first time, GFHMs are used to approximate the solutions (value functions) of the coupled HJ equations based on a policy iteration algorithm. Namely, for each agent, a GFHM is used to capture the mapping between the local consensus error and the local value function. Since our scheme uses a single-network architecture for each agent (which eliminates the action network required by the dual-network architecture), it is a more suitable architecture for multiagent systems. Furthermore, the approximate solution is utilized to obtain the optimal coordination control. Finally, we give the stability analysis for our scheme and prove that the weight estimation error and the local consensus error are uniformly ultimately bounded. Further, the control node trajectory is proven to be cooperatively uniformly ultimately bounded.

371 citations
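
The policy iteration loop behind this scheme alternates between evaluating the current policy's value function and improving the policy greedily against it. The sketch below shows that loop in its simplest single-agent, tabular form on a small random MDP; the paper's setting instead approximates the solutions of continuous-time coupled HJ equations with GFHMs, so the MDP, sizes, and discount factor here are purely illustrative.

```python
# Illustrative sketch: tabular policy iteration (evaluate, then improve) on a
# small random MDP. The paper replaces the exact evaluation step with a GFHM
# approximation of each agent's value function; this is the single-agent skeleton.
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, gamma = 6, 3, 0.9

P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)          # P[a, s, s'] transition probabilities
R = rng.random((n_actions, n_states))      # R[a, s] expected rewards

policy = np.zeros(n_states, dtype=int)
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
    P_pi = P[policy, np.arange(n_states)]
    R_pi = R[policy, np.arange(n_states)]
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

    # Policy improvement: act greedily with respect to the evaluated V.
    Q = R + gamma * np.einsum("asn,n->as", P, V)
    new_policy = Q.argmax(axis=0)
    if np.array_equal(new_policy, policy):
        break                              # stable policy -> optimal
    policy = new_policy

print(policy, V)
```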


Network Information

Related Topics (5)
Optimal control: 68K papers, 1.2M citations (87% related)
Bounded function: 77.2K papers, 1.3M citations (85% related)
Markov chain: 51.9K papers, 1.3M citations (85% related)
Linear system: 59.5K papers, 1.4M citations (84% related)
Optimization problem: 96.4K papers, 2.1M citations (83% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    261
2022    537
2021    369
2020    411
2019    348
2018    353