Open Access Journal Article (DOI)

Policy Derivation Methods for Critic-Only Reinforcement Learning in Continuous Action Spaces

TLDR
Several variants of the policy-derivation algorithm are introduced and compared on two continuous state-action benchmarks: double pendulum swing-up and 3D mountain car.
About
This article was published in IFAC-PapersOnLine on 2016-01-01 and is open access. It has received 8 citations to date. The article focuses on the topics Q-learning and reinforcement learning.
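In critic-only reinforcement learning, no explicit actor is learned; the policy is derived from the critic, typically by maximizing the Q-function over the action at each state. A minimal sketch of this standard derivation, using a hypothetical toy critic rather than a learned one:

```python
import numpy as np

def derive_policy(q_func, actions):
    """Greedy policy derived from a critic: pick the candidate action
    maximizing Q(s, a) over a discretized action set.

    q_func  : callable (state, action) -> value; the learned critic
              (here replaced by an illustrative toy function)
    actions : 1-D array of candidate actions covering the continuous range
    """
    def policy(state):
        values = np.array([q_func(state, a) for a in actions])
        return actions[np.argmax(values)]
    return policy

# Toy critic whose maximizer is a = -state (illustrative only, not learned)
q = lambda s, a: -(a + s) ** 2

pi = derive_policy(q, np.linspace(-1.0, 1.0, 21))
print(pi(0.5))  # grid point closest to the true maximizer, about -0.5
```

The coarseness of the action grid limits the quality of such a policy in continuous action spaces, which is the problem the paper's refinement and approximation variants address.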


Citations
Journal Article (DOI)

Optimal and Autonomous Control Using Reinforcement Learning: A Survey

TL;DR: Q-learning and the integral RL algorithm are discussed as core algorithms for discrete-time (DT) and continuous-time (CT) systems, respectively, and a new direction of off-policy RL for both CT and DT systems is presented.
Journal Article (DOI)

Experience selection in deep reinforcement learning for control

TL;DR: This work proposes guidelines for using prior knowledge about the characteristics of the control problem at hand to choose an appropriate experience replay strategy, and investigates different proxies for the immediate and long-term utility of experiences.
Proceedings Article (DOI)

Symbolic method for deriving policy in reinforcement learning

TL;DR: A novel method based on genetic programming is proposed to construct a symbolic function that serves as a proxy to the value function and from which a continuous policy is derived; the resulting policy outperforms the standard policy-derivation method.
Journal Article (DOI)

Knowledge-based reinforcement learning controller with fuzzy-rule network: experimental validation

TL;DR: A model-free controller for a general class of output-feedback nonlinear discrete-time systems is established using actor-critic networks and reinforcement learning with human knowledge encoded as IF–THEN rules.
Journal Article (DOI)

Policy derivation methods for critic-only reinforcement learning in continuous spaces

TL;DR: Policy derivation methods are proposed which alleviate these problems by means of action-space refinement, continuous approximation, and post-processing of the V-function using symbolic regression.
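Of the techniques named in this TL;DR, action-space refinement is the most mechanical: rather than committing to one coarse action grid, the search interval is repeatedly re-discretized around the current best action. A minimal sketch of that idea, under the assumption that refinement means iterative zooming (the function and toy critic below are illustrative, not the paper's implementation):

```python
import numpy as np

def refined_argmax(q_func, state, lo, hi, n=11, iters=4):
    """Approximate argmax_a Q(state, a) over [lo, hi] by iterative
    grid refinement: discretize, pick the best grid point, then
    shrink the interval to one grid spacing around it and repeat.
    """
    for _ in range(iters):
        grid = np.linspace(lo, hi, n)
        best = grid[np.argmax([q_func(state, a) for a in grid])]
        width = (hi - lo) / (n - 1)          # current grid spacing
        lo, hi = best - width, best + width  # zoom in around the best action
    return best

# Toy critic with maximizer a* = 0.33 for every state (illustrative)
q = lambda s, a: -(a - 0.33) ** 2

a_star = refined_argmax(q, state=0.0, lo=-1.0, hi=1.0)
```

With each iteration the interval shrinks by roughly a factor of (n - 1) / 2, so a few iterations locate the maximizer far more precisely than a single grid of the same total size.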
References
Journal Article (DOI)

Experiments with reinforcement learning in problems with continuous state and action spaces

TL;DR: This article proposes a simple and modular technique that can be used to implement function approximators with nonuniform degrees of resolution so that the value function can be represented with higher accuracy in important regions of the state and action spaces.
Journal Article (DOI)

Reinforcement Learning with Factored States and Actions

TL;DR: A novel method is presented for approximating the value function and selecting good actions in Markov decision processes with large state and action spaces; it is shown that a product-of-experts approximation can be used to solve large problems.
Journal Article (DOI)

Detection and Diagnosis of Oscillation in Control Loops

TL;DR: Further opportunities for oscillation detection in the off-line analysis of ensembles of data from control loops are examined, and operational signatures that indicate the cause of an oscillation are presented.
Proceedings Article (DOI)

Autonomous transfer for reinforcement learning

TL;DR: This paper introduces Modeling Approximate State Transitions by Exploiting Regression (MASTER), a method for automatically learning a mapping from one task to another through an agent's experience and demonstrates that such learned relationships can significantly improve the speed of a reinforcement learning algorithm in a series of Mountain Car tasks.
Journal Article (DOI)

Approximate dynamic programming with a fuzzy parameterization

TL;DR: This work shows that fuzzy Q-iteration is consistent, i.e., that it asymptotically obtains the optimal solution as the approximation accuracy increases, and proves that the asynchronous algorithm converges at least as fast as the synchronous one.