Open Access · Posted Content
Inter-Level Cooperation in Hierarchical Reinforcement Learning
Abdul Rahman Kreidieh, Samyak Parajuli, Nathan Lichtle, Yiling You, Rayyan Nasr, Alexandre M. Bayen +5 more
TL;DR: It is hypothesized that improved cooperation between the internal agents of a hierarchy can simplify the credit assignment problem from the perspective of the high-level policies, thereby leading to significant improvements to training in situations where intricate sets of action primitives must be performed to yield improvements in performance.
Abstract: Hierarchical models for deep reinforcement learning (RL) have emerged as powerful methods for generating meaningful control strategies in difficult, long-time-horizon tasks. Training of such hierarchical models, however, continues to suffer from instabilities that limit their applicability. In this paper, we address instabilities that arise from the concurrent optimization of goal-assignment and goal-achievement policies. Drawing connections between this concurrent optimization scheme and communication and cooperation in multi-agent RL, we redefine the standard optimization procedure to explicitly promote cooperation between these disparate tasks. Our method achieves superior results to existing techniques on a set of difficult, long-time-horizon tasks and expands the scope of tasks solvable by hierarchical reinforcement learning. Videos of the results are available at: this https URL.
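The goal-assignment/goal-achievement split discussed in the abstract is commonly realized as a goal-conditioned two-level hierarchy. The following is a minimal illustrative sketch only; the policies, toy dynamics, and distance-based worker reward are placeholder assumptions, not the authors' method:

```python
import numpy as np

rng = np.random.default_rng(0)

def manager_policy(state):
    # High-level (goal-assignment) policy: proposes a desired state.
    # Placeholder: a noisy linear map standing in for a learned network.
    return 0.5 * state + rng.normal(scale=0.1, size=state.shape)

def worker_policy(state, goal):
    # Low-level (goal-achievement) policy: acts to reach the assigned goal.
    # Placeholder: a simple proportional controller.
    return goal - state

def worker_reward(next_state, goal):
    # Common goal-conditioned reward: negative distance to the goal.
    return -float(np.linalg.norm(next_state - goal))

state = np.zeros(2)
goal = manager_policy(state)  # manager acts on a slower timescale
cumulative = 0.0
for _ in range(10):
    state = state + 0.5 * worker_policy(state, goal)  # toy dynamics
    cumulative += worker_reward(state, goal)
```

In this setup, training both levels concurrently is what creates the instability the paper targets: the worker's reward depends on goals emitted by a manager that is itself still changing.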
Citations
Proceedings Article
Demonstration-Bootstrapped Autonomous Practicing via Multi-Task Reinforcement Learning
TL;DR: This work proposes a system for reinforcement learning that leverages multi-task reinforcement learning bootstrapped with prior data to enable continuous autonomous practicing, minimizing the number of resets needed while being able to learn temporally extended behaviors.
Journal Article
Learning Pneumatic Non-Prehensile Manipulation With a Mobile Blower
TL;DR: This work investigates pneumatic non-prehensile manipulation as a means of efficiently moving scattered objects into a target receptacle by introducing a multi-frequency version of the spatial action maps framework and shows that its simulation-trained policies transfer well to a real environment and can generalize to novel objects.
Journal Article
HiSaRL: A Hierarchical Framework for Safe Reinforcement Learning
TL;DR: Proposes a two-level hierarchical framework for safe reinforcement learning in complex environments, containing a learning-based controller and a corresponding neural Lyapunov function that characterizes the controller's stability properties.
Posted Content
Active Hierarchical Imitation and Reinforcement Learning
TL;DR: Experimental results showed that combining DAgger with a reward-based active learning method achieves better performance while reducing the physical and mental effort required of human demonstrators during training.
Proceedings Article
Learning in Bi-level Markov Games
TL;DR: Experimental results show that the proposed bi-level Markov game (BMG) framework, which breaks the simultaneity of multi-agent reinforcement learning, achieves competitive advantages in both performance and variance.
References
Journal Article
Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis
TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Posted Content
Proximal Policy Optimization Algorithms
TL;DR: A new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent, are proposed.
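The "surrogate" objective mentioned in this summary is the clipped loss at the heart of PPO. A minimal sketch of that single term follows; the function name is illustrative, and the full algorithm additionally uses minibatch epochs plus value-function and entropy terms:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    # Clipped surrogate: mean of min(r * A, clip(r, 1-eps, 1+eps) * A),
    # where r = pi_new(a|s) / pi_old(a|s) and A is the advantage estimate.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return float(np.mean(np.minimum(ratio * advantage, clipped * advantage)))
```

Clipping caps how much a single update can benefit from moving the new policy far from the data-collecting policy, which is what stabilizes repeated gradient ascent on the same batch.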
Journal Article
Technical Note: Q-Learning
Chris Watkins, Peter Dayan
TL;DR: This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
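The update underlying this convergence result is the tabular Q-learning rule, which can be sketched in a few lines (variable names and hyperparameter values are illustrative):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # One tabular step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

The theorem's conditions map directly onto this code: every (s, a) pair must keep being updated, and the learning rates alpha must decay appropriately for the action-values to converge.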
Journal ArticleDOI
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
TL;DR: This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units that are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates.
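For a softmax policy over discrete actions, the REINFORCE-style estimator described here reduces to a one-line gradient, sketched below (function and variable names are illustrative):

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())  # shift for numerical stability
    return z / z.sum()

def reinforce_grad(theta, action, ret):
    # Unbiased gradient estimate: return * grad log pi(action), where for a
    # softmax policy grad log pi(a) with respect to the logits is one_hot(a) - pi.
    pi = softmax(theta)
    grad_log_pi = -pi
    grad_log_pi[action] += 1.0
    return ret * grad_log_pi
```

Note the estimator never computes the expected-reinforcement gradient explicitly, matching the summary: ascending this sampled direction on average ascends the true gradient.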
Proceedings Article
Policy Gradient Methods for Reinforcement Learning with Function Approximation
TL;DR: This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
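The result summarized here rests on the policy gradient theorem; in its standard on-policy form (standard notation, not quoted from the paper) it states:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{s \sim d^{\pi_\theta},\; a \sim \pi_\theta(\cdot \mid s)}
    \left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a) \right]
```

Replacing the true action-value $Q^{\pi_\theta}$ with a compatible differentiable approximation is what the convergence proof covers.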