Open Access · Posted Content
Inter-Level Cooperation in Hierarchical Reinforcement Learning
Abdul Rahman Kreidieh, Samyak Parajuli, Nathan Lichtle, Yiling You, Rayyan Nasr, Alexandre M. Bayen +5 more
TL;DR: It is hypothesized that improved cooperation between the internal agents of a hierarchy can simplify the credit assignment problem from the perspective of the high-level policies, thereby leading to significant improvements to training in situations where intricate sets of action primitives must be performed to yield improvements in performance.
Abstract: Hierarchical models for deep reinforcement learning (RL) have emerged as powerful methods for generating meaningful control strategies in difficult, long-time-horizon tasks. Training of such hierarchical models, however, continues to suffer from instabilities that limit their applicability. In this paper, we address instabilities that arise from the concurrent optimization of goal-assignment and goal-achievement policies. Drawing connections between this concurrent optimization scheme and communication and cooperation in multi-agent RL, we redefine the standard optimization procedure to explicitly promote cooperation between these disparate tasks. Our method achieves superior results to existing techniques on a set of difficult, long-time-horizon tasks and expands the scope of tasks solvable by hierarchical reinforcement learning. Videos of the results are available at: this https URL.
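The goal-assignment/goal-achievement split discussed in the abstract is commonly realized as a goal-conditioned two-level hierarchy. The following is a minimal illustrative sketch only; the policies, toy dynamics, and distance-based worker reward are placeholder assumptions, not the authors' method:

```python
import numpy as np

rng = np.random.default_rng(0)

def manager_policy(state):
    # High-level (goal-assignment) policy: proposes a desired state.
    # Placeholder: a noisy linear map standing in for a learned network.
    return 0.5 * state + rng.normal(scale=0.1, size=state.shape)

def worker_policy(state, goal):
    # Low-level (goal-achievement) policy: acts to reach the assigned goal.
    # Placeholder: a simple proportional controller.
    return goal - state

def worker_reward(next_state, goal):
    # Common goal-conditioned reward: negative distance to the goal.
    return -float(np.linalg.norm(next_state - goal))

state = np.zeros(2)
goal = manager_policy(state)  # manager acts on a slower timescale
cumulative = 0.0
for _ in range(10):
    state = state + 0.5 * worker_policy(state, goal)  # toy dynamics
    cumulative += worker_reward(state, goal)
```

In this setup, training both levels concurrently is what creates the instability the paper targets: the worker's reward depends on goals emitted by a manager that is itself still changing.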
Citations
Proceedings Article
Demonstration-Bootstrapped Autonomous Practicing via Multi-Task Reinforcement Learning
TL;DR: This work proposes a system for reinforcement learning that leverages multi-task reinforcement learning bootstrapped with prior data to enable continuous autonomous practicing, minimizing the number of resets needed while being able to learn temporally extended behaviors.
Journal Article
Learning Pneumatic Non-Prehensile Manipulation With a Mobile Blower
TL;DR: This work investigates pneumatic non-prehensile manipulation as a means of efficiently moving scattered objects into a target receptacle by introducing a multi-frequency version of the spatial action maps framework and shows that its simulation-trained policies transfer well to a real environment and can generalize to novel objects.
Journal Article
HiSaRL: A Hierarchical Framework for Safe Reinforcement Learning
TL;DR: Proposes a two-level hierarchical framework for safe reinforcement learning in complex environments, containing a learning-based controller and a corresponding neural Lyapunov function that characterizes the controller's stability properties.
Posted Content
Active Hierarchical Imitation and Reinforcement Learning
TL;DR: Experimental results showed that combining DAgger with a reward-based active learning method achieves better performance while reducing the physical and mental effort required of human demonstrators during training.
Proceedings Article
Learning in Bi-level Markov Games
TL;DR: Experimental results show that the proposed bi-level Markov game (BMG) framework, which breaks the simultaneity of multi-agent reinforcement learning, achieves competitive advantages in both performance and variance.
References
Journal Article
Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis
TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Posted Content
Proximal Policy Optimization Algorithms
TL;DR: A new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent, are proposed.
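The "surrogate" objective mentioned in this summary is the clipped loss at the heart of PPO. A minimal sketch of that single term follows; the function name is illustrative, and the full algorithm additionally uses minibatch epochs plus value-function and entropy terms:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    # Clipped surrogate: mean of min(r * A, clip(r, 1-eps, 1+eps) * A),
    # where r = pi_new(a|s) / pi_old(a|s) and A is the advantage estimate.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return float(np.mean(np.minimum(ratio * advantage, clipped * advantage)))
```

Clipping caps how much a single update can benefit from moving the new policy far from the data-collecting policy, which is what stabilizes repeated gradient ascent on the same batch.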
Journal Article
Technical Note: Q-Learning
Chris Watkins, Peter Dayan
TL;DR: This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
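The update underlying this convergence result is the tabular Q-learning rule, which can be sketched in a few lines (variable names and hyperparameter values are illustrative):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # One tabular step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

The theorem's conditions map directly onto this code: every (s, a) pair must keep being updated, and the learning rates alpha must decay appropriately for the action-values to converge.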
Journal ArticleDOI
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning
TL;DR: This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units that are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates.
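For a softmax policy over discrete actions, the REINFORCE-style estimator described here reduces to a one-line gradient, sketched below (function and variable names are illustrative):

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())  # shift for numerical stability
    return z / z.sum()

def reinforce_grad(theta, action, ret):
    # Unbiased gradient estimate: return * grad log pi(action), where for a
    # softmax policy grad log pi(a) with respect to the logits is one_hot(a) - pi.
    pi = softmax(theta)
    grad_log_pi = -pi
    grad_log_pi[action] += 1.0
    return ret * grad_log_pi
```

Note the estimator never computes the expected-reinforcement gradient explicitly, matching the summary: ascending this sampled direction on average ascends the true gradient.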
Proceedings Article
Policy Gradient Methods for Reinforcement Learning with Function Approximation
TL;DR: This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
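The result summarized here rests on the policy gradient theorem; in its standard on-policy form (standard notation, not quoted from the paper) it states:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{s \sim d^{\pi_\theta},\; a \sim \pi_\theta(\cdot \mid s)}
    \left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a) \right]
```

Replacing the true action-value $Q^{\pi_\theta}$ with a compatible differentiable approximation is what the convergence proof covers.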