Proceedings ArticleDOI

Learning Dual Arm Coordinated Reachability Tasks in a Humanoid Robot with Articulated Torso

TL;DR: This paper proposes DiGrad (Differential Gradients), a new RL framework for multi-task learning in manipulators and shows how this framework can be adopted to learn dual arm coordination in a 27 degrees of freedom (DOF) humanoid robot with articulated spine.
Abstract: Performing dual arm coordinated (reachability) tasks in humanoid robots requires complex planning strategies, and this complexity increases further in the case of humanoids with an articulated torso. Such complex strategies may not be suitable for online motion planning. This paper proposes a faster way to accomplish dual arm coordinated tasks using a methodology based on Reinforcement Learning. The contribution of this paper is twofold. Firstly, we propose DiGrad (Differential Gradients), a new RL framework for multi-task learning in manipulators. Secondly, we show how this framework can be adopted to learn dual arm coordination in a 27 degrees of freedom (DOF) humanoid robot with an articulated spine. The proposed framework and methodology are evaluated in various environments and simulation results are presented. A comparative study of DiGrad with its parent algorithm in different settings is also presented.
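
The paper's implementation is not reproduced here, but the core idea the abstract describes lends itself to a short sketch: per-task ("differential") gradients are backpropagated through one shared actor that emits a compound action for both arms and the torso. Everything below — network sizes, the per-task critic arrangement, variable names — is assumed for illustration, not taken from the paper.

```python
# Hypothetical sketch of a DiGrad-style shared-actor update (PyTorch).
import torch
import torch.nn as nn

obs_dim, act_dim, n_tasks = 32, 14, 2    # assumed sizes

# One shared actor emits a compound action covering all kinematic chains.
actor = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                      nn.Linear(256, act_dim), nn.Tanh())
# One critic per task (an assumption; the paper may instead share a critic).
critics = [nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
                         nn.Linear(256, 1)) for _ in range(n_tasks)]
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

def actor_update(state):
    """Accumulate each task's policy gradient on the shared actor."""
    action = actor(state)                              # compound action
    loss = -sum(q(torch.cat([state, action], dim=-1)).mean()
                for q in critics)                      # task gradients sum
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()

actor_update(torch.randn(64, obs_dim))                 # one update step
```

Critic training (and the DDPG-style machinery around it) is omitted; the point is only that the per-task gradients accumulate on the shared parameters.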
Citations
01 Jan 2009
TL;DR: A planning algorithm called BiSpace is presented that produces fast plans for complex high-dimensional problems by simultaneously exploring multiple spaces, using BiSpace's special characteristics to explore the work and configuration spaces of the environment and robot.
Abstract: We present a planning algorithm called BiSpace that produces fast plans for complex high-dimensional problems by simultaneously exploring multiple spaces. We specifically focus on finding robust solutions to manipulation and grasp planning problems by using BiSpace's special characteristics to explore the work and configuration spaces of the environment and robot. Furthermore, we present a number of techniques for constructing informed heuristics to intelligently search through these high-dimensional spaces. In general, the BiSpace planner is applicable to any problem involving multiple search spaces.

39 citations

Journal ArticleDOI
TL;DR: A contact model and force mapping relationship are established for a robot end effector and surface and a neural network algorithm is used to identify the tangential angle of the unknown curved-surface workpiece.
Abstract: Aiming to solve the problem that the contact force at a robot end effector is difficult to keep constant when tracking an unknown curved-surface workpiece, a robot force control algorithm based on reinforcement learning is proposed. In this paper, a contact model and force mapping relationship are established for a robot end effector and surface. Because the tangential angle of the workpiece surface is difficult to obtain in the mapping relationship, a neural network is used to identify the tangential angle of the unknown curved-surface workpiece. To keep the normal force of the robot end effector constant, a compensation term is added to a traditional explicit force controller to adapt it to the robot constant-force tracking scenario. Because the compensation term parameters are difficult to select, the reinforcement learning algorithm A2C (advantage actor critic) is used to find the optimal parameters, and the return function and state values are modified in the A2C algorithm to suit the robot tracking scenario. The results show that the neural network algorithm recognizes the tangential angle of the curved surface well. The force error between the normal force and the expected force stays substantially within ±2 N after 60 iterations of the A2C-based robot force control algorithm; additionally, the variance of the force error decreases by 50.7%, 34.05% and 79.41%, respectively, compared with the force signals obtained by a fuzzy iterative algorithm and an explicit force controller with two sets of fixed control parameters.
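
As a rough illustration of the control law this abstract describes — a conventional explicit force controller plus a learned compensation term — here is a minimal sketch. The gain names, the linear form of the compensation, and the constants are all assumptions of this sketch; the A2C loop that actually selects the compensation parameters is omitted.

```python
# Sketch of explicit force control with an RL-tuned compensation term.
def force_command(f_desired, f_measured, prev_error,
                  kp=0.01, kd=0.005, k_comp=0.0):
    """Return (position correction along the surface normal, force error).

    kp, kd : gains of a conventional explicit force controller.
    k_comp : compensation gain; in the paper this role is played by
             parameters selected with A2C (the linear form is assumed).
    """
    error = f_desired - f_measured
    d_error = error - prev_error
    correction = kp * error + kd * d_error + k_comp * error
    return correction, error

# Example: 20 N target, 17.5 N measured -> small outward correction.
dz, err = force_command(20.0, 17.5, prev_error=0.0)
```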

18 citations

References
Book
01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.
Abstract: Reinforcement learning, one of the most active research areas in artificial intelligence, is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications. The only necessary mathematical background is familiarity with elementary concepts of probability. The book is divided into three parts. Part I defines the reinforcement learning problem in terms of Markov decision processes. Part II provides basic solution methods: dynamic programming, Monte Carlo methods, and temporal-difference learning. Part III presents a unified view of the solution methods and incorporates artificial neural networks, eligibility traces, and planning; the two final chapters present case studies and consider the future of reinforcement learning.
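
As a concrete instance of the temporal-difference methods the book covers in Part II, here is tabular TD(0) on a small random walk in the spirit of the book's own examples; the constants and environment details here are illustrative.

```python
# Tabular TD(0) on a 5-state random walk: +1 reward for exiting right.
import random

n_states, alpha, gamma = 5, 0.1, 1.0
V = [0.0] * (n_states + 2)               # states 0 and n+1 are terminal

for _ in range(1000):
    s = n_states // 2 + 1                # start in the middle state
    while s not in (0, n_states + 1):
        s2 = s + random.choice((-1, 1))  # unbiased random walk
        r = 1.0 if s2 == n_states + 1 else 0.0
        # TD(0) update: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))
        V[s] += alpha * (r + gamma * V[s2] - V[s])
        s = s2

print([round(v, 2) for v in V[1:-1]])    # approaches 1/6 .. 5/6
```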

37,989 citations


"Learning Dual Arm Coordinated Reach..." refers background or methods in this paper

  • ...We suggest readers go through [17] for a detailed explanation of RL and actor-critic algorithms....

  • ...In any standard RL environment [17], there is an agent and an environment....

  • ...Specifically, Reinforcement Learning (RL) [17] provides a very good framework and some initial works used it for learning motor control [18] and to teach a biped how to walk [19]....

Posted Content
TL;DR: This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Abstract: We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
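
A schematic single update step of the published algorithm may help fix ideas. The module interfaces (a critic called as critic(s, a)), the batch layout, and the hyper-parameters are assumptions of this sketch, not the authors' code.

```python
# One DDPG update: critic regression, actor ascent, Polyak averaging.
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, actor_targ, critic_targ,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    s, a, r, s2, done = batch
    # Critic: regress Q(s, a) toward the bootstrapped target value.
    with torch.no_grad():
        q_targ = r + gamma * (1 - done) * critic_targ(s2, actor_targ(s2))
    critic_loss = F.mse_loss(critic(s, a), q_targ)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: ascend the deterministic policy gradient through the critic.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Slowly track the online networks with the target networks.
    with torch.no_grad():
        for net, targ in ((actor, actor_targ), (critic, critic_targ)):
            for p, pt in zip(net.parameters(), targ.parameters()):
                pt.data.mul_(1 - tau).add_(tau * p.data)
```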

4,225 citations


"Learning Dual Arm Coordinated Reach..." refers background in this paper

  • ...However, DDPG performs poorly when complex tasks using multiple kinematic chains are involved....

  • ...Development of elegant Deep RL algorithms [20], [25] in recent times provided a framework for learning continuous control of manipulators [21]....

  • ...This new framework, based on DDPG, is proposed in the context of robotics and comparative results are presented between DiGrad and DDPG. Results show that DiGrad is stable and performs better than DDPG in the same scenarios....

  • ...Also, DDPG does not have the concept of a compound action....

  • ...Our framework and training algorithm are very similar to DDPG except in the following points....

Proceedings ArticleDOI
24 Dec 2012
TL;DR: A new physics engine tailored to model-based control computes contact responses via a modern velocity-stepping approach that avoids the difficulties with spring-dampers, and it can compute both forward and inverse dynamics.
Abstract: We describe a new physics engine tailored to model-based control. Multi-joint dynamics are represented in generalized coordinates and computed via recursive algorithms. Contact responses are computed via efficient new algorithms we have developed, based on the modern velocity-stepping approach which avoids the difficulties with spring-dampers. Models are specified using either a high-level C++ API or an intuitive XML file format. A built-in compiler transforms the user model into an optimized data structure used for runtime computation. The engine can compute both forward and inverse dynamics. The latter are well-defined even in the presence of contacts and equality constraints. The model can include tendon wrapping as well as actuator activation states (e.g. pneumatic cylinders or muscles). To facilitate optimal control applications, and in particular sampling and finite differencing, the dynamics can be evaluated for different states and controls in parallel. Around 400,000 dynamics evaluations per second are possible on a 12-core machine, for a 3D humanoid with 18 DOFs and 6 active contacts. We have already used the engine in a number of control applications. It will soon be made publicly available.
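
For orientation, the engine's basic workflow (XML model → compiled model → stepped dynamics) looks like this with the official `mujoco` Python bindings; note these bindings postdate the paper, which describes a C++ API and the XML format.

```python
# Minimal MuJoCo usage: compile an XML model and step a falling sphere.
import mujoco

XML = """
<mujoco>
  <worldbody>
    <body pos="0 0 1">
      <joint type="free"/>
      <geom type="sphere" size="0.1"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)   # compiler -> optimized model
data = mujoco.MjData(model)                   # runtime state
for _ in range(100):
    mujoco.mj_step(model, data)               # forward dynamics step
print(data.qpos)                              # pose of the falling body
```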

4,018 citations


"Learning Dual Arm Coordinated Reach..." refers result in this paper

  • ...MuJoCo provides accurate collision and position data, which is used to model our state and reward function....

  • ...The training is carried out by executing compound actions provided by the actor, in MuJoCo and observing the obtained states and rewards....

  • ...Joint trajectories thus obtained are followed using angular velocity controllers and results are shown in a dynamic simulation environment, MuJoCo....

  • ...All the above mentioned environments are developed and trained in MuJoCo....

  • ...All these results are shown in MuJoCo [24] simulation environment....

Journal ArticleDOI
TL;DR: This paper presents the first randomized approach to kinodynamic planning (also known as trajectory planning or trajectory design), where the task is to determine control inputs to drive a robot from an initial state to a goal state.
Abstract: This paper presents the first randomized approach to kinodynamic planning (also known as trajectory planning or trajectory design). The task is to determine control inputs to drive a robot from an ...
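
To make the sample–nearest–propagate loop of such a randomized kinodynamic planner concrete, here is a toy version on a 1-D double integrator; the dynamics, bounds, and tolerances are illustrative and not from the paper.

```python
# Toy kinodynamic RRT: states are (position, velocity) of a 1-D robot.
import random

def kinodynamic_rrt(start=(0.0, 0.0), goal=5.0, tol=0.2,
                    dt=0.1, iters=20_000):
    tree = {start: None}                       # node -> parent
    for _ in range(iters):
        x_rand = (random.uniform(-1, 6), random.uniform(-2, 2))
        near = min(tree, key=lambda n: (n[0] - x_rand[0]) ** 2
                                       + (n[1] - x_rand[1]) ** 2)
        u = random.uniform(-1.0, 1.0)          # sampled control input
        v2 = near[1] + u * dt                  # integrate the dynamics
        x2 = near[0] + v2 * dt
        node = (x2, v2)
        tree[node] = near
        if abs(x2 - goal) < tol and abs(v2) < tol:
            return tree, node                  # reached the goal state
    return tree, None

tree, goal_node = kinodynamic_rrt()
```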

2,993 citations


"Learning Dual Arm Coordinated Reach..." refers methods in this paper

  • ...method involves searching [5]–[9] for a feasible inverse kinematic solution that can provide valid grasp solutions....

Proceedings Article
21 Jun 2014
TL;DR: This paper introduces an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy and demonstrates that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
Abstract: In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. We demonstrate that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
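
The "particularly appealing form" described above is the deterministic policy gradient theorem; in standard notation, with deterministic policy μ_θ and discounted state distribution ρ^μ, it reads:

```latex
\nabla_\theta J(\mu_\theta)
  = \mathbb{E}_{s \sim \rho^{\mu}}\!\left[
      \nabla_\theta \mu_\theta(s)\,
      \left.\nabla_a Q^{\mu}(s, a)\right|_{a = \mu_\theta(s)}
    \right]
```

Here Q^μ is estimated by a learned critic in the off-policy actor-critic algorithm the abstract describes.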

2,174 citations


"Learning Dual Arm Coordinated Reach..." refers background in this paper

  • ...Development of elegant Deep RL algorithms [20], [25] in recent times provided a framework for learning continuous control of manipulators [21]....
