Comparison of Multiple Reinforcement Learning and Deep Reinforcement Learning Methods for the Task Aimed at Achieving the Goal
Citations
Deep Q-Learning in Robotics: Improvement of Accuracy and Repeatability
A New Optimal Design of Stable Feedback Control of Two-Wheel System Based on Reinforcement Learning
A Note on the Frequency Characteristics of Discrete Systems in the Complex Plane
References
Mastering the game of Go with deep neural networks and tree search
Design and use paradigms for Gazebo, an open-source multi-robot simulator
An Introduction to Deep Reinforcement Learning
Reducing the Barrier to Entry of Complex Robotic Software: a MoveIt! Case Study
Review of Deep Reinforcement Learning for Robot Manipulation
Frequently Asked Questions (21)
Q2. What have the authors stated for future works in "Comparison of multiple reinforcement learning and deep reinforcement learning methods for the task aimed at achieving the goal" ?
RL (QL) and DRL (DQN, DSARSA) techniques met the conditions for the required accuracy represented by $R_s$, but from the perspective of future research, techniques based on deep neural networks are more stable and efficient. This work can provide a foundation for future research on motion planning in robotics using advanced deep reinforcement learning methods such as DDPG (Deep Deterministic Policy Gradient), TD3 (Twin Delayed Deep Deterministic Policy Gradient), and others.
Q3. What is the main approach of the learning agent’s DQN?
In the learning structure, the value function is approximated with a convolutional neural network (CNN) that serves as the Q-network producing the Q values, as in DQN.
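As a sketch of what such a Q-network can look like in PyTorch (the layer sizes, input resolution, and action count below are illustrative assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Minimal convolutional Q-network in the spirit of DQN.

    The CNN approximates the action-value function Q(s, a): the forward
    pass maps an image-like state to one Q value per action. Layer sizes
    and the action count are illustrative, not the paper's architecture.
    """

    def __init__(self, in_channels: int = 3, n_actions: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(512), nn.ReLU(),   # infers the flattened size
            nn.Linear(512, n_actions),       # one Q value per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(state))

# Greedy action selection from the estimated Q values.
net = QNetwork()
state = torch.zeros(1, 3, 84, 84)            # dummy 84x84 RGB state
action = net(state).argmax(dim=1).item()
```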
Q4. What is the main problem of the RL method?
The main optimization problem of the RL method is to find the optimal policy $\pi^*$, defined as the policy that maximizes the expected return: $\pi^* = \arg\max_{\pi} \mathbb{E}_{\tau \sim \pi}[R_t(\tau)]$.
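Concretely, the expected return of a fixed policy can be estimated by Monte Carlo sampling of trajectories, and $\pi^*$ is the policy that maximizes this estimate. A minimal sketch, assuming a Gym-style `reset`/`step` environment interface (all names here are illustrative):

```python
import numpy as np

def estimate_return(env, policy, episodes=100, gamma=0.99):
    """Monte Carlo estimate of E_{tau ~ pi}[R(tau)] for one policy.

    `env` is assumed to follow the Gym-style reset/step interface and
    `policy` maps a state to an action; both names are placeholders.
    """
    returns = []
    for _ in range(episodes):
        state, done, g, t = env.reset(), False, 0.0, 0
        while not done:
            state, reward, done, _ = env.step(policy(state))
            g += gamma ** t * reward        # discounted return of the trajectory
            t += 1
        returns.append(g)
    return np.mean(returns)

# pi* = argmax over candidate policies of the estimated expected return:
# best = max(candidate_policies, key=lambda pi: estimate_return(env, pi))
```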
Q5. What are the limitations of traditional task and motion planning methods?
Traditional task and motion planning methods such as RRT [18] and RRT* [31] can solve complex tasks, but they require full state observability, take considerable time to solve a problem, and do not adapt to dynamic scene changes.
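For reference, a bare-bones 2D RRT sketch (a simplified illustration, not the implementation cited as [18]); the collision checker `is_free`, step size, and bounds are placeholders:

```python
import random, math

def rrt(start, goal, is_free, step=0.5, goal_tol=0.5, iters=5000, bounds=(0, 10)):
    """Bare-bones 2D RRT: grow a tree by steering toward random samples.

    `is_free(p)` is an assumed collision checker; returns the path as a
    list of points, or None if no path is found within `iters` samples.
    """
    nodes, parent = [start], {0: None}
    for _ in range(iters):
        sample = (random.uniform(*bounds), random.uniform(*bounds))
        # nearest tree node to the random sample
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], sample))
        near = nodes[i]
        d = math.dist(near, sample)
        if d == 0:
            continue
        # steer one fixed step from the nearest node toward the sample
        new = (near[0] + step * (sample[0] - near[0]) / d,
               near[1] + step * (sample[1] - near[1]) / d)
        if not is_free(new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.dist(new, goal) < goal_tol:   # reached the goal region
            path, k = [], len(nodes) - 1
            while k is not None:              # walk parents back to the root
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None

path = rrt((1.0, 1.0), (9.0, 9.0), is_free=lambda p: True)
```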
Q6. What is the value of the parameter Rs?
The parameter $R_s$ takes a value greater than 0 when the condition is successfully met, which is determined by the accuracy of the results (Eq. 10).
Q7. What are the common robot simulation tools?
Various robot simulation tools are used as extensions of the OpenAI Gym toolkit, with Gazebo [1, 15] and PyBullet [8] being the most commonly used today.
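As a flavor of one such simulator, a minimal headless PyBullet session; the loaded assets are samples shipped with the `pybullet_data` package, not the authors' UR3 setup:

```python
import pybullet as p
import pybullet_data

# Headless physics session; use p.GUI instead for a visual window.
p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)

plane = p.loadURDF("plane.urdf")             # ground plane from pybullet_data
robot = p.loadURDF("kuka_iiwa/model.urdf")   # sample arm shipped with PyBullet

for _ in range(240):                         # one simulated second at 240 Hz
    p.stepSimulation()

p.disconnect()
```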
Q8. What are the common methods used for learning tasks in the real world?
For this purpose, reinforcement learning (RL) methods such as Q-Learning, SARSA (State–action–reward–state–action), etc. are commonly used [27].
Q9. What is the spline curve of degree p?
A Bézier spline curve of degree p is defined by n+1 control points $P_0, P_1, \ldots, P_n$ [2]: $B(t) = \sum_{i=0}^{n} N_{i,p}(t)\,P_i$, (7) where $N_{i,p}(t)$ are the normalized B-spline basis functions defined over the knot vector.
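A small sketch evaluating such a curve with SciPy's `BSpline`; the degree, control points, and clamped knot vector below are invented for illustration:

```python
import numpy as np
from scipy.interpolate import BSpline

p = 3                                         # curve degree
ctrl = np.array([[0.0, 0.0], [1.0, 2.0],      # n+1 = 4 control points P_i
                 [3.0, 2.0], [4.0, 0.0]])
# Clamped knot vector of length (n+1) + p + 1; the repeated end knots
# pin the curve to the first and last control points.
knots = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

curve = BSpline(knots, ctrl, p)               # B(t) = sum_i N_{i,p}(t) P_i
t = np.linspace(0.0, 1.0, 50)
points = curve(t)                             # 50 sampled points on the curve
print(points[0], points[-1])                  # starts at P_0, ends at P_n
```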
Q10. What is the main theme of this article?
Robotics as a field of science has been evolving for the past several years, and modern robots operating in the real world should learn new tasks autonomously and flexibly and adapt smoothly to changes.
Q11. What is the goal of the optimization problem in the case of the learning process?
The main goal of the optimization problem in the learning process was to maximize the expected cumulative reward (Fig. 5) and to minimize the Euclidean distance accuracy error (Fig. 6).
Q12. What is the main problem of the experiment?
After the learning process, the robotic arm UR3 is tested with the RViz simulation tool and the Gazebo 3D modeling tool, which communicate through ROS.
Q13. What is the definition of a TD control method?
The QL method was developed as an off-policy TD (temporal-difference) control algorithm, defined by $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t)\right]$, (3) where $\alpha$ is the learning rate ($0 < \alpha \le 1$), $\max_a Q(s_{t+1}, a)$ is the estimate of the optimal future value, and the other parameters are described in the previous section [11, 27]. State–action–reward–state–action (SARSA) is an on-policy TD control method, very similar to the Q-Learning method.
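Side by side as tabular code, the two one-step TD updates differ only in the bootstrap term; the state/action counts and hyper-parameter values below are placeholders:

```python
import numpy as np

n_states, n_actions = 100, 6                  # assumed discretization
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99                      # placeholder hyper-parameters

def q_learning_update(s, a, r, s_next):
    """Off-policy: bootstrap from the greedy action max_a' Q(s', a') (Eq. 3)."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

def sarsa_update(s, a, r, s_next, a_next):
    """On-policy: bootstrap from the action a' actually taken next."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
```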
Q14. What is the safe area of the robot?
The safe area is assigned to the individual robot model to avoid collisions with the target area (imagine that the target area is a bin in an object-picking problem).
Q15. What is the learning process of the RL/DRL agent?
The learning process of the RL/DRL agent begins by exploring the environment, performing actions from the initial state to the target state and collecting the corresponding rewards (Eq. 9).
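That interaction is the standard episode loop. A generic sketch against an assumed Gym-style environment, with epsilon-greedy exploration and the Q-learning update of Eq. 3 (all names here are illustrative, not the paper's code):

```python
import random

def run_episode(env, Q, epsilon=0.1, alpha=0.1, gamma=0.99):
    """One learning episode: act from the initial state until the target
    (terminal) state is reached, collecting rewards along the way.

    Assumes a Gym-style env with discrete states/actions and a tabular Q.
    """
    state = env.reset()
    done, total_reward = False, 0.0
    while not done:
        # epsilon-greedy: explore occasionally, otherwise act greedily
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(Q[state].argmax())
        next_state, reward, done, _ = env.step(action)
        # one-step Q-learning update (Eq. 3)
        Q[state, action] += alpha * (
            reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state
        total_reward += reward
    return total_reward
```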
Q16. What is the area of restriction from which the robot is removed?
The purple box ($A_{target}$) is approximately the restricted area from which targets are removed, and the yellow box ($A_{search}$) represents the area of safe movement.
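One plausible encoding (an assumption, not taken from the paper) is to treat both areas as axis-aligned boxes and test candidate end-effector positions against them:

```python
def in_box(point, lo, hi):
    """True if a 3D point lies inside the axis-aligned box [lo, hi]."""
    return all(l <= c <= h for c, l, h in zip(point, lo, hi))

# Invented bounds: A_search is the safe-movement box, A_target the
# restricted box from which targets are drawn.
A_search = ((-0.4, -0.4, 0.0), (0.4, 0.4, 0.6))
A_target = ((0.1, -0.2, 0.0), (0.3, 0.2, 0.2))

def is_safe(point):
    """Safe pose: inside the search area but outside the target area."""
    return in_box(point, *A_search) and not in_box(point, *A_target)
```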
Q17. What are the main advantages of RL and DRL?
Reinforcement Learning (RL) and Deep Reinforcement Learning (DRL) methods are a promising approach to solving complex tasks in the real world with physical robots.
Q18. What is the main idea of the paper?
Their approach to finding a point in Cartesian space using multiple RL/DRL techniques is based on previous work in the areas of reinforcement learning, deep reinforcement learning, and motion planning.
Q19. What is the corresponding parameter of the learning environment?
Hyper-parameters of the individual RL/DRL techniques are given in Tab. 1 and Tab. 2. The reward function of the goal-achievement experiment in each learning technique is defined as $R_t = \dfrac{d_t(P_p, P_t) - d_t(P_a, P_t)}{d_t(P_i, P_t)} + R_s$, (9) where $d_t(P_i, P_t)$ is the initial Euclidean distance between the start point $P_i$ and the target point $P_t$. (Figure 4: Agent-environment interaction in the Markov decision process (MDP) for this problem.)
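In code, Eq. 9 is a progress term normalized by the initial distance plus the success bonus $R_s$. In the sketch below, the subscripts p and a are read as the previous and actual end-effector positions (an assumption from context), and the bonus value and tolerance are placeholders:

```python
import numpy as np

def reward(p_prev, p_actual, p_init, p_target, rs_bonus=10.0, tol=1e-3):
    """Reward of Eq. 9: progress toward the target, normalized by the
    initial distance, plus a success bonus R_s when close enough.

    `rs_bonus` and `tol` are illustrative values, not the paper's.
    """
    d = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
    progress = (d(p_prev, p_target) - d(p_actual, p_target)) / d(p_init, p_target)
    rs = rs_bonus if d(p_actual, p_target) < tol else 0.0
    return progress + rs
```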
Q20. What is the purpose of the paper?
In this paper, the authors propose several RL/DRL methods for the task aimed at achieving the goal using the collaborative robotic arm UR3 from Universal Robots, more precisely a 6-axis robotic arm [28].
Q21. What is the main topic of this paper?
Trajectory planning for a robotic arm is one of the most fundamental and challenging research topics in robotics and has attracted considerable interest from research institutes in recent decades.