Generalized model learning for Reinforcement Learning on a humanoid robot
Citations
Reinforcement learning in robotics: A survey
Survey of Model-Based Reinforcement Learning: Applications on Robotics
Learning Motor Skills: From Algorithms to Robot Experiments
Humanoid robots learning to walk faster: from the real world to simulation and back
References
Reinforcement Learning: An Introduction
Induction of Decision Trees
Learning from delayed rewards
R-max - a general polynomial time algorithm for near-optimal reinforcement learning
Frequently Asked Questions (12)
Q2. What are the main features of RL-DT?
Its two main features are that it uses decision trees to generalize the effects of actions across states, and that it has explicit exploration and exploitation modes.
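The key generalization idea behind RL-DT is that it models the *relative* change an action causes to each state feature, so experience from one state transfers to states never visited. The paper fits one decision tree per action over state features; the sketch below (illustrative only, not the authors' implementation) uses a simpler majority-vote effect model per action to show why relative effects generalize.

```python
from collections import Counter, defaultdict

class RelativeEffectModel:
    """Per-action model of relative state changes (RL-DT-style idea).

    RL-DT fits a decision tree per action to predict each feature's
    relative change; this stand-in keeps a majority vote over observed
    deltas, which suffices to show the generalization benefit.
    """
    def __init__(self):
        self.effects = defaultdict(Counter)  # action -> Counter of deltas

    def update(self, state, action, next_state):
        delta = tuple(n - s for s, n in zip(state, next_state))
        self.effects[action][delta] += 1

    def predict(self, state, action):
        if not self.effects[action]:
            return None  # unvisited action: unknown, triggers exploration
        delta, _ = self.effects[action].most_common(1)[0]
        return tuple(s + d for s, d in zip(state, delta))

model = RelativeEffectModel()
# Experience gathered at two leg positions...
model.update((0,), "shift_out", (1,))
model.update((1,), "shift_out", (2,))
# ...predicts the outcome in a position never visited:
print(model.predict((3,), "shift_out"))  # -> (4,)
```

Because effects are stored relative to the current state, two observations are enough to predict the action's outcome anywhere along the leg's range.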
Q3. How can RL-DT learn a task on a humanoid robot?
The authors demonstrated that reinforcement learning, and specifically model-based reinforcement learning with RL-DT, can be used to learn a penalty-kick task on a humanoid robot.
Q4. What is the name of the algorithm?
In [13], the authors introduce an algorithm called RAM-RMAX, a model-based algorithm in which part of the model is provided to the algorithm ahead of time.
Q5. What is the importance of penalty kicks in the SPL?
Penalty kicks are critical to success in the SPL: many teams are evenly matched, so many games end in a tie and are decided by penalty kicks.
Q6. How far did the robot have to shift its leg before kicking?
The ball was placed 30 mm left of the penalty mark, requiring the robot to shift its leg out 3 or 4 times before kicking the ball to aim it past the keeper.
Q7. What is the optimal policy for the RL-DT algorithm?
The algorithm was run using ε-greedy exploration, where the agent takes a random exploration action with probability ε and takes the optimal action the rest of the time.
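ε-greedy action selection, as described above, can be sketched in a few lines (a generic illustration, not the authors' code):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon pick a uniformly random action index;
    otherwise pick the greedy (highest-value) action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

# Greedy most of the time, random exploration epsilon of the time:
action = epsilon_greedy([0.1, 0.9, 0.2], epsilon=0.1)
```

With ε = 0 the policy is purely greedy; larger ε trades exploitation for undirected exploration, which is exactly the behavior RL-DT's explicit exploration mode is meant to improve on.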
Q8. Why did the authors start with the Webots simulator?
Due to the difficulties and time involved in performing learning experiments on the physical robot, the authors started by performing experiments in the Webots simulator from Cyberbotics (http://www.cyberbotics.com).
Q9. How many penalties were scored in the 9 games?
Out of the 9 games that were decided by penalty kicks (the rest were left as a draw), only 3 had goals scored during the best-of-five penalty kicks.
Q10. How many times did the robot shift its leg outward?
The optimal policy in this task was to shift the leg outward 4 times, so that the robot's foot was 112 mm out from its hip, and then kick.
Q11. What is the main difference between the two methods?
One of the methods that they mention for scaling up learning methods is to learn action models, which is similar to the models of transition effects of actions that RL-DT learns with its decision trees.
Q12. How does the algorithm differ from theirs?
The authors' approach differs from theirs by starting in an R-MAX-like exploration mode, which provides more guided exploration than an ε-greedy policy.
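The R-MAX-style exploration mentioned above rests on optimism under uncertainty: any state-action visited fewer than some threshold of times is treated as if it yielded the maximum possible reward, which systematically drives the agent toward unexplored regions instead of relying on random ε-greedy moves. A minimal sketch of that reward substitution (illustrative constants, not the paper's values):

```python
RMAX, M = 1.0, 3  # assumed optimistic reward and "known" visit threshold

def effective_reward(visit_counts, reward_sums, state, action):
    """R-MAX-style optimism: under-visited state-actions are assumed to
    yield the maximum reward; known ones use their empirical mean."""
    n = visit_counts.get((state, action), 0)
    if n < M:
        return RMAX                          # unknown -> optimistic
    return reward_sums[(state, action)] / n  # known -> empirical mean

counts = {("leg_out_2", "kick"): 5}
sums = {("leg_out_2", "kick"): 2.5}
effective_reward(counts, sums, "leg_out_2", "kick")      # empirical mean
effective_reward(counts, sums, "leg_out_2", "shift_in")  # optimistic RMAX
```

Planning with these optimistic rewards makes unexplored actions look attractive until they have been tried M times, after which the learned model takes over.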