Learning to Use Toes in a Humanoid Robot
Klaus Dorer
Hochschule Offenburg, Elektrotechnik-Informationstechnik, Germany,
klaus.dorer@fh-offenburg.de
Abstract. In this paper we show that a model-free approach to learning behaviors in joint space can be successfully used to utilize the toes of a humanoid robot. Keeping the approach model-free makes it applicable to any kind of humanoid robot, or any robot in general. Here we focus on the benefit for robots with toes, which is otherwise more difficult to exploit. The task has been to learn different kick behaviors on simulated Nao robots with toes in the RoboCup 3D soccer simulator. As a result, the robot learned to step on its toe for a kick that performs 30% better than the same kick learned without toes.
1 INTRODUCTION
Evolution has equipped humans with toes. Toes, among other advantages, allow humans to walk more smoothly and faster with longer steps. Naturally, this has been the subject of much research in humanoid robotics.
Passive toe joints have been suggested and used for quite some time to consume and release energy in the toe-off phase of walking. Early examples are the Monroe [8], HRP-2 [15] or Wabian-2R [14] robots.
Active toe joints are less common, but have also been in use for several years. The H6 robot [11] used toes to reduce angular knee speeds when walking, to increase leg length when climbing and to kneel. The Toni robot [3] improved walking by over-extending the unloading leg using toes. Toyota presented a toed robot running at 7 km/h [16]. Lola [4] is equipped with active toe joints and active pelvis joints to research human-like walking. Active toe joints increase the difficulty of leg behaviors in a couple of ways:
- they add, of course, a degree of freedom (DoF).
- they make the leg of the robot kinematically redundant. Inverse kinematics will no longer find unique solutions, which adds effort to deal with, but also offers room for optimization [5].
- moving the toe of a support leg in any case reduces the contact area of the foot, either by stepping on the toe or by lifting the toe off the ground.
- on real robots, they typically increase the complexity of construction and the weight of the feet, and they consume energy.
All examples above of passive or active toes are model-specific, either in the sense of being specific to a robot model or of being behavior-specific in calculating desired toe angles.

A couple of more recently built robots avoid the complexity of actuating toes and the inflexibility of flat feet by wearing shoes. Examples are Petman, Poppy [9] or Durus [12; 2], which, with this and other improvements, is able to walk many times more energy-efficiently. Shoes, or rounded feet in general, improve the energy efficiency of walking, but would not provide the benefits of active toes for any behavior, like the kicking behavior demonstrated in this paper.
Learning in combination with toe movement has been employed by Ogura et al. [14]. They used genetic algorithms to optimize some parameters of the ZMP-based foot trajectory calculation for continuous and smooth foot motion. This approach optimizes parameters of an abstract parameter space defined by an underlying model. This has the advantage that the dimension of the search space is kept relatively small, but it does not generalize to work for any behavior.
Learning to kick is common in many RoboCup leagues. A good overview can be found in [7]. However, none of these report on kicking with toed robots. MacAlpine et al. [10] use a layered learning approach to learn a set of behaviors in the RoboCup 3D soccer simulator used in this work. They learn keyframes for kicking behaviors and parameters for a model-based walk. Although their work is not focused on learning to use toes, they report an up to 60% improvement in overall soccer game play using a Nao robot variation with toes (see Section 2). This demonstrates the ability of their approach to generalize to different robot models, but it is still model-based for walking. Abdolmaleki et al. [1] use a keyframe-based approach and CMA-ES, as we do, to learn a kick with controlled distance. It achieves longer kick distances than reported here, but also requires considerably longer preparation time. However, their approach is limited to two keyframes, resulting in 25 learning parameters, while our approach is not limited to an upper number of keyframes (see Section 3). They also do not make use of toes. It is interesting to see that in their kick, too, the robot moves to the tip of its support foot to lengthen the support leg. Not using toes, their kick results in a falling robot after each kick.
The work presented here generates raw output data during learning without using an underlying robot or behavior model. To demonstrate this, we present learning results on different robot models (focusing on a simulated Nao robot with toes) and two very different kick behaviors without an underlying model.
The remainder of the paper is organized as follows: In Section 2 we provide
some details of the simulation environment used. Section 3 explains our model-
free approach to learning behaviors using toes. It is followed by experimental
results in Section 4 before we conclude and indicate future work in Section 5.
2 DOMAIN
The robots used in this work are robots of the RoboCup 3D soccer simulation, which is based on SimSpark (http://simspark.sourceforge.net/) and was initiated by [13]. It uses the ODE physics engine (http://www.ode.org/) and runs at a speed of 50 Hz. The simulator provides variations

of Aldebaran Nao robots with 22 DoF for the robot types without toes and 24 DoF for the type with toes, henceforth NaoToe. More specifically, the robot has 6 (7) DoF in each leg, 4 in each arm and 2 in its neck. There are several simplifications in the simulation compared to the real Nao:
- all motors of the simulated Nao are of equal strength, whereas the real Nao has weaker motors in the arms and different gears in the leg pitch motors.
- joints do not experience extensive backlash.
- the rotation axes of the hip yaw part of the hip are identical in both robots, but the simulated robot can move hip yaw for each leg independently, whereas for the real Nao, left and right hip yaw are coupled.
- the simulated Naos do not have hands.
- the touch model of the ground is softer and therefore more forgiving of stronger ground touches in the simulation.
- energy consumption and heat are not simulated.
- masses are assumed to be point masses in the center of each body part.
The feet of NaoToe are modeled as rectangular body parts of size 8 cm x 12 cm x 2 cm for the foot and 8 cm x 4 cm x 1 cm for the toes (see Figure 1). The two body parts are connected by a hinge joint that can move from -1 degree (downward) to 70 degrees.
All joints can move at an angular speed of at most 7.02 degrees per 20 ms. The simulation server expects to receive the desired speed for each joint at 50 Hz. If no speeds are sent to the server, it will continue moving the joint with the last speed received. Joint angles are perceived noiselessly at 50 Hz, but with a delay of 40 ms compared to the sent actions. So the robot only knows the result of a triggered action two cycles later. A controller for each joint inside the server tries to achieve the requested speed, but is subject to maximum torque, maximum angular speed and maximum joint angles.
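These per-joint constraints can be illustrated with a small sketch (hypothetical helper names, working in degrees per 20 ms cycle; the real controller lives inside the simulation server):

```python
MAX_STEP_DEG = 7.02  # maximum joint movement per 20 ms simulation cycle

def clamp_command(requested_step, current_angle, min_angle, max_angle):
    """Mirror the server-side joint controller limits (sketch only):
    cap the commanded per-cycle movement at 7.02 degrees and stop at
    the joint's angle limits."""
    step = max(-MAX_STEP_DEG, min(MAX_STEP_DEG, requested_step))
    next_angle = current_angle + step
    if next_angle > max_angle:    # would overshoot the upper joint limit
        step = max_angle - current_angle
    elif next_angle < min_angle:  # would overshoot the lower joint limit
        step = min_angle - current_angle
    return step
```

For the toe hinge joint, for example, a command of 10 degrees per cycle at angle 0 would be cut to 7.02 degrees, and a command near the 70-degree limit would be cut to the remaining range.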
The simulator is able to run 22 simulated Naos in real-time on reasonable CPUs. It is used as the competition platform for the RoboCup 3D soccer simulation league (http://www.robocup.org/robocup-soccer/simulation/). In this context, only a single agent was running in the simulator.
3 APPROACH
The guiding goal behind our approach is to create a framework that is model-free. By model-free we mean an approach that makes no assumptions about a robot's architecture or the task to be performed. Thus, from the viewpoint of learning, our model consists of a set of flat parameters. These parameters are later grounded inside the domain. In our case, grounding would mean creating 50 joint angles or angular speed values per second for each of the 24 joints of NaoToe. This would result in 1200 values to learn for a behavior of one second duration, assuming the 50 Hz frequency of the simulator. This seemed unreasonable for the time being and led to some steps of relaxing the ultimate goal.

Fig. 1. Wire model of the Nao with toes (left) and how it is visualized (right).
As a first step, the search space has been limited to the leg joints only. This effectively limits the current implementation of the approach to leg behaviors, excluding, for example, behaviors for getting up. Also, instead of providing 50 values per second for each joint, we make use of the fact that the output values of a joint over time are not independent. Therefore, we learn keyframes, i.e. all joint angles for discrete phases of movement, together with the duration of each phase from keyframe to keyframe. The experiments described in this paper used two to eight such phases. The number of phases is variable between learning runs, but is not subject to learning for now, except that a phase can be skipped by learning a zero duration for it.
The RoboCup server requires robots to send the desired angular speed of each joint as a command. So the first representation used in this work directly learns the speed values to be sent to the simulator. This requires learning 15 parameters per phase (14 joints + 1 for the duration of the phase), resulting in 30, 60, 90 and 120 parameters for the 2, 4, 6 and 8 phases worked with. The disadvantage of this approach is that the speed is constant during a phase and, in particular, does not adapt to discrepancies between the commanded and the true motor movement.
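As a sketch of this representation (a hypothetical structure, assuming the 14 leg joints counted above), the keyframe phases can be flattened into the single parameter vector the optimizer sees:

```python
NUM_LEG_JOINTS = 14  # both legs of NaoToe, as counted above

def flatten(phases):
    """Flatten keyframe phases into one flat learning-parameter vector:
    14 joint values plus one phase duration per phase."""
    params = []
    for joint_values, duration in phases:
        assert len(joint_values) == NUM_LEG_JOINTS
        params.extend(joint_values)  # one value (here: speed) per leg joint
        params.append(duration)      # duration of this phase
    return params

# two phases -> 2 * (14 + 1) = 30 learning parameters
two_phases = [([0.0] * NUM_LEG_JOINTS, 0.2), ([0.0] * NUM_LEG_JOINTS, 0.3)]
```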
The second representation therefore interprets the parameters as angular values to be reached at the end of a phase, as is done in [10] for kicking behaviors. A simple controller divides the difference between the current angle and the goal angle of each joint by the remaining duration and sends a speed accordingly. The two representations differ in cases where the motor does not exactly follow the commanded speed: using keyframes of angles will adjust the speeds to this situation.
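This simple controller can be sketched as follows (hypothetical helper, angles in degrees, one simulation cycle of 20 ms). Because the remaining difference is re-divided each cycle, a motor that lagged behind in the previous cycle automatically receives a larger command:

```python
def angle_to_speed(current_angle, goal_angle, remaining_time, cycle=0.02):
    """Representation two (sketch): command the per-cycle speed needed to
    reach the goal angle by the end of the phase."""
    if remaining_time <= 0.0:
        return 0.0
    # degrees per second, scaled to degrees per 20 ms cycle
    return (goal_angle - current_angle) / remaining_time * cycle
```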
A third representation uses a combination of an angular value and the maximum amount of angular speed each joint should have. The direction of movement is entirely encoded in the angular values, but the speed is a combination of representations one and two above. If the amount of angular speed does not allow the angular value to be reached, the joint behaves as in version one. If the amount of angular speed is bigger, the joint behaves as in version two. This almost doubles the number of parameters to learn, but the co-domain of the speed values is half the size, since here we only require an absolute amount of angular speed.
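A sketch of this combined representation (hypothetical helper, reusing the idea of the angle-based controller above): the learned angle gives the direction, while the learned absolute speed caps the magnitude:

```python
def angle_and_cap_to_speed(current_angle, goal_angle, remaining_time,
                           max_step, cycle=0.02):
    """Representation three (sketch): direction from the goal angle,
    magnitude capped by a learned absolute per-cycle speed."""
    needed = (goal_angle - current_angle) / max(remaining_time, cycle) * cycle
    if abs(needed) > max_step:
        # cap reached: constant speed, as in representation one
        return max_step if needed > 0 else -max_step
    # otherwise behave like representation two
    return needed
```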

Interpolation between keyframes of angles is linear for now. It could be changed to polynomials or splines, but we do not expect a big difference, since the resulting behavior is smoothened anyhow by the inertia of body parts, and since phases can be, and are, learned to be short in time if the difference between linear and polynomial matters.
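Linear interpolation between two keyframes is then simply (sketch; t runs from 0 at the start keyframe to 1 at the end keyframe):

```python
def interpolate(start_angles, end_angles, t):
    """Linearly interpolate all joint angles between two keyframes."""
    return [a + (b - a) * t for a, b in zip(start_angles, end_angles)]
```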
Learning is done using plain genetic algorithms and covariance matrix adaptation evolution strategy (CMA-ES) [6]. Feedback from the domain is provided by a fitness function that defines the utility of a robot. The currently implemented fitness functions use the ball position and the robot's orientation and position during or at the end of a run. The decision maker that triggers the behavior also uses foot force sensors.
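The overall learning loop follows the usual ask-and-evaluate pattern. The following is a deliberately minimal stand-in (a greedy mutation loop with a toy fitness function, not the actual GA/CMA-ES implementation used here) just to illustrate how flat parameters, known value ranges and a fitness function fit together:

```python
import random

def evolve(fitness, bounds, children=20, generations=50, seed=1):
    """Greedy (1+lambda) evolutionary loop (illustrative stand-in only):
    mutate the best parameter vector within its known bounds and keep
    improvements, guided solely by the fitness value."""
    rng = random.Random(seed)
    best = [rng.uniform(lo, hi) for lo, hi in bounds]
    best_fit = fitness(best)
    for _ in range(generations):
        for _ in range(children):
            child = [min(hi, max(lo, x + rng.gauss(0.0, 0.1 * (hi - lo))))
                     for x, (lo, hi) in zip(best, bounds)]
            child_fit = fitness(child)
            if child_fit > best_fit:
                best, best_fit = child, child_fit
    return best, best_fit

# toy fitness: negative squared distance to a made-up target vector
target = [0.5, -0.2]
bounds = [(-1.0, 1.0), (-1.0, 1.0)]
params, fit = evolve(lambda p: -sum((x - t) ** 2
                                    for x, t in zip(p, target)), bounds)
```

In the actual setup, the fitness value would instead come from running the parameterized kick in the simulator and measuring the ball and robot state.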
To summarize, the following domain knowledge is built into our approach:
- the system has to provide values at 50 Hz (used as angular speeds of the joints)
- there are up to 232 free parameters for which we know the range of reasonable values (defined by minimum and maximum joint angles and angular speeds and maximum phase duration)
- a fitness function using domain information gives feedback about the utility of a parameter set
- a kicking behavior is made possible by moving the player into the vicinity of the ball.
The following domain knowledge is not required by or built into the system:
- geometry and mass of body parts
- position or axes of joints
- the robot type (humanoid, four-legged, ...)
4 RESULTS
The first behaviors to learn were kicks. Experiments were conducted as follows. The robot connects to the server and is beamed to a position favorable for kicking. It then starts to step in place. Kicking without stepping is easier to achieve, but does not resemble typical situations during a real game. The stepping in place is a model-based, inverse-kinematic walk and is not subject to this learning for now. After 1.4 s, the agent is free to decide to kick. It will then kick as soon as the foot pressure sensors show a favorable condition. Here this means that, if kicking with the right leg, the left leg has to have just touched the ground and the right leg has to have left the ground. After the kick behavior, the agent continues to step in place until the overall run has taken five seconds. A run is stopped earlier if the agent falls (z component of the torso's up-vector < 0.5). This will typically be the case for most of the individuals of the initial random population.
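The fall check can be sketched directly from this criterion (hypothetical helper): a z component below 0.5 means the torso has tilted more than 60 degrees from upright:

```python
def run_aborted(torso_up_vector):
    """Abort a learning run early when the robot has fallen, i.e. the z
    component of the torso's up-vector has dropped below 0.5."""
    return torso_up_vector[2] < 0.5
```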
Table 1 shows the influence of the three representations used for the learning parameters, as well as the influence of using different numbers of phases. Each value in the table is the result of 400,000 kicks performed using genetic