Learning to Use Toes in a Humanoid Robot
Klaus Dorer
Hochschule Offenburg, Elektrotechnik-Informationstechnik, Germany,
klaus.dorer@fh-offenburg.de
Abstract. In this paper we show that a model-free approach to learning
behaviors in joint space can successfully be used to utilize the toes of a
humanoid robot. Keeping the approach model-free makes it applicable
to any kind of humanoid robot, or any robot in general. Here we focus on
the benefit for robots with toes, which is otherwise more difficult to exploit.
The task was to learn different kick behaviors on simulated Nao
robots with toes in the RoboCup 3D soccer simulator. As a result, the
robot learned to step on its toes for a kick that performs 30% better than
the same kick learned without toes.
1 INTRODUCTION
Evolution has spent the effort to equip humans with toes. Toes, among other
advantages, allow humans to walk more smoothly and faster with longer steps.
Naturally, this has been the subject of much research in humanoid robotics.
Passive toe joints have been suggested and used for quite some time to
consume and release energy in the toe-off phase of walking. Early examples are
the Monroe [8], HRP-2 [15] and Wabian-2R [14] robots.
Active toe joints are less common, but have also been in use for several years.
The H6 robot [11] used toes to reduce angular knee speeds when walking, to
increase leg length when climbing and to kneel. The Toni robot [3] improved
walking by over-extending the unloading leg using toes. Toyota presented a
toed robot running at 7 km/h [16]. Lola [4] is equipped with active toe joints
and active pelvis joints to research human-like walking. Active toe joints
increase the difficulty of leg behaviors in several ways:
– They add a degree of freedom (DoF).
– They make the leg of the robot kinematically redundant. Inverse kinematics
no longer finds unique solutions, which adds complexity to deal with, but
also offers room for optimization [5].
– Moving the toe of a support leg reduces the contact area of the foot in any
case, either by stepping on the toe or by lifting the toe off the ground.
– On real robots, they typically increase the complexity of construction and
the weight of the feet, and they consume energy.
All the examples above for passive or active toes are model-specific, either in
the sense of being specific to a robot model or of being behavior-specific in
calculating desired toe angles.
A couple of more recently built robots avoid the complexity of actuating
toes and the inflexibility of flat feet by wearing shoes. Examples are Petman,
Poppy [9] and Durus [12; 2], the latter of which walks many times more
energy-efficiently thanks to this and other improvements. Shoes, or rounded
feet in general, improve the energy efficiency of walking, but they do not
provide the benefits of active toes for behaviors like the kick demonstrated
in this paper.
Learning in combination with toe movement has been employed by Ogura
et al. [14]. They used genetic algorithms to optimize some parameters of the
ZMP-based foot trajectory calculation for continuous and smooth foot motion.
This approach optimizes parameters of an abstract parameter space defined by
an underlying model. This has the advantage that the dimension of the search
space is kept relatively small, but it does not generalize to arbitrary behaviors.
Learning to kick is common in many RoboCup leagues; a good overview
can be found in [7]. However, none of these reports covers kicking with toed
robots. MacAlpine et al. [10] use a layered learning approach to learn a set of
behaviors in the RoboCup 3D soccer simulator used in this work. They learn
keyframes for kicking behaviors and parameters for a model-based walk.
Although their work is not focused on learning to use toes, they report an up
to 60% improvement in overall soccer game play using a Nao robot variation
with toes (see Section 2). This demonstrates the ability of their approach to
generalize to different robot models, but it is still model-based for walking.
Abdolmaleki et al. [1] use a keyframe-based approach and CMA-ES, as we do,
to learn a kick with controlled distance. Their kick achieves longer distances
than reported here, but also requires considerably longer preparation time.
Moreover, their approach is limited to two keyframes, resulting in 25 learning
parameters, while our approach does not limit the number of keyframes (see
Section 3). They also do not make use of toes. It is interesting to see that, in
their kick too, the robot moves to the tip of its support foot to lengthen the
support leg. Not using toes, their kick ends with the robot falling after each
kick.
The work presented here generates raw output data during learning, without
using an underlying robot or behavior model. To demonstrate this, we present
learning results on different robot models (focusing on a simulated Nao robot
with toes) and on two very different kick behaviors, neither relying on an
underlying model.
The remainder of the paper is organized as follows: In Section 2 we provide
some details of the simulation environment used. Section 3 explains our model-
free approach to learning behaviors using toes. It is followed by experimental
results in Section 4 before we conclude and indicate future work in Section 5.
2 DOMAIN
The robots used in this work are robots of the RoboCup 3D soccer simulation,
which is based on SimSpark¹ and was initiated by [13]. It uses the ODE
physics engine² and runs at a speed of 50 Hz.
¹ http://simspark.sourceforge.net/
² http://www.ode.org/
The simulator provides variations of Aldebaran Nao robots with 22 DoF for
the robot types without toes and 24 DoF for the type with toes, NaoToe
henceforth. More specifically, the robot has 6 (7) DoF in each leg, 4 in each
arm and 2 in its neck. There are several simplifications in the simulation
compared to the real Nao:
– All motors of the simulated Nao are of equal strength, whereas the real Nao
has weaker motors in the arms and different gears in the leg pitch motors.
– Joints do not experience extensive backlash.
– The rotation axes of the hip yaw joints are identical in both robots, but
the simulated robot can move each leg's hip yaw independently, whereas
on the real Nao left and right hip yaw are coupled.
– The simulated Naos do not have hands.
– The ground contact model is softer and therefore more forgiving of hard
ground touches in the simulation.
– Energy consumption and heat are not simulated.
– Masses are assumed to be point masses at the center of each body part.
The feet of NaoToe are modeled as rectangular body parts of size 8 cm x
12 cm x 2 cm for the foot and 8 cm x 4 cm x 1 cm for the toes (see Figure 1).
The two body parts are connected by a hinge joint that can move from -1
degree (downward) to 70 degrees.
All joints can move at an angular speed of at most 7.02 degrees per 20 ms.
The simulation server expects to receive the desired speed for each joint at
50 Hz. If no speeds are sent, the server continues moving the joint with the
last speed received. Joint angles are perceived noiselessly at 50 Hz, but with a
delay of 40 ms relative to the actions sent. So the robot knows the result of a
triggered action only after two cycles. A controller inside the server for each
joint tries to achieve the requested speed, but is subject to maximum torque,
maximum angular speed and maximum joint angles.
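The per-joint behavior described above can be sketched as follows. This is our own illustration of the limits involved, not the simulator's actual implementation; the function names and the use of the toe joint's angular range are assumptions.

```python
def clamp(value, low, high):
    """Limit a value to the range [low, high]."""
    return max(low, min(high, value))

def step_joint(angle, requested_speed, max_speed=7.02 / 0.02,
               min_angle=-1.0, max_angle=70.0):
    """One 20 ms simulation step of a joint (sketch).

    The requested angular speed (degrees per second) is limited to
    the joint's maximum speed, then the joint angle is advanced for
    one 20 ms cycle and clamped to the joint's angular range (the
    default range here is that of the toe joint)."""
    speed = clamp(requested_speed, -max_speed, max_speed)
    return clamp(angle + speed * 0.02, min_angle, max_angle)
```

A real controller would additionally be subject to the torque limit mentioned above, which this sketch omits.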
The simulator is able to run 22 simulated Naos in real time on reasonable
CPUs. It is used as the competition platform for the RoboCup 3D soccer
simulation league³. In this context, only a single agent was running in the
simulator.
3 APPROACH
The guiding goal behind our approach is to create a framework that is model-free.
By model-free we mean an approach that makes no assumptions about a
robot's architecture nor about the task to be performed. Thus, from the
viewpoint of learning, our model consists of a set of flat parameters. These
parameters are later grounded inside the domain. In our case, grounding means
creating 50 joint angles or angular speed values per second for each of the 24
joints of NaoToe. This would result in 1200 values to learn for a behavior of
one second duration, assuming the 50 Hz frequency of the simulator. This
seemed unreasonable for the time being and led to some steps of relaxing the
ultimate goal.
³ http://www.robocup.org/robocup-soccer/simulation/
Fig. 1. Wire model of the Nao with toes (left) and how it is visualized (right).
As a first step, the search space has been limited to the leg joints only. This
effectively limits the current implementation of the approach to leg behaviors,
excluding, for example, behaviors to get up. Also, instead of providing 50 values
per second for each joint, we make use of the fact that the output values of a
joint over time are not independent. Therefore, we learn keyframes, i.e. all joint
angles for discrete phases of movement, together with the duration of each
phase from keyframe to keyframe. The experiments described in this paper
used two to eight such phases. The number of phases is variable between
learning runs, but is not yet subject to learning, except that a phase can
effectively be skipped by learning a zero duration for it.
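A minimal sketch of this keyframe representation as a flat parameter vector might look as follows; the class and function names are our own illustration, not taken from the implementation described here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Phase:
    """One keyframe phase: the duration of the movement to this
    keyframe plus one value per leg joint."""
    duration: float            # seconds
    joint_values: List[float]  # one value per leg joint

def flatten(phases: List[Phase]) -> List[float]:
    """Serialize the phases into the flat parameter vector the
    optimizer works on: duration followed by the joint values,
    phase by phase."""
    params = []
    for p in phases:
        params.append(p.duration)
        params.extend(p.joint_values)
    return params
```

With 14 leg joints this yields 15 parameters per phase, matching the parameter counts given below.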
The RoboCup server requires robots to send the desired angular speed of each
joint as a command. So the first representation used in this work is to directly
learn the speed values to be sent to the simulator. This requires learning 15
parameters per phase (14 joints + 1 for the duration of the phase), resulting
in 30, 60, 90 and 120 parameters for the 2, 4, 6 and 8 phases worked with. The
disadvantage of this approach is that the speed is constant during a phase and,
in particular, does not adapt to discrepancies between the commanded and the
true motor movement.
The second representation therefore interprets the parameters as angular
values to be reached at the end of a phase, as is done in [10] for kicking
behaviors. A simple controller divides the difference between the current angle
and the goal angle of each joint by the remaining duration and sends a speed
accordingly. The two representations differ in cases where the motor does not
exactly follow the commanded speed: using keyframes of angles adjusts the
speeds to this situation.
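The simple controller for the angle-based representation can be sketched as below; this is our reading of the description above, with illustrative names.

```python
def angle_to_speed(current_angle, goal_angle, remaining_time):
    """Speed command (degrees per second) that reaches goal_angle
    at the end of the phase if the motor follows it exactly.
    Because it is re-evaluated every cycle from the perceived
    angle, deviations of the true motor movement are compensated
    in the following cycles."""
    if remaining_time <= 0:
        return 0.0
    return (goal_angle - current_angle) / remaining_time
```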
A third representation uses a combination of an angular value and the maximum
amount of angular speed each joint should have. The direction of movement
is entirely encoded in the angular values, and the speed is a combination of
representations one and two above: if the amount of angular speed does not
allow reaching the angular value, the joint behaves as in representation one; if
the amount of angular speed is bigger, the joint behaves as in representation
two. This almost doubles the number of parameters to learn, but the co-domain
of the speed values is half the size, since here we only require an absolute
amount of angular speed.
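The third representation can be sketched as follows (our own illustration): the angle-based speed is computed as before, but its magnitude is capped by the learned speed limit, so the sign always comes from the angular values.

```python
def capped_speed(current_angle, goal_angle, remaining_time, max_abs_speed):
    """Representation three (sketch): drive toward goal_angle as
    in the angle-based controller, but never exceed the learned
    absolute speed limit."""
    if remaining_time <= 0:
        return 0.0
    speed = (goal_angle - current_angle) / remaining_time
    if abs(speed) > max_abs_speed:
        # Speed limit active: behaves like the constant-speed case.
        speed = max_abs_speed if speed > 0 else -max_abs_speed
    return speed
```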
Interpolation between keyframes of angles is linear for now. It could be
changed to polynomials or splines, but we do not expect a big difference, since
the resulting behavior is smoothed anyhow by the inertia of body parts, and
since phases can be, and are, learned to be short in time when the difference
between linear and polynomial interpolation matters.
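Linear interpolation within a phase reduces to the following sketch (illustrative code; names are our own):

```python
def interpolate(start_angles, end_angles, t, duration):
    """Linearly interpolate all joint angles within one phase.
    t is the time elapsed since the phase started; the fraction
    is clamped so the pose holds at the keyframe once reached."""
    alpha = min(max(t / duration, 0.0), 1.0)
    return [s + alpha * (e - s) for s, e in zip(start_angles, end_angles)]
```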
Learning is done using plain genetic algorithms and covariance matrix
adaptation evolution strategy (CMA-ES) [6]. Feedback from the domain is
provided by a fitness function that defines the utility of a robot. Currently
implemented fitness functions use the ball position and the robot's orientation
and position during or at the end of a run. The decision maker that triggers
the behavior also uses foot force sensors.
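As an illustration of the optimization loop (not the actual implementation, which uses genetic algorithms and CMA-ES), a minimal evolution strategy over a flat parameter vector with known ranges could look like this:

```python
import random

def evolve(fitness, dim, low, high, population=20, parents=5,
           generations=50, sigma=0.1, seed=0):
    """Minimal (mu + lambda)-style evolution strategy sketch:
    keep the best 'parents' individuals, refill the population
    with Gaussian mutations of them, and clamp each parameter to
    its legal range. 'fitness' is maximized."""
    rng = random.Random(seed)
    pop = [[rng.uniform(low, high) for _ in range(dim)]
           for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        best = pop[:parents]
        pop = [list(p) for p in best]  # elitism: keep parents
        while len(pop) < population:
            parent = rng.choice(best)
            child = [min(high, max(low, x + rng.gauss(0, sigma)))
                     for x in parent]
            pop.append(child)
    return max(pop, key=fitness)
```

In the actual setup the fitness of an individual is obtained by running the parameterized behavior in the simulator and measuring, for example, the resulting ball position.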
To summarize, the following domain knowledge is built into our approach:
– the system has to provide values at 50 Hz (used as angular speeds of the
joints)
– there are up to 232 free parameters, for which we know the range of reasonable
values (defined by the minimum and maximum joint angles and angular
speeds and the maximum phase duration)
– a fitness function using domain information gives feedback about the utility
of a parameter set
– a kicking behavior is made possible by moving the player into the vicinity
of the ball.
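Knowing only the legal range of each parameter, the optimizer can work on normalized values and ground them in the domain afterwards; a sketch of this grounding step (our illustration, not necessarily how it is implemented):

```python
def ground(normalized, ranges):
    """Map normalized parameters in [0, 1] to their domain ranges,
    given one (minimum, maximum) pair per parameter, e.g. joint
    angle limits or the maximum phase duration."""
    return [lo + x * (hi - lo) for x, (lo, hi) in zip(normalized, ranges)]
```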
The following domain knowledge is not required or built into the system:
– geometry, mass of body parts
– position or axis of joints
– the robot type (humanoid, four-legged, ...)
4 RESULTS
The first behaviors to be learned were kicks. Experiments were conducted as
follows. The robot connects to the server and is beamed to a position favorable
for kicking. It then starts to step in place. Kicking without stepping is easier
to achieve, but does not resemble typical situations in a real game. The
stepping in place is a model-based, inverse-kinematic walk and is not subject
to this learning for now. After 1.4 s, the agent is free to decide to kick. It then
kicks as soon as the foot pressure sensors show a favorable condition: for a
kick with the right leg, the left foot has to have just touched the ground and
the right foot has to have just left it. After the kick behavior, the agent
continues to step in place until the overall run has taken five seconds. A run is
stopped earlier if the agent falls (z component of the torso's up-vector < 0.5).
This is typically the case for most individuals of the initial random population.
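The kick trigger and the abort condition described above can be sketched as simple predicates; the function names and the use of raw force magnitudes are illustrative assumptions.

```python
def should_kick(time_in_run, left_foot_force, right_foot_force):
    """Kick trigger for a right-leg kick (sketch): after 1.4 s of
    stepping, kick when the support (left) foot has ground contact
    and the kicking (right) foot has none."""
    if time_in_run < 1.4:
        return False
    return left_foot_force > 0.0 and right_foot_force == 0.0

def has_fallen(torso_up_z):
    """A run is aborted when the z component of the torso's
    up-vector drops below 0.5."""
    return torso_up_z < 0.5
```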
Table 1 shows the influence of the three representations used for the learning
parameters, as well as the influence of using different numbers of phases.
Each value in the table is the result of 400,000 kicks performed using genetic