Learning to Use Toes in a Humanoid Robot
Klaus Dorer
Hochschule Offenburg, Elektrotechnik-Informationstechnik, Germany,
klaus.dorer@fh-offenburg.de
Abstract. In this paper we show that a model-free approach to learning
behaviors in joint space can successfully be used to utilize the toes of a
humanoid robot. Keeping the approach model-free makes it applicable
to any kind of humanoid robot, or any robot in general. Here we focus on
the benefit for robots with toes, which is otherwise more difficult to exploit.
The task was to learn different kick behaviors on simulated Nao
robots with toes in the RoboCup 3D soccer simulator. As a result, the
robot learned to step on its toes for a kick that performs 30% better than
the same kick learned without toes.
1 INTRODUCTION
Evolution has spent the effort to equip humans with toes. Toes, among other
advantages, allow humans to walk more smoothly and faster with longer steps.
Naturally, this has been the subject of much research in humanoid robotics.
Passive toe joints have been suggested and used for quite some time to
consume and release energy in the toe-off phase of walking. Early examples are
the Monroe [8], HRP-2 [15] and Wabian-2R [14] robots.
Active toe joints are less common, but have also been in use for several years.
The H6 robot [11] used toes to reduce angular knee speeds when walking, to
increase leg length when climbing and to kneel. The Toni robot [3] improved
walking by over-extending the unloading leg using toes. Toyota presented a
toed robot running at 7 km/h [16]. Lola [4] is equipped with active toe joints
and active pelvis joints to research human-like walking. Active toe joints
increase the difficulty of leg behaviors in several ways:
– They add a degree of freedom (DoF).
– They make the leg of the robot kinematically redundant. Inverse kinematics
no longer finds unique solutions, which adds complexity to deal with, but
also offers room for optimization [5].
– Moving the toe of a support leg reduces the contact area of the foot in any
case, either by stepping on the toe or by lifting the toe off the ground.
– On real robots, they typically increase the complexity of construction and
the weight of the feet, and they consume energy.
All the examples above for passive or active toes are model-specific, either in
the sense of being specific to a robot model or of being behavior-specific in
calculating desired toe angles.
A couple of more recently built robots avoid the complexity of actuating
toes and the inflexibility of flat feet by wearing shoes. Examples are Petman,
Poppy [9] and Durus [12; 2], the latter of which walks many times more
energy-efficiently thanks to this and other improvements. Shoes, or rounded
feet in general, improve the energy efficiency of walking, but they do not
provide the benefits of active toes for behaviors like the kick demonstrated
in this paper.
Learning in combination with toe movement has been employed by Ogura
et al. [14]. They used genetic algorithms to optimize some parameters of the
ZMP-based foot trajectory calculation for continuous and smooth foot motion.
This approach optimizes parameters of an abstract parameter space defined by
an underlying model. This has the advantage that the dimension of the search
space is kept relatively small, but it does not generalize to arbitrary behaviors.
Learning to kick is common in many RoboCup leagues; a good overview
can be found in [7]. However, none of these reports covers kicking with toed
robots. MacAlpine et al. [10] use a layered learning approach to learn a set of
behaviors in the RoboCup 3D soccer simulator used in this work. They learn
keyframes for kicking behaviors and parameters for a model-based walk.
Although their work is not focused on learning to use toes, they report an up
to 60% improvement in overall soccer game play using a Nao robot variation
with toes (see Section 2). This demonstrates the ability of their approach to
generalize to different robot models, but it is still model-based for walking.
Abdolmaleki et al. [1] use a keyframe-based approach and CMA-ES, as we do,
to learn a kick with controlled distance. Their kick achieves longer distances
than reported here, but also requires considerably longer preparation time.
Moreover, their approach is limited to two keyframes, resulting in 25 learning
parameters, while our approach does not limit the number of keyframes (see
Section 3). They also do not make use of toes. It is interesting to see that, in
their kick too, the robot moves to the tip of its support foot to lengthen the
support leg. Not using toes, their kick ends with the robot falling after each
kick.
The work presented here generates raw output data during learning, without
using an underlying robot or behavior model. To demonstrate this, we present
learning results on different robot models (focusing on a simulated Nao robot
with toes) and on two very different kick behaviors, neither relying on an
underlying model.
The remainder of the paper is organized as follows: In Section 2 we provide
some details of the simulation environment used. Section 3 explains our model-
free approach to learning behaviors using toes. It is followed by experimental
results in Section 4 before we conclude and indicate future work in Section 5.
2 DOMAIN
The robots used in this work are robots of the RoboCup 3D soccer simulation,
which is based on SimSpark¹ and was initiated by [13]. It uses the ODE
physics engine² and runs at a speed of 50 Hz.
¹ http://simspark.sourceforge.net/
² http://www.ode.org/
The simulator provides variations of Aldebaran Nao robots with 22 DoF for
the robot types without toes and 24 DoF for the type with toes, NaoToe
henceforth. More specifically, the robot has 6 (7) DoF in each leg, 4 in each
arm and 2 in its neck. There are several simplifications in the simulation
compared to the real Nao:
– All motors of the simulated Nao are of equal strength, whereas the real Nao
has weaker motors in the arms and different gears in the leg pitch motors.
– Joints do not experience extensive backlash.
– The rotation axes of the hip yaw joints are identical in both robots, but
the simulated robot can move each leg's hip yaw independently, whereas
on the real Nao left and right hip yaw are coupled.
– The simulated Naos do not have hands.
– The ground contact model is softer and therefore more forgiving of hard
ground touches in the simulation.
– Energy consumption and heat are not simulated.
– Masses are assumed to be point masses at the center of each body part.
The feet of NaoToe are modeled as rectangular body parts of size 8 cm x
12 cm x 2 cm for the foot and 8 cm x 4 cm x 1 cm for the toes (see Figure 1).
The two body parts are connected by a hinge joint that can move from -1
degree (downward) to 70 degrees.
All joints can move at an angular speed of at most 7.02 degrees per 20 ms.
The simulation server expects to receive the desired speed for each joint at
50 Hz. If no speeds are sent, the server continues moving the joint with the
last speed received. Joint angles are perceived noiselessly at 50 Hz, but with a
delay of 40 ms relative to the actions sent. So the robot knows the result of a
triggered action only after two cycles. A controller inside the server for each
joint tries to achieve the requested speed, but is subject to maximum torque,
maximum angular speed and maximum joint angles.
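The per-joint behavior described above can be sketched as follows. This is our own illustration of the limits involved, not the simulator's actual implementation; the function names and the use of the toe joint's angular range are assumptions.

```python
def clamp(value, low, high):
    """Limit a value to the range [low, high]."""
    return max(low, min(high, value))

def step_joint(angle, requested_speed, max_speed=7.02 / 0.02,
               min_angle=-1.0, max_angle=70.0):
    """One 20 ms simulation step of a joint (sketch).

    The requested angular speed (degrees per second) is limited to
    the joint's maximum speed, then the joint angle is advanced for
    one 20 ms cycle and clamped to the joint's angular range (the
    default range here is that of the toe joint)."""
    speed = clamp(requested_speed, -max_speed, max_speed)
    return clamp(angle + speed * 0.02, min_angle, max_angle)
```

A real controller would additionally be subject to the torque limit mentioned above, which this sketch omits.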
The simulator is able to run 22 simulated Naos in real time on reasonable
CPUs. It is used as the competition platform for the RoboCup 3D soccer
simulation league³. In this context, only a single agent was running in the
simulator.
3 APPROACH
The guiding goal behind our approach is to create a framework that is model-free.
By model-free we mean an approach that makes no assumptions about a
robot's architecture nor about the task to be performed. Thus, from the
viewpoint of learning, our model consists of a set of flat parameters. These
parameters are later grounded inside the domain. In our case, grounding means
creating 50 joint angles or angular speed values per second for each of the 24
joints of NaoToe. This would result in 1200 values to learn for a behavior of
one second duration, assuming the 50 Hz frequency of the simulator. This
seemed unreasonable for the time being and led to some steps of relaxing the
ultimate goal.
³ http://www.robocup.org/robocup-soccer/simulation/
Fig. 1. Wire model of the Nao with toes (left) and how it is visualized (right).
As a first step, the search space has been limited to the leg joints only. This
effectively limits the current implementation of the approach to leg behaviors,
excluding, for example, behaviors to get up. Also, instead of providing 50 values
per second for each joint, we make use of the fact that the output values of a
joint over time are not independent. Therefore, we learn keyframes, i.e. all joint
angles for discrete phases of movement, together with the duration of each
phase from keyframe to keyframe. The experiments described in this paper
used two to eight such phases. The number of phases is variable between
learning runs, but is not yet subject to learning, except that a phase can
effectively be skipped by learning a zero duration for it.
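A minimal sketch of this keyframe representation as a flat parameter vector might look as follows; the class and function names are our own illustration, not taken from the implementation described here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Phase:
    """One keyframe phase: the duration of the movement to this
    keyframe plus one value per leg joint."""
    duration: float            # seconds
    joint_values: List[float]  # one value per leg joint

def flatten(phases: List[Phase]) -> List[float]:
    """Serialize the phases into the flat parameter vector the
    optimizer works on: duration followed by the joint values,
    phase by phase."""
    params = []
    for p in phases:
        params.append(p.duration)
        params.extend(p.joint_values)
    return params
```

With 14 leg joints this yields 15 parameters per phase, matching the parameter counts given below.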
The RoboCup server requires robots to send the desired angular speed of each
joint as a command. So the first representation used in this work is to directly
learn the speed values to be sent to the simulator. This requires learning 15
parameters per phase (14 joints + 1 for the duration of the phase), resulting
in 30, 60, 90 and 120 parameters for the 2, 4, 6 and 8 phases worked with. The
disadvantage of this approach is that the speed is constant during a phase and,
in particular, does not adapt to discrepancies between the commanded and the
true motor movement.
The second representation therefore interprets the parameters as angular
values to be reached at the end of a phase, as is done in [10] for kicking
behaviors. A simple controller divides the difference between the current angle
and the goal angle of each joint by the remaining duration and sends a speed
accordingly. The two representations differ in cases where the motor does not
exactly follow the commanded speed: using keyframes of angles adjusts the
speeds to this situation.
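The simple controller for the angle-based representation can be sketched as below; this is our reading of the description above, with illustrative names.

```python
def angle_to_speed(current_angle, goal_angle, remaining_time):
    """Speed command (degrees per second) that reaches goal_angle
    at the end of the phase if the motor follows it exactly.
    Because it is re-evaluated every cycle from the perceived
    angle, deviations of the true motor movement are compensated
    in the following cycles."""
    if remaining_time <= 0:
        return 0.0
    return (goal_angle - current_angle) / remaining_time
```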
A third representation uses a combination of an angular value and the maximum
amount of angular speed each joint should have. The direction of movement
is entirely encoded in the angular values, and the speed is a combination of
representations one and two above: if the amount of angular speed does not
allow reaching the angular value, the joint behaves as in representation one; if
the amount of angular speed is bigger, the joint behaves as in representation
two. This almost doubles the number of parameters to learn, but the co-domain
of the speed values is half the size, since here we only require an absolute
amount of angular speed.
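The third representation can be sketched as follows (our own illustration): the angle-based speed is computed as before, but its magnitude is capped by the learned speed limit, so the sign always comes from the angular values.

```python
def capped_speed(current_angle, goal_angle, remaining_time, max_abs_speed):
    """Representation three (sketch): drive toward goal_angle as
    in the angle-based controller, but never exceed the learned
    absolute speed limit."""
    if remaining_time <= 0:
        return 0.0
    speed = (goal_angle - current_angle) / remaining_time
    if abs(speed) > max_abs_speed:
        # Speed limit active: behaves like the constant-speed case.
        speed = max_abs_speed if speed > 0 else -max_abs_speed
    return speed
```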
Interpolation between keyframes of angles is linear for now. It could be
changed to polynomials or splines, but we do not expect a big difference, since
the resulting behavior is smoothed anyhow by the inertia of body parts, and
since phases can be, and are, learned to be short in time when the difference
between linear and polynomial interpolation matters.
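Linear interpolation within a phase reduces to the following sketch (illustrative code; names are our own):

```python
def interpolate(start_angles, end_angles, t, duration):
    """Linearly interpolate all joint angles within one phase.
    t is the time elapsed since the phase started; the fraction
    is clamped so the pose holds at the keyframe once reached."""
    alpha = min(max(t / duration, 0.0), 1.0)
    return [s + alpha * (e - s) for s, e in zip(start_angles, end_angles)]
```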
Learning is done using plain genetic algorithms and covariance matrix
adaptation evolution strategy (CMA-ES) [6]. Feedback from the domain is
provided by a fitness function that defines the utility of a robot. Currently
implemented fitness functions use the ball position and the robot's orientation
and position during or at the end of a run. The decision maker that triggers
the behavior also uses foot force sensors.
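As an illustration of the optimization loop (not the actual implementation, which uses genetic algorithms and CMA-ES), a minimal evolution strategy over a flat parameter vector with known ranges could look like this:

```python
import random

def evolve(fitness, dim, low, high, population=20, parents=5,
           generations=50, sigma=0.1, seed=0):
    """Minimal (mu + lambda)-style evolution strategy sketch:
    keep the best 'parents' individuals, refill the population
    with Gaussian mutations of them, and clamp each parameter to
    its legal range. 'fitness' is maximized."""
    rng = random.Random(seed)
    pop = [[rng.uniform(low, high) for _ in range(dim)]
           for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        best = pop[:parents]
        pop = [list(p) for p in best]  # elitism: keep parents
        while len(pop) < population:
            parent = rng.choice(best)
            child = [min(high, max(low, x + rng.gauss(0, sigma)))
                     for x in parent]
            pop.append(child)
    return max(pop, key=fitness)
```

In the actual setup the fitness of an individual is obtained by running the parameterized behavior in the simulator and measuring, for example, the resulting ball position.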
To summarize, the following domain knowledge is built into our approach:
– the system has to provide values at 50 Hz (used as angular speeds of the
joints)
– there are up to 232 free parameters, for which we know the range of reasonable
values (defined by the minimum and maximum joint angles and angular
speeds and the maximum phase duration)
– a fitness function using domain information gives feedback about the utility
of a parameter set
– a kicking behavior is made possible by moving the player into the vicinity
of the ball.
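Knowing only the legal range of each parameter, the optimizer can work on normalized values and ground them in the domain afterwards; a sketch of this grounding step (our illustration, not necessarily how it is implemented):

```python
def ground(normalized, ranges):
    """Map normalized parameters in [0, 1] to their domain ranges,
    given one (minimum, maximum) pair per parameter, e.g. joint
    angle limits or the maximum phase duration."""
    return [lo + x * (hi - lo) for x, (lo, hi) in zip(normalized, ranges)]
```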
The following domain knowledge is not required or built into the system:
– geometry, mass of body parts
– position or axis of joints
– the robot type (humanoid, four-legged, ...)
4 RESULTS
The first behaviors to be learned were kicks. Experiments were conducted as
follows. The robot connects to the server and is beamed to a position favorable
for kicking. It then starts to step in place. Kicking without stepping is easier
to achieve, but does not resemble typical situations in a real game. The
stepping in place is a model-based, inverse-kinematic walk and is not subject
to this learning for now. After 1.4 s, the agent is free to decide to kick. It then
kicks as soon as the foot pressure sensors show a favorable condition: for a
kick with the right leg, the left foot has to have just touched the ground and
the right foot has to have just left it. After the kick behavior, the agent
continues to step in place until the overall run has taken five seconds. A run is
stopped earlier if the agent falls (z component of the torso's up-vector < 0.5).
This is typically the case for most individuals of the initial random population.
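The kick trigger and the abort condition described above can be sketched as simple predicates; the function names and the use of raw force magnitudes are illustrative assumptions.

```python
def should_kick(time_in_run, left_foot_force, right_foot_force):
    """Kick trigger for a right-leg kick (sketch): after 1.4 s of
    stepping, kick when the support (left) foot has ground contact
    and the kicking (right) foot has none."""
    if time_in_run < 1.4:
        return False
    return left_foot_force > 0.0 and right_foot_force == 0.0

def has_fallen(torso_up_z):
    """A run is aborted when the z component of the torso's
    up-vector drops below 0.5."""
    return torso_up_z < 0.5
```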
Table 1 shows the influence of the three representations used for the learning
parameters, as well as the influence of using different numbers of phases.
Each value in the table is the result of 400,000 kicks performed using genetic