Stochastic Optimization of Bipedal Walking using
Gyro Feedback and Phase Resetting
Felix Faber and Sven Behnke
Computer Science Institute
University of Freiburg, Germany
{faber | behnke}@informatik.uni-freiburg.de
Abstract. We present a method to optimize the walking
pattern of a humanoid robot for forward speed using suitable
metaheuristics. Our starting point is a hand-tuned open-loop gait
that we enhance with two feedback control mechanisms. First,
we employ a P-controller that regulates the foot angle in order to
reduce angular velocity of the robot’s body. Second, we introduce
a phase resetting mechanism that starts the next step at the
moment of foot contact. Using a physics-based simulation, we
demonstrate that such feedback control is essential for achieving
fast and robust locomotion.
For the optimization of open-loop parameters and parameters
of the feedback mechanisms, we compare Policy Gradient Rein-
forcement Learning (PGRL) and Particle Swarm Optimization
(PSO). To make optimization more data-efficient, we extend
PGRL by an adaptive step size and a sequential sampling
procedure. Our experiments show that the proposed extensions
increase the performance of PGRL significantly.
We selected the extended PGRL algorithm to optimize the
gait of a real robot. After optimization, the robot is able to walk
significantly faster.
I. INTRODUCTION
Enabling a humanoid robot to walk fast and stably is non-
trivial. Such robots are highly complex, non-linear dynamical
systems that are hard to control. Sensor noise, imprecise
actuators, and external disturbances further complicate control.
Among the most successful approaches to biped locomotion
are trajectory tracking methods that are based on precomputed
trajectories of the legs or the Zero Moment Point (ZMP) [1].
The joint trajectories are generated offline, for example, by
solving the dynamic equations of motion. During the walking
process, precise controllers are used for following the pre-
computed trajectories. Trajectory tracking, however, requires
a precise model of the robot. At the same time, solving the
dynamical equations of motion can be computationally very
demanding if a robot has many degrees of freedom (DOF). The
approach presented here does not require an accurate model
of the robot and has a low computational complexity.
A completely different approach is that of passive dynamic
walkers (PDW). They use the inherent machine dynamics for
walking and most of them walk without actuation or control.
McGeer [2] first introduced the notion of PDW and showed
that unactuated and uncontrolled planar walking down a slope
is possible. The main interest of the approach is the evolution
of periodic gaits, without considering starting or stopping,
while agility and responsiveness of motion, as required by
a versatile robot such as ours, play a minor role.
In addition, biologically inspired methods have evolved in
which central pattern generators (CPG) [3] generate joint
trajectories by using nonlinear oscillators. CPG-driven lo-
comotion has been successfully applied in cases where a
complete dynamical model of the robot is not available or
too complex to be useful [4], [5]. The disadvantage of CPG-
driven methods is that it is not trivial to determine appropriate
parameter settings for the oscillators in order to achieve a
stable gait. Our approach is similar to CPG methods in that
joint trajectories are generated and adapted online.
Although CPG-based methods do not use a model, stochas-
tic optimization of their parameters is possible. This has been
demonstrated, e.g., with the Aibo quadrupeds [6]–[8]. In order
to reject disturbances, sensory feedback is essential. Conse-
quently, several successful attempts to incorporate sensory
feedback have been reported, e.g. [9], [10].
In this paper, we propose extending an open-loop gait engine
with two feedback mechanisms: gyro feedback that stabilizes
the trunk and phase resetting at foot contact that entrains
the CPG with the natural dynamics of the robot-environment
system. The parameters of these feedback mechanisms are
included in a stochastic optimization procedure. We evaluate
the suitability of two state-of-the-art metaheuristics for our
task and propose extensions that make them more data-
efficient. This includes a sequential sampling procedure, an
adaptive step size, and automatic restarting.
After reviewing some of the related work, we present the
humanoid robot Jupp used for the experiments and its open-
loop gait engine in Sec. III and IV, respectively. The proposed
feedback mechanisms are detailed in Sec. V. The optimization
problem is defined in Sec. VI and Sec. VII presents the PGRL
and PSO metaheuristics together with the proposed extensions.
The heuristics and their extensions are evaluated in a physics-
based simulation, as described in Sec. VIII. Sec. IX reports
the results of transferring stochastic gait optimization to the
real robot.
II. RELATED WORK
Stochastic gait optimization has been successfully demon-
strated for the Aibo quadruped, e.g. by Kohl and Stone [6] who
proposed Policy Gradient Reinforcement Learning (PGRL).
They approximate the gradient of the objective function by
sampling in the vicinity of a parameter vector. In contrast to

our approach, the authors did not make use of sensory feedback. Röfer [11] used the readings of acceleration sensors to judge stability of Aibo gaits during evolutionary optimization. However, the acceleration sensors were not used to adapt the gait online.
Geng et al. [10] successfully used PGRL for gait optimiza-
tion of a planar biped. While their approach optimizes only
two gait parameters, we include many more parameters in
the optimization. This is possible because of the data-efficient
sequential sampling procedure.
Hemker et al. [9] applied a sequential surrogate approach to
optimize the speed of their humanoid robot. In contrast to our
approach, the authors did not optimize feedback parameters.
The gait also does not include a phase resetting mechanism.
Kosse [12] optimized the gait of a KHR-1 humanoid robot
for forward speed using evolutionary strategies. His gait is
completely open-loop.
Nakanishi et al. [4] proposed phase resetting to adapt
the CPG-frequency of a planar biped. They reset the CPG-
oscillators when the foot hits the ground. The approach learns
open-loop gait parameters from demonstration, but does not
include feedback parameters in the learning process.
Aoi and Tsuchiya [5] also incorporated a phase-resetting
mechanism. The authors did not optimize the gait parameters
and did not make use of gyroscope feedback.
III. KIDSIZE HUMANOID ROBOT JUPP
The robot used for our experiments is Jupp, a 19 DOF
soccer robot from team NimbRo. Jupp is 60 cm tall and weighs only 2.3 kg. Each leg has a 3 DOF hip, a knee, and a 2 DOF
ankle joint. A pitch joint allows for bending the trunk, just
above the hip. The arms have 2 DOF shoulders and an elbow
joint. All joints are driven by RC-servo motors.
Most parts of the skeleton are made of rectangular alu-
minum tubes. The arms and the feet are made of carbon
composite. A Pocket PC, located in the upper trunk of the
robot, is the main computer. It generates target positions for all
joints at a rate of 83 Hz. Jupp is equipped with two gyroscopes (ADXRS, ±150/300 °/s) that are located in the robot's trunk.
Two force sensing resistors (FSR) are attached to each heel.
They are used to measure foot contact with the ground. See
[13] for more hardware details.
IV. OPEN-LOOP GAIT ENGINE
The starting point for our work is a clock-driven, open-loop gait developed by Behnke [13]. The gait is driven by a central clock −π ≤ φ_Trunk < π that determines the step frequency ψ. Each leg derives its gait phase φ_Leg by shifting the trunk phase: φ_Leg = φ_Trunk + Λ·π/2. The leg sign Λ is Λ = −1 for the left leg and Λ = 1 for the right leg. To abstract from the individual joints, we implemented a kinematic interface that allows for changing leg extension γ, leg angle θ_Leg = (θ^r_Leg, θ^p_Leg, θ^y_Leg), and foot angle θ_Foot = (θ^r_Foot, θ^p_Foot). Superscripts r, p, and y indicate the roll, pitch, and yaw direction, respectively.
Each leg generates sinusoidal trajectories for this interface from its phase φ_Leg. Although the original gait is omnidirectional, we consider here only walking in forward direction. The key ingredients of the open-loop gait are: shifting the weight from one leg to the other, shortening the leg not needed for support, and leg motion in walking direction. The gait speed and direction can be varied while the robot is walking by adjusting the leg swing amplitude a_Swing = (a^r_Swing, a^p_Swing, a^y_Swing). The robot starts walking by first fading in the lateral weight shift until it is stepping on the spot and then fading in a_Swing. The open-loop gait engine has a set of parameters that have been adapted to the robot. Many of them have an intuitive interpretation.
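As an illustration of this clock-driven structure, the following Python sketch derives a leg phase from the trunk phase and turns it into a sinusoidal swing target. The wrapping helper, the function names, and the amplitude value are illustrative assumptions, not the actual gait engine.

import math

def leg_phase(phi_trunk, leg_sign):
    """Leg phase phi_Leg = phi_Trunk + leg_sign * pi/2, wrapped back into [-pi, pi)."""
    phi = phi_trunk + leg_sign * math.pi / 2.0
    return (phi + math.pi) % (2.0 * math.pi) - math.pi

def sagittal_swing_target(phi_leg, a_swing_pitch):
    """Illustrative sinusoidal leg-angle trajectory in walking (pitch) direction."""
    return a_swing_pitch * math.sin(phi_leg)

# The two legs run half a gait period apart (leg signs -1 and +1).
phi_trunk = 0.3
left = sagittal_swing_target(leg_phase(phi_trunk, -1), a_swing_pitch=0.25)
right = sagittal_swing_target(leg_phase(phi_trunk, +1), a_swing_pitch=0.25)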
V. FEEDBACK CONTROL
Jupp reached a top speed of 22.5 cm/s using the hand-tuned
open-loop gait. In order to achieve higher speeds, we augment
the open-loop gait with two feedback control mechanisms
using readings from the two gyroscopes and the FSRs.
A. Gyroscope Feedback
If the angular velocities measured by the gyroscopes ω^{r/p}_Gyro become too large, the robot is likely to lose balance. In order to induce torque in the direction opposite to the possible fall, we use a P-controller to modify the foot angles θ^{r/p}_Foot to θ^{r/p}_FootGyro. We want the angular velocity of the upper body to be zero. Thus, the gyroscope P-controllers with gains K^{r/p} are

θ^r_FootGyro = θ^r_Foot + Λ · K^r · ω^r_Gyro   and
θ^p_FootGyro = θ^p_Foot + K^p · ω^p_Gyro.
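A minimal Python sketch of these P-controllers, assuming gyro readings in rad/s and foot angles in radians; the function and variable names are placeholders rather than the robot's actual control interface.

def gyro_foot_correction(theta_foot_roll, theta_foot_pitch,
                         omega_gyro_roll, omega_gyro_pitch,
                         k_roll, k_pitch, leg_sign):
    """Offset the commanded foot angles proportionally to the measured trunk angular
    velocities, so that torque is induced against a beginning fall."""
    theta_roll_cmd = theta_foot_roll + leg_sign * k_roll * omega_gyro_roll
    theta_pitch_cmd = theta_foot_pitch + k_pitch * omega_gyro_pitch
    return theta_roll_cmd, theta_pitch_cmd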
B. Phase Reset
For fast and stable walking, it is essential that the frequency
of the central pattern generator inside the robot and the natural
frequency of the robot-environment system are tuned to each
other. While it is possible to find a fixed step frequency that
is on average tuned to the system, disturbances require the
online-adaptation of the step duration.
In order to achieve an entrainment between the internal
clock and the robot-environment system, we reset the trunk
phase φ_Trunk after the foot hits the ground. When contact of the foot with the ground is sensed by the FSRs at time t_FC, the difference between the current trunk phase and the nominal start of the next step Φ_Reset is calculated:

∆_Reset = Φ_Reset − φ_Trunk   at t = t_FC.

Fig. 1. Phase Resetting: After the foot hits the ground at t_FootContact, the trunk phase φ_Trunk changes by ∆_Reset, distributed over a time T_Trans.
In order to avoid discontinuities in the joint trajectories, the required phase change ∆_Reset is distributed over a transient time T_Trans. Let φ_Old be the continued course of φ_Trunk if it was not reset and φ_Target = φ_Trunk + ∆_Reset be the course of the phase after an immediate jump of ∆_Reset. As illustrated in Fig. 1, φ_Trunk linearly fades from φ_Old to φ_Target:

φ_Trunk = φ_Old + (t − t_FC) · ∆_Reset / T_Trans   if t_FC < t < t_FC + T_Trans.
As the phases of all other body parts are derived from the
trunk phase, they reset as well. While our resetting mechanism
changes the timing of the walking motion, the trajectory
generation procedure is not changed.
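A sketch of the phase-reset blend in Python; times are assumed to be in seconds and phases in radians, and the names are illustrative.

def trunk_phase_with_reset(phi_old, delta_reset, t, t_fc, t_trans):
    """Distribute the phase change delta_reset linearly over the transient time t_trans,
    starting at the foot-contact time t_fc, to avoid discontinuities in the trajectories."""
    if t <= t_fc:
        return phi_old                           # reset has not started yet
    if t >= t_fc + t_trans:
        return phi_old + delta_reset             # reset fully applied
    return phi_old + (t - t_fc) * delta_reset / t_trans   # linear fade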
VI. STOCHASTIC OPTIMIZATION
Both the open-loop gait engine and the proposed feedback
mechanisms have parameters that need to be adjusted to
produce fast and stable walking patterns. As we do not have
a good model of the robot, we follow a model-free approach.
A fitness function f(x), f: R^N → R, expresses our optimization goals: speed and robustness. This function evaluates a parameter vector x = (x_1, x_2, ..., x_N) through a walking experiment, which is called a trial. The robot starts from a standing position and walks for a certain distance d_exp while
using the gait parameters x. The quality of the gait is judged
using the time needed to reach the desired distance. In case
the gait is unstable and the robot falls, the distance traveled
is used. Because we prefer slower robust gaits over faster
unstable ones, we formulate the fitness function in a way that
any fall-free trial receives a higher score than a fall.
The speed v of the robot is divided by a maximum speed v_max, which will not be exceeded. In case the robot falls, this relative speed is multiplied with the relative distance traveled (d/d_exp). Otherwise, the robot receives an extra bonus of +1 for stability:

f(d, v) = (v / v_max) · (d / d_exp)   if d < d_exp,
f(d, v) = v / v_max + 1               otherwise.
The optimization problem is now to find the parameter vector x* with the highest fitness:

x* = arg max_x f(x).
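A direct Python transcription of this fitness function (a sketch; d, v, d_exp, and v_max are the quantities defined above):

def fitness(d, v, d_exp, v_max):
    """Fitness of one trial: relative speed scaled by relative distance if the robot
    fell before reaching d_exp, and relative speed plus a +1 stability bonus otherwise."""
    if d < d_exp:                      # the robot fell before completing the distance
        return (v / v_max) * (d / d_exp)
    return v / v_max + 1.0             # any fall-free trial scores higher than any fall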
The optimization procedure faces several difficulties. First,
f is not a deterministic function, but the fitness of a trial is
a random variable that depends on disturbances encountered.
Evaluating the same parameter vector twice might result in completely different fitness values, e.g. if the robot falls in one trial and walks stably in the other. To reduce the variance of the fitness, we evaluate each parameter vector three times, if not stated otherwise. Another difficulty is that the fitness function is not necessarily smooth. Slight parameter changes can make a stable gait pattern fall. Hence, common gradient descent methods cannot be applied for optimizing f. Finally, the individual parameters are highly dependent on each other, which makes it impossible to tune them individually, and the mapping from parameters to the fitness is quite non-linear.
Various metaheuristics have been proposed in the literature to
tackle such difficult optimization problems.
VII. METAHEURISTICS
We evaluated two state-of-the-art metaheuristics for suit-
ability to our task: Policy Gradient Reinforcement Learning
(PGRL), introduced by Kohl and Stone [6], and Particle Swarm Optimization (PSO) by Kennedy and Eberhart [14].
Both metaheuristics have been successfully applied to similar
optimization problems in the past [6], [15].
A. Policy Gradient Reinforcement Learning
In PGRL, each parameter vector is considered as an open-loop policy that can be executed by the robot. The algorithm randomly generates B test policies {x^1, x^2, ..., x^B} around policy x^π, which is to be improved. The parameters x^i_j of the test policies x^i are set randomly to either x^π_j − ε, to x^π_j, or to x^π_j + ε, for all 1 ≤ i ≤ B and 1 ≤ j ≤ N. The disturbance ε is a small constant value.

The test policies x^i are executed. The obtained fitness samples are grouped into three categories for each dimension j: S^−_j, S^0_j, and S^+_j, depending on whether the jth parameter of x^i is modified by −ε, 0, or +ε. For each set, the average fitness z^−_j, z^0_j, and z^+_j is computed and an adjustment vector a = (a_1, a_2, ..., a_N) is constructed as follows:

a_j = 0               if z^0_j > z^+_j and z^0_j > z^−_j,
a_j = z^+_j − z^−_j   otherwise.

The adjustment a is normalized to a scalar step-size η and added to x^π:

x^π ← x^π + η · a / |a|.

PGRL continues to test the vicinity of the adjusted policy x^π for possible improvements in the next iteration.
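The following Python sketch shows one PGRL iteration as described above. Here evaluate stands for running one or more walking trials with a policy and returning an (averaged) fitness value; it is an assumed callback, not part of the paper.

import random
from statistics import mean

def pgrl_iteration(policy, evaluate, epsilon, eta, num_test_policies):
    """One PGRL iteration: perturb each parameter of the test policies by -epsilon, 0, or
    +epsilon, estimate the gradient per dimension from the grouped average fitnesses,
    and move the policy by a normalized step of size eta."""
    n = len(policy)
    perturbations = [[random.choice((-epsilon, 0.0, epsilon)) for _ in range(n)]
                     for _ in range(num_test_policies)]
    fitnesses = [evaluate([p + d for p, d in zip(policy, pert)]) for pert in perturbations]

    adjustment = []
    for j in range(n):
        s_minus = [f for f, pert in zip(fitnesses, perturbations) if pert[j] < 0.0]
        s_zero = [f for f, pert in zip(fitnesses, perturbations) if pert[j] == 0.0]
        s_plus = [f for f, pert in zip(fitnesses, perturbations) if pert[j] > 0.0]
        if not s_minus or not s_zero or not s_plus:
            adjustment.append(0.0)       # one group received no samples, skip dimension
            continue
        z_minus, z_zero, z_plus = mean(s_minus), mean(s_zero), mean(s_plus)
        if z_zero > z_plus and z_zero > z_minus:
            adjustment.append(0.0)       # unperturbed value looks best, do not move
        else:
            adjustment.append(z_plus - z_minus)

    norm = sum(a * a for a in adjustment) ** 0.5
    if norm == 0.0:
        return list(policy)              # no informative gradient estimate this iteration
    return [p + eta * a / norm for p, a in zip(policy, adjustment)]

With the settings used later in Section VIII-A, this sketch would be called with epsilon = 0.015, num_test_policies = 10, and eta from the adaptive schedule described below.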

Extensions to the PGRL Algorithm
1) Adaptive Step Size: To implement a coarse-to-fine search, we adapt the step size η. We start with a step size η_max. After g steps, we decrease η linearly to η_min:

η = η_max                                          if s < g,
η = η_max − (s − g) · (η_max − η_min) / (S − g)    otherwise.
Here, s counts the number of fitness function evaluations and S
denotes the maximum allowed number of function evaluations.
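A sketch of this schedule in Python, using s, g, S, η_max, and η_min as defined above:

def step_size(s, g, S, eta_max, eta_min):
    """Coarse-to-fine step size: constant eta_max for the first g fitness evaluations,
    then a linear decrease that reaches eta_min at the evaluation budget S."""
    if s < g:
        return eta_max
    return eta_max - (s - g) * (eta_max - eta_min) / (S - g)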
2) Sequential Sampling: Instead of sampling the fitness function for each test policy x^i three times (through three walking trials), we apply an approach proposed by Branke and Schmidt [16] that does not fix the number of samples in advance. Instead, samples are generated one at a time until a certain level of confidence for the fitness is achieved, or a maximum number of samples M have been generated. The target level of confidence is determined by comparing the fitness of a test policy x^i to that of the policy x^π, which is to be improved. If the fitnesses of the two policies are very different, only a few samples suffice for estimating the direction of the gradient of the fitness function. For policies with similar fitness values, more samples are required to guarantee a certain level of confidence. The sequential sampling procedure is summarized in Algorithm 1.
Algorithm 1 Sequential Sampling Procedure
Input: Average policy fitness z^π, test policy x^i, threshold for fitness difference ν, max. no. of samplings M
Output: Average fitness z^i of policy x^i, number of samplings used m
  z^i ← fitness obtained by sampling policy x^i once
  d ← |z^i − z^π|
  for m = 1 to M − 1 do
    if d > ν then
      return (z^i, m)
    else
      z'^i ← fitness obtained by sampling x^i one more time
      z^i ← (z^i · m + z'^i) / (m + 1)
      d ← |z^i − z^π|
    end if
  end for
  return (z^i, m)
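A Python sketch of Algorithm 1; sample_fitness stands for executing one walking trial with the test policy and is an assumed callback.

def sequential_sampling(z_pi, sample_fitness, nu, max_samples):
    """Sample a test policy one trial at a time until its running-average fitness differs
    from the reference fitness z_pi by more than nu, or max_samples trials were used.
    Returns the average fitness and the number of samples drawn."""
    z_i = sample_fitness()                      # first sample
    m = 1
    while m < max_samples and abs(z_i - z_pi) <= nu:
        z_new = sample_fitness()                # one more walking trial
        z_i = (z_i * m + z_new) / (m + 1)       # update running average
        m += 1
    return z_i, m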
B. Particle Swarm Optimization
Particle Swarm Optimization (PSO) works on a set (swarm) of parameter vectors (particles) x^i. Each particle has a velocity v^i = (v^i_1, v^i_2, ..., v^i_N). The velocity indicates how much the value of the corresponding parameter, and thus the position of the particle, changes in the next iteration of the algorithm. It is altered according to the best point in parameter space a particle has visited so far, p^i = (p^i_1, p^i_2, ..., p^i_N), and the global best point PSO has found so far, p^g:

v^i_j ← w · v^i_j + c_1 · χ_1 · (p^i_j − x^i_j) + c_2 · χ_2 · (p^g_j − x^i_j).

Here, parameters w, c_1, and c_2 influence the balance between local and global exploration. The weights χ_{1/2} add stochasticity to the search. They are drawn uniformly from [0, 1]. The positions of the particles are updated accordingly:

x^i ← x^i + v^i.
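A minimal Python sketch of this update for one particle; representing particles as plain lists of per-dimension values is an illustrative choice.

import random

def pso_update(x, v, p_best, g_best, w, c1, c2):
    """In-place PSO update: momentum plus attraction towards the particle's own best
    point p_best and towards the swarm's global best point g_best."""
    for j in range(len(x)):
        chi1, chi2 = random.random(), random.random()   # drawn uniformly from [0, 1]
        v[j] = w * v[j] + c1 * chi1 * (p_best[j] - x[j]) + c2 * chi2 * (g_best[j] - x[j])
        x[j] += v[j]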
Extension to PSO
To test for convergence of the algorithm, we sum up the distances of all J particles to the current global best position:

q_PSO = Σ_{j=1..J} |x^j − p^g|.

When q_PSO is smaller than a threshold τ_PSO, the algorithm has converged and only small improvements can be expected by continuing the search. We then restart the algorithm with a new set of randomly generated particles.
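The convergence test and restart could look as follows (a sketch; random_particle is a hypothetical helper that re-initializes a particle within the normalized parameter ranges):

def has_converged(swarm, g_best, tau_pso):
    """Sum the Euclidean distances of all particles to the global best position and
    compare the sum against the restart threshold tau_pso."""
    q_pso = sum(sum((xj - gj) ** 2 for xj, gj in zip(x, g_best)) ** 0.5 for x in swarm)
    return q_pso < tau_pso

# Usage sketch: restart with freshly randomized particles once the swarm has collapsed.
# if has_converged(swarm, g_best, tau_pso=0.3):
#     swarm = [random_particle() for _ in swarm]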
VIII. EXPERIMENTS IN A PHYSICS-BASED SIMULATION
The simulator used in this work is based on the Open
Dynamics Engine (ODE), a library for simulating rigid body
dynamics [17]. For the simulation experiments, the experimental distance is set to d_exp = 4.5 m. One trial is complete when the robot has walked this distance in any direction¹ or if it falls beforehand.
In order to be able to transfer the optimization procedure
to the real robot, we limit the number of trials (evaluations
of the fitness function) to 1000. We call 1000 trials an
episode. A configuration for the learning problem includes a
metaheuristic, parameters of the metaheuristic (e.g. step size,
number of start individuals), and a parameter set subject to
optimization.
When comparing different configurations, we consider not only the overall best (fittest) parameter vector encountered during an episode. To obtain a better characterization of the
optimization performance, we obtain a distribution of 1000
fitness values as follows: We evaluate each configuration 10
times. From each of the 10 episodes, we take the 10 best
individuals. We then evaluate these 100 individuals 10 times.
We compare the resulting distributions using notched box plots
which can be viewed as a visualization of a t-test.
Because the number of evaluations of the fitness function is
limited both in simulation and in the real-world experiments,
not all parameters of the gait can be optimized simultaneously.
To avoid the curse of dimensionality, we select nine open-loop
trajectory generation parameters and four feedback parameters
that we consider to be relevant candidates for optimization. For
an overview of the parameters, see Table I. In order to avoid
scaling problems, the ranges of parameter values considered
for optimization are normalized to the interval [0, 1].
¹ In the real-world experiments presented in Section IX, we use a more restrictive goal formulation.

TABLE I
OPTIMIZED PARAMETERS

Open-Loop Parameters
  ψ            step frequency
  a_Swing      swing amplitude of the leg in forward direction
  a_Shift      amplitude for lateral weight shifting
  λ_FootSwing  partial balance of the leg swing with the foot angle
  λ_FootPitch  general tilt of the robot in sagittal direction
  a_FootPitch  oscillates the robot in sagittal direction with every step
  ρ_FootPitch  phase shift of the a_FootPitch oscillation
  λ_LegSpread  offset to the lateral leg angle
  a_Arms       amplitude of the arm movement

Feedback Control Parameters
  K^r          gain of lateral gyro feedback
  K^p          gain of sagittal gyro feedback
  Φ_Reset      nominal trunk phase for phase resetting
  T_Trans      transition time for phase resetting

Parameters used for the real-world experiment
[Box plot figure: fitness (0.9 to 1.6) per configuration: RandomSearch, PGRL 1, PGRL 2, PSO 1, PSO 2.]
Fig. 2. Comparing metaheuristics. See text for configuration descriptions.
A. Comparing PGRL and PSO Metaheuristics
In this section, we compare two metaheuristics and their
variants in order to find the most adequate one for our task. We
use a uniform random search as baseline. All metaheuristics
optimize the parameter set
P_1 = {ψ, a_Robot, a_Shift, λ_FootPitch, K^r, K^p, Φ_Reset, T_Trans}.
The hand-tuned parameters x_init are used to initialize PGRL.
We evaluate two different configurations of extended PGRL.
Both configurations use a threshold for selecting additional samples in the sequential sampling procedure of ν = 0.015 and generate B = 10 test policies with a disturbance of ε = 0.015. The step size of the first configuration (PGRL 1) falls from η_max = 0.7 to η_min = 0.1 after g = 100 steps. The second configuration (PGRL 2) uses η_min = 0.3.
We evaluate two configurations of the PSO. The first PSO
configuration (PSO 1) works on J = 10 particles. Both c_1 and c_2 are set to 1. The second configuration (PSO 2) works on J = 20 particles with c_1 = c_2 = 0.5. For both configurations, a restart threshold τ_PSO = 0.3 and a weight momentum of w = 0.3 are used.
The box plots for comparing these six configurations are
depicted in Figure 2. All tested metaheuristics perform better
than the random search.
[Box plot figure: fitness (0.4 to 1.6) per configuration: none, SS, SS+AS, PR, PR+SS, PR+SS+AS.]
Fig. 3. Relevance of phase resetting (PR), sequential sampling (SS), and an adaptive step size (AS).
When comparing the two PGRL variants, the configuration with the larger target step size η_min = 0.3 (PGRL 2) outperforms the one with the smaller step size η_min = 0.1 (PGRL 1), indicating that η_min = 0.1 is chosen too small. Both PSO configurations find a best individual with a very high fitness (1.61 and 1.6). PSO 2 performs better than PSO 1.
The results of the PSO configurations should be taken with
a grain of salt: When the PSO has converged, the particles are
concentrated within a small region of the search space. Thus,
taking the 10 best individuals from one episode comes close
to taking the same (best) individual ten times. As PGRL does not suffer from this problem and PGRL 2 performed nearly as well as the best PSO configuration, we decided to use PGRL 2 in further simulation experiments and for the real-world experiment.
B. Relevance of phase resetting, sequential sampling and an
adaptive step size
To show the relevance of phase resetting, sequential sampling, and an adaptive step size, we compare six different configurations. The first two configurations use neither sequential sampling nor an adaptive step size. The second two configurations make use of sequential sampling, but do not adapt the step size. The third two configurations use both sequential sampling and an adaptive step size. Within each pair, one configuration uses phase resetting and the other does not.
Box plots for all six configurations are shown in Fig. 3.
The configurations using phase resetting (PR) outperform the
configurations without phase resetting in all three cases. This
indicates that the phase reset mechanism is essential to achieve
fast and robust locomotion.
It can also be seen that configurations using sequential sampling (SS) outperform configurations without it, and that configurations additionally using an adaptive step size (AS) outperform the corresponding configurations without it. This indicates that both sequential sampling and an adaptive step size improve the PGRL algorithm and lead to better results.

References
Kennedy, J. and Eberhart, R.: Particle swarm optimization.
McGeer, T.: Passive dynamic walking.
Vukobratović, M. and Borovac, B.: Zero-moment point — thirty five years of its life.
Kohl, N. and Stone, P.: Policy gradient reinforcement learning for fast quadrupedal locomotion.
Lizotte, D., et al.: Automatic gait optimization with Gaussian process regression.