
The Transferability Approach: Crossing the Reality Gap in Evolutionary Robotics

TL;DR: The transferability approach is proposed, a multiobjective formulation of ER in which two main objectives are optimized via a Pareto-based multiobjective evolutionary algorithm: 1) the fitness; and 2) the transferability, estimated by a simulation-to-reality (STR) disparity measure.
Abstract: The reality gap, which often makes controllers evolved in simulation inefficient once transferred onto the physical robot, remains a critical issue in evolutionary robotics (ER). We hypothesize that this gap highlights a conflict between the efficiency of the solutions in simulation and their transferability from simulation to reality: the most efficient solutions in simulation often exploit badly modeled phenomena to achieve high fitness values with unrealistic behaviors. This hypothesis leads to the transferability approach, a multiobjective formulation of ER in which two main objectives are optimized via a Pareto-based multiobjective evolutionary algorithm: 1) the fitness; and 2) the transferability, estimated by a simulation-to-reality (STR) disparity measure. To evaluate this second objective, a surrogate model of the exact STR disparity is built during the optimization. This transferability approach has been compared to two reality-based optimization methods, a noise-based approach inspired by Jakobi's minimal simulation methodology and a local search approach. It has been validated on two robotic applications: 1) a navigation task with an e-puck robot; and 2) a walking task with an 8-DOF quadrupedal robot. For both experimental setups, our approach successfully finds efficient and well-transferable controllers with only about ten experiments on the physical robot.


Introduction

  • EVOLUTIONARY ROBOTICS (ER) [39], [46] deals with the use of Evolutionary Algorithms (EA) in robotics.
  • This fitness function links each evaluated solution to a value that reflects its efficiency on the task to achieve and, as ER concerns robots, it should theoretically be computed on the studied robot [16].
  • This transfer problem is called reality gap [29] and is arguably the most critical issue that currently prevents the use of ER for practical robotic applications.
  • The authors' first insight concerns the simulation models: even if a simulation model is somehow inaccurate, it also contains realistic parts as it is designed to accurately mimic some physical phenomena.
  • For a given controller, this transferability measure compares the corresponding real and simulated behaviors and becomes an objective to optimize during the optimization while looking for efficient controllers.

A. Reality-based optimization

  • As the reality gap results from inadequacies between the reality and the simulation, a first attempt to deal with this problem consists in evolving the solutions directly on the real device.
  • Pollack et al. [50] proposed an alternative that partly addresses the high computational cost of optimizing in reality.
  • First, the robot’s morphology and its controller were co-evolved with a realistic simulation.
  • Nolfi et al. reported a similar work regarding a navigation task addressed by a mobile Khepera robot with 30000 evaluations in simulation followed by 3000 evaluations on the physical robot [47].
  • Such approaches assume that the optimal solutions found with the simulation model are relatively close to the true optimal ones on the real robot, i.e. that the high values of the fitness function in simulation are not too misleading.

B. Simulation-based optimization

  • The prohibitive computational cost of direct optimization on physical robots has led some researchers to envisage full optimization processes in simulation [53].
  • Simulation models often are trade-offs between accuracy and computational cost: although the reality gap problem highlights the need of accurate simulators, accurate models can lead to very high computational costs, which are incompatible with optimization techniques.
  • The unwanted phenomena are hidden in an envelope of noise or not modeled at all so that the evolved solutions cannot exploit them and have to be robust enough to achieve high fitness values.
  • The robot can also explicitly build an approximate model of its environment to use it as a reference and then adapt to environment variations.
  • Whether the robustness is obtained by the optimization process in simulation or by some adaptive mechanisms, all these approaches rely on the following hypothesis: the level of robustness is sufficient to overcome the reality gap.

C. Robot-in-the-loop simulation-based optimization

  • The robot-in-the-loop simulation-based optimization approaches also rely mostly on simulators but some transfer experiments are allowed during the optimization.
  • This approach has been successfully implemented with a four-legged robot [8].
  • Also based on co-evolution between simulators and controllers, the Back-to-Reality algorithm [58] does not resort to a disagreement measure, but tries to reduce the fitness variation observed between simulation and reality.
  • The optimization process can itself directly rely on a so-called surrogate model by evaluating the individuals with a simple model of the fitness function instead of building an entire simulation model.
  • Abbeel et al. notably applied such techniques to aerobatic helicopter flight [2].

D. Concluding thoughts

  • This state of the art on the reality gap problem leads us to five main thoughts.
  • For all practical purposes, simulation models are often available when working on robotic applications and while a simulation model can lead to reality gap problems, it is also designed to properly describe the dynamics of a given system: it probably contains both accurate parts and inaccurate ones.
  • The authors' main idea is to base the optimization on a simulation model that remains fixed during the whole process.
  • The approach then looks for the most efficient controllers whose behaviors are sufficiently based on the realistic parts of the simulation model to transfer well onto the real robot.
  • The authors neither build a simulation model from scratch nor modify it; rather, they exploit an already available simulator where it best mimics reality.

A. Principles

  • The Transferability approach fits into the robot-in-the-loop simulation-based optimization approaches.
  • As the Transferability approach aims at finding solutions both efficient in simulation and transferable from simulation to reality, it does not always find the optimal solutions in reality, but rather good compromises between efficiency in simulation and transferability.
  • If the optimal solutions in reality indeed rely on unrealistic parts of the simulation, as illustrated in Fig. 3, the approach will consequently avoid them, because they are not transferable.
  • Based on this observation, this case was never encountered in the two experiments studied in this paper.

B. From an exact STR disparity to a surrogate model

  • D∗ links, for any possible controller c, the corresponding behavior in simulation b(c) in the behavior space B, to its exact STR disparity value D∗(b(c)).
  • The authors rely on a so-called surrogate model to approximate the STR disparity function during the optimization process.
  • Surrogate models [23], [31], [57] are usually resorted to in real engineering problems when evaluating an individual on the target system means very high computation costs or too long experiments.
  • Such interpolation methods rely on a distance function to compare solutions: the value predicted for a given solution mostly depends on the exact values of solutions that are close to it.
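
The distance-based interpolation described above can be sketched with Inverse Distance Weighting, the technique named later in this summary. This is a minimal illustration only: the function name `idw_predict`, the exponent `power=2.0`, and the sample values are assumptions, not taken from the paper.

```python
import math

def idw_predict(query, samples, power=2.0):
    """Inverse Distance Weighting estimate of the STR disparity at `query`.

    samples: list of (behavior_vector, exact_disparity) pairs, one per
    controller already transferred onto the robot.
    `power` is a hypothetical choice of distance exponent.
    """
    numerator = 0.0
    denominator = 0.0
    for point, value in samples:
        dist = math.dist(query, point)
        if dist == 0.0:
            return value  # query coincides with a measured behavior
        weight = 1.0 / dist ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# Hypothetical behavior descriptors and exact disparities.
samples = [((0.0, 0.0), 0.1), ((1.0, 0.0), 0.5), ((0.0, 1.0), 0.9)]
estimate = idw_predict((0.5, 0.5), samples)
```

Because the prediction is a weighted average, it always lies between the smallest and largest measured values, so close samples dominate the estimate but the model cannot extrapolate beyond the observed range.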

C. Optimization scheme

  • Evaluation objectives: Each controller is evaluated by three objectives: 1. the task-dependent fitness, to find good controllers; 2. the corresponding approximated STR disparity computed with the surrogate model, to find transferable controllers; 3. the behavioral diversity objective.
  • This last objective helps maintain behavioral diversity among the population, which efficiently enhances exploration of the controller state space [15], [42].
  • In such a context, the update heuristic defined earlier boils down to randomly selecting one individual among those whose diversity value is higher than the diversity threshold τdiv .
  • It ensures that any new experiment selected by the update heuristic is meaningful.
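
As an illustration of how the three objectives can be compared in a Pareto sense, here is a minimal sketch. The helper names (`objectives`, `dominates`, `non_dominated`) and the numeric values are hypothetical; the disparity is negated so that every objective is maximized.

```python
def objectives(fitness, approx_disparity, diversity):
    """Pack the three objectives so that all of them are maximized."""
    return (fitness, -approx_disparity, diversity)

def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and strictly better once."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated(population):
    """Return the individuals that no other individual dominates."""
    return [p for p in population
            if not any(dominates(q, p) for q in population if q is not p)]

# Hypothetical evaluations: (fitness, approximated STR disparity, diversity).
population = [
    objectives(1.0, 0.2, 0.5),
    objectives(0.8, 0.1, 0.5),
    objectives(0.7, 0.3, 0.4),  # worse than the first on every objective
]
front = non_dominated(population)
```

The first two individuals trade fitness against disparity, so neither dominates the other; this is the kind of trade-off set the approach searches for.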

D. Algorithm outline

  • To initialize the surrogate model of the STR disparity, the authors assume that a controller c0 has already been transferred onto the real system at the beginning of each optimization process.
  • The corresponding exact STR disparity value D∗(c0) and the behavioral features in simulation are computed.
  • In order to transfer behaviors that differ enough from those of the already transferred controllers, and thus to limit the number of experiments, the update heuristic relies on a diversity threshold τdiv: one controller, randomly selected among those in the current population whose behavioral diversity value is greater than τdiv, is transferred.
  • The diversity threshold is designed by hand to achieve a given number of transfer experiments on average during the whole optimization process.
  • It is next used to update the surrogate model of the STR disparity function.
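
The update heuristic in this outline can be sketched as follows. The function name, the `diversity_of` callback, and the choice to return `None` (no transfer) when no controller exceeds the threshold are assumptions made for illustration.

```python
import random

def select_transfer_candidate(population, diversity_of, tau_div, rng=random):
    """Pick one controller at random among those whose behavioral
    diversity exceeds tau_div; return None if there is no such controller
    (assumed here to mean that no transfer happens at this generation)."""
    candidates = [c for c in population if diversity_of(c) > tau_div]
    return rng.choice(candidates) if candidates else None

# Hypothetical controllers identified by name, with made-up diversity values.
diversity = {"c1": 0.05, "c2": 0.30, "c3": 0.12}
chosen = select_transfer_candidate(
    list(diversity), diversity.get, tau_div=0.1, rng=random.Random(0)
)
```

Raising `tau_div` shrinks the candidate set and therefore the expected number of experiments on the robot, which matches the role the threshold plays in the outline.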

E. Best solution of a run

  • The authors assume at first that a threshold D∗threshold on the STR disparity values can be empirically chosen in such a way that STR disparity values greater than D∗threshold empirically mean bad transfers.
  • For the class A1, the goal of the optimization process boils down to finding an optimal individual in reality, that is, a controller which solves the task.
  • The same criteria are used for applications of the class A2.
  • There are two possible cases: if the transferable non-dominated set is empty, the best solution of the run is the solution with the lowest STR disparity in the non-dominated set, although it should not transfer well; otherwise, the authors have to choose a best compromise solution in this transferable set.
  • The authors' approach has been validated with an e-puck robot on one of Jakobi’s early experiments on the reality gap problem (class A1, [26]).
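
The two cases for picking the best solution of a run can be sketched as below. The summary does not state how the "best compromise" is chosen inside the transferable set, so taking the highest fitness there is an assumption, as are the function name and the sample values.

```python
def best_of_run(front, d_threshold):
    """front: list of (fitness, str_disparity) pairs from the non-dominated set.

    Solutions with a disparity above d_threshold are treated as bad transfers.
    """
    transferable = [s for s in front if s[1] <= d_threshold]
    if not transferable:
        # Empty transferable set: fall back to the lowest disparity.
        return min(front, key=lambda s: s[1])
    # Assumption: the compromise retained is the fittest transferable solution.
    return max(transferable, key=lambda s: s[0])

# Hypothetical non-dominated set.
front = [(1200.0, 0.30), (1100.0, 0.04), (900.0, 0.01)]
best = best_of_run(front, d_threshold=0.05)
fallback = best_of_run(front, d_threshold=0.001)
```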

A. Experimental set-up

  • The authors' first application aims to reproduce one of Jakobi’s experiments on the reality gap problem [26], [27], notably to compare their approach with Jakobi’s, which remains to date the most formalized methodology dedicated to this problem.
  • Nevertheless, the light sensors of the e-puck robot appear not to be reliable enough and their experimental set-up is slightly different from the original one.
  • Detects this wall in simulation, its sensors have to be noised so that the optimal behaviors found in simulation cannot exploit it and fail when transferred onto the real device (see Fig. 9).
  • The genotype encodes 7 parameters for each of the 5 “left” neurons: 1) its threshold value (16 values regularly spaced from -1 to 1); 2) the destination neuron of its 3 possible outgoing connections (integer values from 1 to 16); 3) the 3 weights corresponding to these 3 connections (16 values regularly spaced from -2 to 2).
  • The fitness values are averaged on both test cases to compute the global fitness value of an individual.
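
The genotype described above can be decoded along the following lines. This is a sketch under stated assumptions: each parameter is stored as an integer index in 0..15, the endpoints of each range are included in the 16 levels, and the order of the 7 genes (threshold, 3 destinations, 3 weights) is hypothetical.

```python
def level(index, low, high, steps=16):
    """Map an index in 0..steps-1 to one of `steps` regularly spaced values."""
    return low + (high - low) * index / (steps - 1)

def decode_neuron(genes):
    """genes: 7 integers in 0..15 (assumed layout, see lead-in)."""
    threshold = level(genes[0], -1.0, 1.0)
    destinations = [g + 1 for g in genes[1:4]]           # neuron ids 1..16
    weights = [level(g, -2.0, 2.0) for g in genes[4:7]]
    return threshold, destinations, weights

threshold, destinations, weights = decode_neuron([0, 0, 7, 15, 0, 15, 15])
```

With 16 levels per parameter, each gene fits in 4 bits, so 5 neurons with 7 parameters each would take 140 bits if encoded in binary.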

B. Problems encountered when implementing Jakobi’s approach

  • In order to obtain controllers that transfer well from simulation to reality, Jakobi argues that individuals that are robust enough in simulation should transfer well onto the real device and also be robust in reality.
  • Here, the authors only consider the reality gap problem and the concerns on robustness in reality are not especially evaluated.
  • The infrared sensor values can indeed dramatically deviate from one experiment to another, as well as the duration of the color pattern detection by the camera.
  • Preliminary experiments with such optimization schemes did not lead to individuals with high fitness or high robustness in simulation with the budget of evaluations fixed for the set-up.
  • The maximal wheel speed in Jakobi’s original setup was 8 cm per second and the simulation was updated 10 times per second.

C. Approaches

  • 1) Noise-based approach inspired by Jakobi’s:
  • The individuals are optimized in the simulation with the real parameter values.
  • Moreover, in order to transfer as few individuals as possible, only individuals that are optimal in simulation are transferred onto the e-puck robot during the optimization process.
  • The diversity objective is computed as the minimal Hamming distance based on the binary genotype to the already transferred controllers.
  • The best solution of a run is the transferred controller with the highest fitness value.
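
The genotype-based diversity objective mentioned in this list can be sketched as a minimal Hamming distance to the archive of transferred controllers. The function names and the bit strings are hypothetical, and the archive is assumed to be non-empty.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def diversity(genotype, transferred):
    """Minimal Hamming distance to the already transferred genotypes."""
    return min(hamming(genotype, t) for t in transferred)

# Hypothetical 8-bit genotypes.
transferred = ["00000000", "11110000"]
score = diversity("11111100", transferred)
```

A controller identical to an already transferred one gets a diversity of 0, so it is never worth a new experiment on the robot.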

D. Results

  • Table I sums up the location of the evaluation step (simulation, reality or both) along with the number of experiments done on the physical robot per run for each approach.
  • All the approaches have been implemented using the state-of-the-art MOEA NSGA-II [14], based on non-dominated sorting and elitist tournament selection.
  • The two reality-based optimization approaches, evolution on the physical robot and surrogate modelling of the real fitness, achieve clearly worse results, with respective average fitness values of 469 mm (sd = 45 mm) and 466 mm (sd = 77 mm).
  • Concerning the noise-based approach, the original results obtained in [26] are not reproduced: only 3 runs out of 10 lead to optimal solutions in reality, while the method always worked in the original set-up.
  • A typical behavior obtained in reality with the Transferability approach is shown in Fig. 12.

V. APPLICATION II: QUADRUPEDAL WALKING ROBOT

  • Locomotion problems have often been addressed in Evolutionary Robotics.
  • In particular, quadrupedal walking offers the advantage of various kinds of gaits: from static and easy to model walks to more dynamic and complex ones.
  • As these gaits do not need the same level of accuracy to be correctly modeled in simulation, they are expected to achieve different transferability performances on the real device.
  • The fitness is the distance covered by the robot during a fixed time.
  • Contrary to the previous application, the optimality of a given solution cannot be directly derived from the corresponding behavior whether in simulation or on the real robot (class A2 in Fig. 6), as the maximal robot speed is unknown.

A. Robot and experimental set-up

  • The physical robot is made from a Bioloid Kit and has been built after the wheeled-legged robot Hylos [20] designed for autonomous planetary/volcanic exploration.
  • Each leg then includes an upper leg motor and a lower leg motor, all controlled in position.
  • The authors also use a simulator relying on the Bullet Physics Library, an open source physics engine [5].
  • For their application, the following points have been carefully modeled: dimensions of the robot, masses of the different parts, mass asymmetry of the main body, contact areas of the wheels, servos’ built-in controller (according to the Dynamixel documentation).
  • The fitness landscape in simulation is complex, as shown in Fig. 14.

B. Approaches

  • The exact STR disparity measure is based on the real and simulated distances from the origin $\sqrt{x^2 + y^2}$ of the robot’s geometric center that are computed respectively from the recorded real and simulated trajectories for each sampled data point.
  • To build the surrogate model, a given number of selected controllers have to be transferred onto the physical robot to record the corresponding fitness values in reality.
  • The diversity objective is computed as the minimal Euclidean distance based on the genotype to the already transferred controllers.
  • The best solution of a run is the transferred controller with the highest fitness value.
  • As the surrogate model builds the relation between the two control parameters and the covered distance in reality, five preliminary experiments are needed to initialize the Kriging model.
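
A trajectory-based disparity of the kind described above can be sketched as follows. The summary only says that the measure is based on the real and simulated distances from the origin at each sampled point; aggregating them as a mean squared difference is an assumption, as are the function names and the sample trajectories.

```python
import math

def distance_from_origin(trajectory):
    """Distance sqrt(x^2 + y^2) of each sampled (x, y) point from the origin."""
    return [math.hypot(x, y) for x, y in trajectory]

def str_disparity(sim_trajectory, real_trajectory):
    """Assumed aggregation: mean squared difference between the simulated
    and real distances from the origin, sample by sample."""
    d_sim = distance_from_origin(sim_trajectory)
    d_real = distance_from_origin(real_trajectory)
    if len(d_sim) != len(d_real):
        raise ValueError("trajectories must have the same number of samples")
    return sum((s - r) ** 2 for s, r in zip(d_sim, d_real)) / len(d_sim)

# Hypothetical two-sample trajectories of the robot's geometric center.
sim = [(0.0, 0.0), (3.0, 4.0)]
real = [(0.0, 0.0), (0.0, 3.0)]
disparity = str_disparity(sim, real)
```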

C. Results

  • Table III sums up the location of the evaluation step (simulation, reality or both) along with the number of experiments done on the physical robot per run for each approach.
  • The Control approach leads to better results, possibly because it does not always find the true optimal solutions in simulation, which is quite in agreement with the antagonism the authors hypothesize between efficiency and transferability.
  • In order to study more in details the reality gap problem for this application, the best individuals found with the Control approach +.
  • Such results show that the real fitness landscape of this application is likely to be simple.
  • Footnote 16: We select the same diversity threshold τdiv = 0.1 as with the Transferability approach, which leads to archives CS of 11 individuals on average (sd = 1).
  • The best trade-off individual among the best solutions obtained with the Transferability approach achieves 1132 mm in the simulation and 1099 mm on the real robot with a 0.004 STR disparity value.

VI. FURTHER INVESTIGATIONS

  • In place of transfers from simulation to reality, the authors solve a fictive reality gap problem between a simplified simulator and the accurate simulator used in the previous section.
  • They only differ from each other by the modeling of the servos’ built-in controller.
  • The simple simulator relies on a proportional relation between the speed and the position error, while the accurate one is based on the Dynamixel documentation.
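
The simplified servo model, in which speed is proportional to position error, can be sketched as a single integration step. The gain, time step, and speed limit below are hypothetical values, and the speed clamp is an added assumption rather than something stated in the summary.

```python
def step_proportional_servo(position, target, gain, dt, max_speed):
    """Advance a position-controlled servo by one time step.

    The commanded speed is proportional to the position error and is
    clamped to +/- max_speed (the clamp is an assumption).
    """
    speed = gain * (target - position)
    speed = max(-max_speed, min(max_speed, speed))
    return position + speed * dt

# Hypothetical parameters: move from 0 rad towards 1 rad.
position = 0.0
first = step_proportional_servo(position, 1.0, gain=2.0, dt=0.1, max_speed=10.0)
for _ in range(200):
    position = step_proportional_servo(position, 1.0, gain=2.0, dt=0.1, max_speed=10.0)
```

Two simulators that differ only in a model like this one can produce different gaits for the same controller, which is what makes the fictive reality gap usable as a test bed.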

A. Concerning the surrogate model

  • The graph in Fig. 19 shows, for all the individuals on the last non-dominated sets of each run, the corresponding approximated STR disparity values and the corresponding exact STR disparity values.
  • It is highly linked to the main drawback of the Inverse Distance Weighting interpolation technique: the predicted value always lies between the minimum and the maximum of the interpolated data points.
  • The Pearson’s correlation coefficients between the approximated STR disparity and the exact one are relatively high, with an average of 0.76 (sd = 0.11).
  • It indicates that there is a strong positive monotonic relation between the approximation and the exact function.
  • Such considerations often are sufficient to conclude that the surrogate model is of good quality [25]: the surrogate model seems to provide the evolutionary search with a good gradient.
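
For reference, the Pearson coefficient used above to compare approximated and exact disparities can be computed as in this sketch. The data are made up for illustration; note that Pearson's coefficient measures linear association between the two series.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Hypothetical approximated vs. exact STR disparity values.
approximated = [0.10, 0.20, 0.30, 0.40]
exact = [0.12, 0.18, 0.35, 0.39]
r = pearson(approximated, exact)
```

A coefficient close to 1 means the surrogate ranks controllers in roughly the same order as the exact measure, which is what the evolutionary search needs from it.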

B. Concerning the behavioral distances

  • In the Transferability approach, two behavioral distances are used: (1) the transferability measure that compares simulated behaviors with real ones; (2) the in silico metric that only compares behaviors in simulation.
  • The bFeat+DTraj variant corresponds to the original approach.
  • In the application with the quadrupedal robot, the three behavioral features separate behaviors that are efficient or not (covered distance), that make the robot overturn or do not (mean height), that are more or less stable (final orientation).
  • Such results validate their original approach with a feature-based behavioral distance as in silico metric and a trajectory-based behavioral distance as transferability measure.

C. Concerning the update heuristic and the diversity objective

  • The authors now implement a second set of variants (Table V) depending on: (1) the update heuristic used to choose the transfer experiments; (2) the presence or lack of the diversity objective.
  • The “random” update heuristic consists in choosing at random the controller to transfer among those whose diversity value is greater than the diversity threshold τdiv as used in the original approach.
  • All the results are shown in Fig. 21.
  • The variants RandomT & Div and MaxDivT & Div behave well, but the RandomT & Div variant shows the best results, as it looks for better trade-off solutions.
  • It means, counter-intuitively, that transferring the most different controller from the already transferred ones is not ideal.

A. Antagonism between efficiency and transferability

  • In their approach, controllers are evaluated in simulation by a task-dependent fitness and an STR disparity value.
  • This antagonism has to be discussed according to the results obtained in both set-ups.
  • Diversity on the application with the quadrupedal walking robot.
  • The former approach behaves clearly better in reality than the latter one.
  • Using a soft constraint based on the STR disparity value could provide an alternative to multiobjective optimization.
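
One way to read the soft-constraint alternative is as a single penalized objective. The summary does not give a formula, so the linear penalty, the threshold, and the weight below are all assumptions made for illustration.

```python
def penalized_fitness(fitness, approx_disparity, d_threshold, penalty_weight):
    """Single-objective score: the fitness is reduced in proportion to how far
    the approximated STR disparity exceeds d_threshold (assumed penalty form)."""
    excess = max(0.0, approx_disparity - d_threshold)
    return fitness - penalty_weight * excess

# Hypothetical values: fitness in mm, dimensionless disparity.
within = penalized_fitness(1100.0, 0.04, d_threshold=0.05, penalty_weight=1000.0)
beyond = penalized_fitness(1200.0, 0.30, d_threshold=0.05, penalty_weight=1000.0)
```

Unlike the Pareto-based formulation, this variant needs a penalty weight to be tuned, which fixes the trade-off between efficiency and transferability in advance.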

B. Towards an on-board transferability measure

  • In both applications, the transferability measure relies on external information: the trajectory of the robot recorded with CODA cx1 scanners.
  • Nevertheless, it is only meaningful if the sensor values are accurately modeled in simulation, sometimes despite significant amounts of noise.
  • It is also argued in [7] that accurate quantitative comparisons between two sensor time series are difficult because small initial disparities can quickly lead to very different signals, which can favor external measures for physical robots [8].
  • Another promising way is to exploit sensorimotor information to obtain an accurate estimation of the trajectory by sensor integration [4], [9].
  • It is sometimes pertinent to combine different types of sensors (short-range distance sensors with a camera for instance) by periodic repositioning of the estimation.

C. Modeling the fitness or the transferability

  • The use of surrogate models is increasing in robotics, most of the time to directly approximate the fitness function on the physical robot.
  • Such approximations try to map the relation between the parameters of the controller and the fitness in reality by interpolating a global function from few experimental data with Kriging-like methods.
  • Another issue concerns the number of parameters introduced by Kriging methods.
  • For realistic landscapes, several transfer experiments will be needed for exploring and building a sufficiently accurate surrogate model to avoid local maxima without any prior knowledge.
  • In fact, selecting one of these two approaches comes down to the availability of relevant simulation models and, in practice, simulation models are often available for robotic applications.

D. Upgrading the simulation from the STR disparity measure

  • At the end of a run performed with the Transferability approach, the obtained surrogate model of the STR disparity function gives a rough landscape of which parts of the simulation are not well-modeled and which parts are realistic.
  • It is then possible to use clustering methods, notably to extract which kinds of behaviors are more or less linked to bad transferability values.
  • Depending on the complexity of the problem, the next step, which consists in understanding how the simulation model makes these behaviors non-transferable and finally in improving the model, must be conducted by interacting with experts in robotics and mechanics.

VIII. CONCLUSION AND FURTHER WORK

  • This paper addressed the reality gap problem in the case of controller optimization, a critical issue in Evolutionary Robotics, which often happens when resorting to simulators.
  • Controllers are evaluated by two main objectives in a multi-objective manner: a task-dependent fitness and a simulation-to-reality disparity that estimates the controller’s transferability using a surrogate model.
  • Better results were achieved by the Transferability approach regarding both exact STR disparity and covered distance in reality with very few transfer experiments during the optimization.
  • A second application to an 8-DOF quadrupedal walking robot has also been investigated and their approach again finds controllers that are relevant regarding a walking task and that transfer well to reality.
  • Each simulation model is a compromise between accuracy and speed.


HAL Id: hal-00687617
https://hal.archives-ouvertes.fr/hal-00687617
Submitted on 15 Apr 2012
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
scientific research documents, whether they are
published or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
The Transferability Approach: Crossing the Reality Gap
in Evolutionary Robotics
Sylvain Koos, Jean-Baptiste Mouret, Stéphane Doncieux
To cite this version:
Sylvain Koos, Jean-Baptiste Mouret, Stéphane Doncieux. The Transferability Approach: Crossing the
Reality Gap in Evolutionary Robotics. IEEE Transactions on Evolutionary Computation, Institute
of Electrical and Electronics Engineers, 2012, pp.1-25. 10.1109/TEVC.2012.2185849. hal-00687617

The Transferability Approach: Crossing
the Reality Gap in Evolutionary Robotics
Sylvain Koos, Jean-Baptiste Mouret and Stéphane Doncieux
Abstract—The reality gap, which often makes controllers evolved
in simulation inefficient once transferred onto the physical robot,
remains a critical issue in Evolutionary Robotics (ER). We
hypothesize that this gap highlights a conflict between the efficiency
of the solutions in simulation and their transferability from
simulation to reality: the most efficient solutions in simulation
often exploit badly modeled phenomena to achieve high fitness
values with unrealistic behaviors. This hypothesis leads to the
Transferability approach, a multi-objective formulation of ER
in which two main objectives are optimized via a Pareto-based
Multi-Objective Evolutionary Algorithm: (1) the fitness and (2)
the transferability, estimated by a simulation-to-reality (STR)
disparity measure. To evaluate this second objective, a surrogate
model of the exact STR disparity is built during the optimization.
This Transferability approach has been compared to two
reality-based optimization methods, a noise-based approach inspired
by Jakobi’s minimal simulation methodology and a local search
approach. It has been validated on two robotic applications: 1)
a navigation task with an e-puck robot; 2) a walking task with
an 8-DOF quadrupedal robot. For both experimental set-ups,
our approach successfully finds efficient and well-transferable
controllers with only about ten experiments on the physical robot.
I. INTRODUCTION
EVOLUTIONARY ROBOTICS (ER) [39], [46] deals with
the use of Evolutionary Algorithms (EA) in robotics.
Such algorithms are indeed attractive black-box optimization
methods that put only few constraints on the optimal behavior
by relying on a fitness function to compare the potential
solutions. This fitness function links each evaluated solution to
a value that reflects its efficiency on the task to achieve and, as
ER concerns robots, it should theoretically be computed on the
studied robot [16]. In practice, each evaluation on a physical
device can be very time-consuming. Besides, as the behavior
that corresponds to a given solution is not known before
its evaluation, harmful behaviors can be transferred onto the
robot. Consequently, the few works in which controllers have
been directly evolved on the robot often optimized few
individuals during few generations, which reduces the efficiency of
the evolutionary methods. For instance in [19], controllers for
a small helicopter have been evolved with a population of 20
individuals during 30 generations, with a few minutes between
generations to avoid over-heating, that is only 600 evaluations
during the optimization process. In [52], optimization has
directly been applied to a prototype ornithopter machine to
maximize its lift with 3000 evaluations on the physical system
during the optimization, which seems more consistent with
evolutionary techniques, but other optimization tasks would
require several tens of thousands of evaluations [15].
Sylvain Koos, Jean-Baptiste Mouret and Stéphane Doncieux are with the
ISIR, CNRS UMR 7222, Université Pierre et Marie Curie, F-75005, Paris,
France. Contact: koos@isir.upmc.fr
For these several reasons, simulation models are an
appealing way to evaluate the fitness in a fully secure set-up,
while significantly speeding up the optimization process [22].
Accurate simulators can be even slower than experiments in
reality, which leads to prohibitively long optimization
processes. To obtain simulation models with lower computational
costs, it is sometimes necessary to neglect some complex
physical phenomena, which leads to simpler simulators, of
course less accurate, but also faster. The dynamics of the robot
can also not be fully known: for instance, bird-size UAVs
or small helicopters bring into play little-known dynamics,
which leads to approximate simulation models. Consequently,
the dynamic model of the robot used to build a simulation
model can itself be inaccurate. When the fitness is computed
in simulation, the evolutionary process is likely to exploit such
inaccuracies between the simulation model and the reality in
an opportunistic manner to achieve high fitness values with
unrealistic behaviors. In practice, even if many works in ER
succeed in building non-trivial and efficient controllers
that correspond to original and complex behaviors [51], [55],
these attractive results are often locked in the simulated world
because of bad transfers from simulation to reality. This
transfer problem is called reality gap [29] and is arguably the
most critical issue that currently prevents the use of ER for
practical robotic applications. For instance, numerous reality
gap problems have been reported when applying an EA to a
12-DOF bipedal walking robot [48]. It should be noted that the
reality gap problem is not specific to ER, as any optimization
method based on a simulation model encounters reality gap
issues (for instance in [3] when designing control structures
for a quadrotor helicopter). A gap can even exist when the
controllers are directly evolved on the real system, if the
experimental set-up used to evaluate the individuals
is too different from the real environment of the robot. It has
notably been observed in [19] on a small helicopter.
The goal of this work is to introduce the Transferability
approach, a general methodology to help crossing the reality
gap and to bring Evolutionary Robotics and simulators back
together. This approach aims at:
  • finding controllers that are both relevant for a given task
    in simulation and transferable from simulation to reality;
  • conducting as few experiments as possible on the physical
    robot during the optimization process.
Our first insight concerns the simulation models: even if
a simulation model is somehow inaccurate, it also contains
realistic parts as it is designed to accurately mimic some
physical phenomena. Efficient behaviors that mainly rely on
these realistic parts of the simulation model should transfer
pretty well onto the physical device and then achieve good
performances in reality.
A controller is said to be transferable if the corresponding
behaviors of the robot observed in simulation and in reality are
similar. Our approach takes into account the transfer quality
of the evaluated controllers in the form of a transferability
measure. For a given controller, this transferability measure
compares the corresponding real and simulated behaviors and
becomes an objective to optimize while looking for efficient
controllers. As the solutions that perform best in simulation
frequently exploit bugs or badly modeled phenomena, which makes
them non-transferable, transferability and efficiency appear to
be conflicting objectives. In order to look for relevant
trade-off solutions, we therefore propose to optimize solutions
with a Pareto-based Multi-Objective Evolutionary Algorithm
(MOEA) in which two objectives are defined: a task-dependent
fitness computed in simulation only and a transferability
objective.
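To make the two-objective trade-off concrete, the following minimal Python sketch filters a small population down to its Pareto front. It is only an illustration: the `Candidate` type, the example names and all numeric values are hypothetical, and both objectives are assumed to be maximized (a higher transferability score meaning a better transfer). The MOEA itself is not implemented here.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical controller summary, invented for this example."""
    name: str
    fitness: float          # task fitness computed in simulation (higher is better)
    transferability: float  # assumed score in [0, 1] (higher is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.fitness >= b.fitness and a.transferability >= b.transferability
    strictly_better = a.fitness > b.fitness or a.transferability > b.transferability
    return no_worse and strictly_better


def pareto_front(population: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in population
            if not any(dominates(other, c) for other in population)]


population = [
    Candidate("exploits_sim_bug", fitness=1.00, transferability=0.10),
    Candidate("balanced", fitness=0.80, transferability=0.80),
    Candidate("cautious", fitness=0.50, transferability=0.95),
    Candidate("dominated", fitness=0.45, transferability=0.70),
]
front_names = [c.name for c in pareto_front(population)]
print(front_names)
```

In this toy population, the controller with the highest simulated fitness stays on the front but is only one trade-off among several; the final choice among front members is left to the experimenter.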
To estimate the transferability of a given controller from
simulation to reality, we introduce a simulation-to-reality
disparity (STR disparity) measure that evaluates the disparities
between the corresponding simulated and real behaviors of the
robot: the higher the STR disparity, the worse the transferability.
However, as the number of transfers has to be minimized,
the exact STR disparity value cannot be obtained for each
controller. Consequently, we build during the optimization
process a surrogate model that approximates the STR disparity
function with function interpolation techniques.
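As an illustration of what such a disparity measure could look like, the sketch below computes the mean squared distance between time-aligned simulated and real trajectories. This particular formula, the function name `str_disparity` and the sample trajectories are assumptions made for the example; the passage above does not specify the measure, which may well be task-dependent.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def str_disparity(sim_traj: List[Point], real_traj: List[Point]) -> float:
    """Mean squared Euclidean distance between time-aligned trajectory samples.

    Assumed form of the measure: 0.0 means identical behaviors, and larger
    values mean a worse transfer.
    """
    if not sim_traj or len(sim_traj) != len(real_traj):
        raise ValueError("trajectories must be non-empty and of equal length")
    total = 0.0
    for (xs, ys), (xr, yr) in zip(sim_traj, real_traj):
        total += (xs - xr) ** 2 + (ys - yr) ** 2
    return total / len(sim_traj)


# Hypothetical 2D positions sampled at the same time steps.
sim = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
real_close = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.1)]  # robot behaves almost as simulated
real_far = [(0.0, 0.0), (0.5, 0.5), (0.5, 1.5)]    # robot drifts away from the simulation

d_close = str_disparity(sim, real_close)
d_far = str_disparity(sim, real_far)
print(d_close < d_far)
```

Each call to such a function requires one experiment on the physical robot, which is why the exact value can only be afforded for a handful of controllers.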
Preliminary results have been obtained with the Transferability
approach on a walking task with an 8-DOF wheeled-legged
quadrupedal robot [35]. In the current paper, the approach
is additionally applied to a navigation task in a T-maze [26]
and both applications allow systematic comparisons between
the Transferability approach and state-of-the-art methods.
After presenting some previous work on the reality gap
problem, we introduce the Transferability approach in a
robotic context. The approach is next validated on two robotic
applications (cf. above) and compared to two robot-based
optimization methods, a noise-based approach inspired from
Jakobi’s minimal simulation methodology and a local
search approach. Three main aspects are next investigated,
notably regarding the quality of the approximated STR disparity,
before discussing the underlying hypotheses of the approach.
II. PREVIOUS WORK ON THE REALITY GAP PROBLEM
Several works deal with the reality gap problem and one
can distinguish three main types of approaches: (1) reality-
based optimization approaches where optimization takes place,
fully or partly, on the physical robot; (2) simulation-based
optimization approaches with an entire optimization process
in simulation; (3) robot-in-the-loop simulation-based optimization
approaches, which fully optimize solutions in simulation but
also allow a few transfer experiments during the process.
A. Reality-based optimization
As the reality gap results from discrepancies between
reality and the simulation, a first attempt to deal with this
problem consists in evolving the solutions directly on the
real device. Such experiments have been done in [16] with
a Khepera mobile robot, to find robust controllers that can
adapt to the variations encountered by the robot during a
navigation task in a maze (environment, battery lifetime,
etc.). The optimization took more than 60 hours with about
8000 evaluations on the physical robot, although the task seems
relatively simple. Similar approaches have
been implemented on Sony AIBO robots [24], [33]¹ and on a
nine-legged robot [59].
Pollack et al. [50] proposed an alternative that partly
addresses the high computational cost of optimizing in reality. As
part of the GOLEM project, one of whose goals consists in
co-evolving morphologies and controllers, the solutions were
mostly evolved in simulation and only the last generations of
the optimization process were conducted on real robots. First,
the robot’s morphology and its controller were co-evolved with
a realistic simulation. Next, an embodied evolution took place
on a population of physical robots with the best morphology to
overcome the reality gap. Nolfi et al. reported a similar work
regarding a navigation task addressed by a mobile Khepera
robot with 30000 evaluations in simulation followed by 3000
evaluations on the physical robot [47].
Such approaches assume that the optimal solutions found
with the simulation model are relatively close to the true
optimal ones on the real robot, i.e. that the high values of
the fitness function in simulation are not too misleading.
Consequently, a local search around the controllers seen as optimal
in simulation should be sufficient to retrieve near-optimal
controllers in reality. This assumption is clearly debatable
when the optimal solutions in simulation achieve high fitness
values because of inaccuracies or badly modeled dynamical
phenomena.
B. Simulation-based optimization
The prohibitive computational cost of direct optimization
on physical robots has led some researchers to envisage full
optimization processes in simulation [53]. A first attempt to
deal with the reality gap is to build more accurate simulation
models. If all physical phenomena are well modeled, there
should not be any significant gap between simulation and reality.
However, simulation models are often trade-offs between
accuracy and computational cost: although the reality gap
problem highlights the need for accurate simulators, accurate
models can lead to very high computational costs, which are
incompatible with optimization techniques. This has notably been
underlined in [11] with visually guided robots. Besides, some
kinds of robots rely on little-known dynamics, like bird-sized
unmanned aerial vehicles [38] or small helicopters. For such
devices, perfect simulations are still out of reach.
¹ About 1000 experiments in 3 hours in [33].

To cope with not fully accurate simulation models, some
techniques have been developed in order to evolve controllers
that exhibit robust enough behaviors in simulation or that are
based on robust enough mechanisms to transfer well onto the
real robot. Among such approaches, the most formalized one is
Jakobi’s minimal simulation [26]. It consists in only modeling
the parts of the physical system that are meaningful in relation
to the target behavior, dropping the complex physical phenomena
that are only involved in bad or unstable behaviors. The
unwanted phenomena are hidden in an envelope of noise
or not modeled at all so that the evolved solutions cannot
exploit them and have to be robust enough to achieve high
fitness values. This approach has been successfully applied to
design walking gaits for an octopod robot [28]. The robustness
can also be achieved by optimizing the potential solutions
with several minimal simulation models whose parameters are
slightly varied from one generation to another. This has been applied
to a navigation task with a Khepera mobile robot in a T-
maze [27]. Similar robustness issues have been investigated
in [40], also with Khepera mobile robots: by choosing
the most realistic amount of noise to add to the simulated
sensor values, the transfer from simulation to reality did not
lead to any performance loss. The robustness of the optimized
behaviors can also be obtained by evaluating the solutions
with several simulation environments and initial conditions as
in [56].
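The idea of hiding poorly modeled phenomena in noise and of varying simulation parameters between evaluations can be sketched as follows. This is a toy illustration and not Jakobi's actual methodology: the simulator `toy_sim`, its friction parameter, the noise levels and the two controllers are all invented for the example.

```python
import random
from statistics import mean
from typing import Callable

# Hypothetical simulator signature: (controller gain, friction, sensor noise) -> fitness.
SimFn = Callable[[float, float, float], float]


def toy_sim(gain: float, friction: float, sensor_noise: float) -> float:
    """Toy stand-in for a simulator, invented for this example."""
    if gain > 5.0:
        # Aggressive controller: scores high only at the exact nominal friction,
        # mimicking a solution that exploits a badly modeled phenomenon.
        return 2.0 if abs(friction - 1.0) < 0.01 else 0.0
    # Moderate controller: slightly degraded by sensor noise, otherwise stable.
    return 1.0 - abs(sensor_noise)


def noisy_fitness(sim: SimFn, gain: float, trials: int = 200,
                  friction_jitter: float = 0.1, noise_std: float = 0.05,
                  seed: int = 0) -> float:
    """Average fitness over several simulations with perturbed parameters."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        friction = 1.0 + rng.uniform(-friction_jitter, friction_jitter)
        sensor_noise = rng.gauss(0.0, noise_std)
        scores.append(sim(gain, friction, sensor_noise))
    return mean(scores)


# Without perturbation, the aggressive controller looks best.
nominal_aggressive = noisy_fitness(toy_sim, 10.0, friction_jitter=0.0, noise_std=0.0)
nominal_moderate = noisy_fitness(toy_sim, 1.0, friction_jitter=0.0, noise_std=0.0)
# With perturbation, its advantage disappears.
robust_aggressive = noisy_fitness(toy_sim, 10.0)
robust_moderate = noisy_fitness(toy_sim, 1.0)
print(nominal_aggressive > nominal_moderate, robust_moderate > robust_aggressive)
```

The sketch also hints at the practical difficulty of such methods: the result depends on which parameters are perturbed and by how much.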
Other approaches treat the reality gap as a variation of
the environment that can be overcome online by some adaptive
mechanisms. In [17], plastic neural network controllers
have been used to integrate several sub-behaviors and also
to overcome a gap when a solution is transferred onto the
real device by online adaptation to the “new” environment.
There was no clear separation between simulation and reality.
The robustness is then directly linked to the mechanism of
plasticity. The robot can also explicitly build an approximate
model of its environment to use it as a reference and then adapt
to environment variations. For instance in [21], an anticipation
module was used to build a model of the motor consequences
in the simulated environment. If some differences are
encountered once in reality between this model and the current
environment, a correction module performs online adaptation
to improve the behavior and overcome the gap.
Whether the robustness is obtained by the optimization
process in simulation or by some adaptive mechanisms, all
these approaches rely on the following hypothesis: the level
of robustness is sufficient to overcome the reality gap. In
Jakobi’s methodology, it can be quite tricky to find which
parameters have to be changed and by how much to
vary them. Besides, one can wonder whether adaptive mechanisms
are always able to retrieve the global optimum in reality
from the global optimum in simulation. If the real behavior
corresponding to the optimal controller in simulation differs a
lot from its simulated counterpart, such mechanisms are rather
likely to retrieve local optima in reality with significantly
worse performances.
C. Robot-in-the-loop simulation-based optimization
The robot-in-the-loop simulation-based optimization approaches
also rely mostly on simulators, but some transfer experiments
are allowed during the optimization. In [6], Bongard
et al. introduced a co-evolutionary process, the Exploration-
Estimation Algorithm, that evolves two populations: simulators
and controllers. The simulators have to model the
previously observed real data, and the controller that discriminates
the most between these simulators is transferred onto
the real device to generate new meaningful learning data
for the simulation part. This process is iterated until a good
simulator is found, and relevant controllers for a given task
can next be built on it. These simulators speed
up the evaluation of the controllers, while being upgraded
by conducting some meaningful transfer experiments on the
real device. Moreover, resorting to an update heuristic based
on a disagreement measure reduces the number of
experiments required to explore the solution space efficiently.
This approach has been successfully implemented with a four-
legged robot [8]. A similar method based on multi-objective
evaluation of the solutions has been applied to a stabilization
task with a simulated quadrotor helicopter [34].
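The disagreement-based selection step can be sketched as follows, loosely in the spirit of the description above rather than as the algorithm of [6]. Candidate simulators are reduced here to hypothetical one-parameter functions, and the disagreement is taken to be the variance of their predictions, which is an assumption of this example.

```python
from statistics import pvariance
from typing import Callable, Sequence

# Hypothetical simulator: maps a scalar controller parameter to a predicted outcome.
Simulator = Callable[[float], float]


def disagreement(controller: float, simulators: Sequence[Simulator]) -> float:
    """Variance of the outcomes predicted by the candidate simulators."""
    return pvariance([sim(controller) for sim in simulators])


def most_informative(controllers: Sequence[float],
                     simulators: Sequence[Simulator]) -> float:
    """Controller on which the candidate simulators disagree the most."""
    return max(controllers, key=lambda c: disagreement(c, simulators))


# Three candidate simulators that differ only by an unknown gain (toy values).
simulators = [lambda u, k=k: k * u for k in (0.8, 1.0, 1.2)]
controllers = [0.0, 0.5, 2.0]

chosen = most_informative(controllers, simulators)
print(chosen)
```

Testing the chosen controller on the real device would tell the candidate simulators apart better than testing a controller on which they all agree.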
Also based on co-evolution between simulators and controllers,
the Back-to-Reality algorithm [58] does not resort to
a disagreement measure, but tries to reduce the fitness variation
observed between simulation and reality. Once the controllers
have sufficiently converged with the best simulator, they are
transferred onto the real robot and the fitness variations of
the individuals that perform best in reality are used to
evolve better simulators, and so on. As for the Exploration-
Estimation Algorithm, the co-evolution process ends when a
good simulator and a good controller are found. The approach
has successfully been applied to a ball-kicking task with a
Sony AIBO robot.
However, such co-evolutionary methods rely on the assumption
that the simulation model can become accurate enough
to allow perfect transfers with only a few experiments. This is
plausible when modeling simple dynamics or simply adjusting
a few parameters, but debatable for optimizations on a wider
search space.
The optimization process can itself directly rely on a so-
called surrogate model by evaluating the individuals with
a simple model of the fitness function instead of building
an entire simulation model. The surrogate model has to be
upgraded during the optimization process by conducting some
test experiments depending on a given update heuristic. As
a case in point, such an approach has successfully been
applied to fast humanoid locomotion [23]. Without relying
on EA, similar approaches have been applied to reality gap
problems in the field of reinforcement learning. Abbeel et
al. notably applied such techniques to aerobatic helicopter
flight [2]. From several trajectories previously flown by a pilot,
they identified an approximate local model of the helicopter
dynamics before learning the optimal flight policy, which was
next transferred onto the physical device. If the policy did
not work in reality, the corresponding data obtained in reality
were used to upgrade the local dynamic model and the policy
optimization took place again. The process was iterated until
the optimal policy worked in reality. A similar method was
applied in [1] to the autorotation of a remote control helicopter
in case of an engine failure. The results obtained for these
applications are quite impressive. However, this method can only be
applied when a human pilot/operator is able to mimic the task
to solve in order to identify the dynamic model.
D. Concluding thoughts
This state of the art on the reality gap problem leads us to
five main thoughts:
1. Optimizing on the physical robot is an appealing way, but
it leads to slow optimization processes and some risky
behaviors can be transferred;
2. Completing the optimization process by some evaluations
on the real robot is only meaningful if the optimal
solutions in simulation are close to the optimal ones in
reality, that is if the reality gap is small enough;
3. There is no guarantee that a robust controller only
optimized in simulation will be robust enough to transfer well
in reality and such a robustness is hardly assessable;
4. Adaptive mechanisms inside the controller structure may
not efficiently re-adapt to optimal behaviors in reality if
the gap is too large;
5. In our opinion, the robot-in-the-loop simulation-based
optimization approaches are currently the most promising
ones, but building a perfect simulator or a meaningful
surrogate model is arguably difficult, especially if it is
built from scratch or improved during the optimization
process as is often the case.
One of the most pivotal points is the use of simulation
models: is it necessary to build them from scratch or to improve
them as is often the case in the robot-in-the-loop simulation-
based optimization approaches? For all practical purposes,
simulation models are often available when working on robotic
applications and while a simulation model can lead to reality
gap problems, it is also designed to properly describe the
dynamics of a given system: it probably contains both accurate
parts and inaccurate ones. Our main idea is to base the
optimization on a simulation model that remains fixed during
the whole process. The approach then looks for the most
efficient controllers whose behaviors are sufficiently based on
the realistic parts of the simulation model to transfer well onto
the real robot. Consequently, we do not build a simulation
model from scratch nor modify it, but we rather exploit an
already available simulator where it mimics reality
best.
III. THE TRANSFERABILITY APPROACH
A. Principles
The Transferability approach fits into the robot-in-the-loop
simulation-based optimization approaches. The optimization
process relies on a simulator designed once and not improved
afterwards. Our first hypothesis is that, despite reality gap
problems, this simulation model is locally reliable with some
parts accurate enough to ensure good transfers to reality, as
illustrated in Fig. 1. However, the gradient provided by the
fitness in simulation does not guide the search in the same
direction as the real gradient: the best solutions found in
simulation are not transferable to reality and behave
significantly worse on the robot. The Transferability approach aims
at finding efficient solutions that mostly exploit these well-
modeled parts of the simulator. To evaluate the quality of
a given controller’s transfer from simulation to reality, we
rely on a transferability measure that compares a simulated
behavior with its counterpart in reality and quantitatively
reflects their closeness. We secondly hypothesize that the
reality gap mainly stems from a conflict between two aspects:
the efficiency of solutions in simulation and the transferability
of those solutions from simulation to reality. This leads to a multi-
objective formulation of ER in which two main objectives are
optimized via a Pareto-based MOEA: (1) the task-dependent
fitness; (2) the transferability objective.
The transferability measure cannot be obtained for each
solution, as this would mean many transfer experiments on the robot.
We claim that if the value of this function is known for a
few selected solutions, that is if a few solutions are
transferred during the optimization, a transferability function can
be approximated for all the other solutions by interpolation.
This interpolated transferability objective guides the
evolutionary search towards good compromise solutions, both
efficient in simulation and transferable from simulation to
reality. The whole process is pictured in Fig. 2.
In this work, the transferability of a given controller is
assessed by a simulation-to-reality disparity (STR disparity)
measure, which estimates the disparities between the
corresponding behaviors respectively observed in simulation and
on the physical robot: the higher the STR disparity, the worse
the transferability. As the STR disparity cannot be computed
for each potential solution, we rely on a surrogate model to
approximate this second objective. The surrogate model is
interpolated from a few transfer experiments conducted
during the optimization, according to an update heuristic that
periodically selects which controllers are the most
meaningful to transfer given the current surrogate model.
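A surrogate of this kind can be sketched with inverse distance weighting (IDW), one of several possible interpolation methods. The sketch below is a minimal illustration under stated assumptions: each controller is summarized by a hypothetical numeric behavior descriptor, and the update heuristic shown (transfer the candidate farthest from all controllers already tested) is only one plausible choice, not necessarily the one used in this work.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]         # hypothetical behavior descriptor of a controller
Sample = Tuple[Descriptor, float]    # (descriptor, STR disparity measured on the robot)


def idw_predict(query: Descriptor, samples: List[Sample], power: float = 2.0) -> float:
    """Approximate the STR disparity at `query` from the transferred controllers."""
    if not samples:
        raise ValueError("at least one transferred controller is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for descriptor, disparity in samples:
        distance = math.dist(query, descriptor)
        if distance == 0.0:
            return disparity  # exact value known for an already transferred controller
        weight = 1.0 / distance ** power
        weighted_sum += weight * disparity
        weight_total += weight
    return weighted_sum / weight_total


def pick_next_transfer(candidates: List[Descriptor], samples: List[Sample]) -> Descriptor:
    """Assumed update heuristic: pick the candidate farthest from all tested ones."""
    return max(candidates,
               key=lambda c: min(math.dist(c, d) for d, _ in samples))


# Two controllers already tested on the robot (toy values).
samples = [((0.0, 0.0), 0.1), ((1.0, 0.0), 0.9)]
estimate = idw_predict((0.5, 0.0), samples)
next_one = pick_next_transfer([(0.1, 0.0), (0.5, 0.0), (1.0, 2.0)], samples)
print(estimate, next_one)
```

After each new transfer, the measured disparity would be appended to `samples`, so the approximation is refined where it was previously least informed.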
As the Transferability approach aims at finding solutions
both efficient in simulation and transferable from simulation to
reality, it does not always find the optimal solutions in reality,
but rather good compromises between efficiency in simulation
and transferability. If the optimal solutions in reality indeed
rely on unrealistic parts of the simulation, as illustrated in
Fig. 3, the approach will consequently avoid them, because
they are not transferable. The Transferability approach is
therefore based on the hypothesis that the realistic parts of the
simulation include behaviors which are sufficiently relevant to
efficiently address the task. We hypothesize that the situation
of Fig. 3 is unlikely to occur for realistic robotic applications,
because the mechanical models used as simulations are designed to
model the most important phenomena regarding the task to
solve.
Another case can arguably be problematic. As pictured in
Fig. 4, the optimal solution in simulation may also be
optimal in reality, while corresponding to a non-transferable
behavior. As the Transferability approach looks for transferable
zones in the simulation, this optimal solution is avoided.
However, this case can easily be detected, as the fitness value
in reality obtained with the best solution found with the
Transferability approach should be lower than the fitness value
in reality of the most efficient solution in simulation. Based on

Citations
Proceedings ArticleDOI
20 Mar 2017
TL;DR: This paper explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator, and achieves the first successful transfer of a deep neural network trained only on simulated RGB images to the real world for the purpose of robotic control.
Abstract: Bridging the ‘reality gap’ that separates simulated robotics from experiments on hardware could accelerate robotic research through improved data availability. This paper explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator. With enough variability in the simulator, the real world may appear to the model as just another variation. We focus on the task of object localization, which is a stepping stone to general robotic manipulation skills. We find that it is possible to train a real-world object detector that is accurate to 1.5 cm and robust to distractors and partial occlusions using only data from a simulator with non-realistic random textures. To demonstrate the capabilities of our detectors, we show they can be used to perform grasping in a cluttered environment. To our knowledge, this is the first successful transfer of a deep neural network trained only on simulated RGB images (without pre-training on real images) to the real world for the purpose of robotic control.

2,079 citations



Posted Content
TL;DR: In this article, the authors use domain randomization to train a real-world object detector that is accurate to 1.5 cm and robust to distractors and partial occlusions using only data from a simulator with non-realistic random textures.
Abstract: Bridging the 'reality gap' that separates simulated robotics from experiments on hardware could accelerate robotic research through improved data availability. This paper explores domain randomization, a simple technique for training models on simulated images that transfer to real images by randomizing rendering in the simulator. With enough variability in the simulator, the real world may appear to the model as just another variation. We focus on the task of object localization, which is a stepping stone to general robotic manipulation skills. We find that it is possible to train a real-world object detector that is accurate to 1.5 cm and robust to distractors and partial occlusions using only data from a simulator with non-realistic random textures. To demonstrate the capabilities of our detectors, we show they can be used to perform grasping in a cluttered environment. To our knowledge, this is the first successful transfer of a deep neural network trained only on simulated RGB images (without pre-training on real images) to the real world for the purpose of robotic control.

966 citations

Journal ArticleDOI
28 May 2015-Nature
TL;DR: An intelligent trial-and-error algorithm is introduced that allows robots to adapt to damage in less than two minutes in large search spaces without requiring self-diagnosis or pre-specified contingency plans, and may shed light on the principles that animals use to adapt to injury.
Abstract: An intelligent trial-and-error learning algorithm is presented that allows robots to adapt in minutes to compensate for a wide variety of types of damage. Autonomous mobile robots would be extremely useful in remote or hostile environments such as space, deep oceans or disaster areas. An outstanding challenge is to make such robots able to recover after damage. Jean-Baptiste Mouret and colleagues have developed a machine learning algorithm that enables damaged robots to quickly regain their ability to perform tasks. When they sustain damage, such as broken or even missing legs, the robots adopt an intelligent trial-and-error approach, trying out possible behaviours that they calculate to be potentially high-performing. After a handful of such experiments they discover, in less than two minutes, a compensatory behaviour that works in spite of the damage. Robots have transformed many industries, most notably manufacturing, and have the power to deliver tremendous benefits to society, such as in search and rescue, disaster response, health care and transportation. They are also invaluable tools for scientific exploration in environments inaccessible to humans, from distant planets to deep oceans. A major obstacle to their widespread adoption in more complex environments outside factories is their fragility. Whereas animals can quickly adapt to injuries, current robots cannot ‘think outside the box’ to find a compensatory behaviour when they are damaged: they are limited to their pre-specified self-sensing abilities, can diagnose only anticipated failure modes, and require a pre-programmed contingency plan for every type of potential damage, an impracticality for complex robots. A promising approach to reducing robot fragility involves having robots learn appropriate behaviours in response to damage, but current techniques are slow even with small, constrained search spaces.
Here we introduce an intelligent trial-and-error algorithm that allows robots to adapt to damage in less than two minutes in large search spaces without requiring self-diagnosis or pre-specified contingency plans. Before the robot is deployed, it uses a novel technique to create a detailed map of the space of high-performing behaviours. This map represents the robot’s prior knowledge about what behaviours it can perform and their value. When the robot is damaged, it uses this prior knowledge to guide a trial-and-error learning algorithm that conducts intelligent experiments to rapidly discover a behaviour that compensates for the damage. Experiments reveal successful adaptations for a legged robot injured in five different ways, including damaged, broken, and missing legs, and for a robotic arm with joints broken in 14 different ways. This new algorithm will enable more robust, effective, autonomous robots, and may shed light on the principles that animals use to adapt to injury.

928 citations

Posted Content
TL;DR: A new algorithm called Go-Explore is introduced, which remembers previously visited states, solves simulated environments through any available means, and robustifies via imitation learning, resulting in a dramatic performance improvement on hard-exploration problems.
Abstract: A grand challenge in reinforcement learning is intelligent exploration, especially when rewards are sparse or deceptive. Two Atari games serve as benchmarks for such hard-exploration domains: Montezuma's Revenge and Pitfall. On both games, current RL algorithms perform poorly, even those with intrinsic motivation, which is the dominant method to improve performance on hard-exploration domains. To address this shortfall, we introduce a new algorithm called Go-Explore. It exploits the following principles: (1) remember previously visited states, (2) first return to a promising state (without exploration), then explore from it, and (3) solve simulated environments through any available means (including by introducing determinism), then robustify via imitation learning. The combined effect of these principles is a dramatic performance improvement on hard-exploration problems. On Montezuma's Revenge, Go-Explore scores a mean of over 43k points, almost 4 times the previous state of the art. Go-Explore can also harness human-provided domain knowledge and, when augmented with it, scores a mean of over 650k points on Montezuma's Revenge. Its max performance of nearly 18 million surpasses the human world record, meeting even the strictest definition of "superhuman" performance. On Pitfall, Go-Explore with domain knowledge is the first algorithm to score above zero. Its mean score of almost 60k points exceeds expert human performance. Because Go-Explore produces high-performing demonstrations automatically and cheaply, it also outperforms imitation learning work where humans provide solution demonstrations. Go-Explore opens up many new research directions into improving it and weaving its insights into current RL algorithms. It may also enable progress on previously unsolvable hard-exploration problems in many domains, especially those that harness a simulator during training (e.g. robotics).

317 citations


Cites background from "The Transferability Approach: Cross..."

  • ...in robotics), one can then use any of the many available techniques for transferring the robust policy from simulation to the real world [59, 60, 81]....


Journal ArticleDOI
05 Feb 2019-Sensors
TL;DR: The physical fundamentals, principle functioning, and electromagnetic spectrum used to operate the most common sensors used in perception systems (ultrasonic, RADAR, LiDAR, cameras, IMU, GNSS, RTK, etc.) are presented.
Abstract: This paper presents a systematic review of the perception systems and simulators for autonomous vehicles (AV). This work has been divided into three parts. In the first part, perception systems are categorized as environment perception systems and positioning estimation systems. The paper presents the physical fundamentals, principle functioning, and electromagnetic spectrum used to operate the most common sensors used in perception systems (ultrasonic, RADAR, LiDAR, cameras, IMU, GNSS, RTK, etc.). Furthermore, their strengths and weaknesses are shown, and the quantification of their features using spider charts will allow proper selection of different sensors depending on 11 features. In the second part, the main elements to be taken into account in the simulation of a perception system of an AV are presented. For this purpose, the paper describes simulators for model-based development, the main game engines that can be used for simulation, simulators from the robotics field, and lastly simulators used specifically for AV. Finally, the current state of regulations that are being applied in different countries around the world on issues concerning the implementation of autonomous vehicles is presented.

268 citations

References
Journal ArticleDOI
TL;DR: This paper suggests a non-dominated sorting-based MOEA, called NSGA-II (Non-dominated Sorting Genetic Algorithm II), which alleviates all of the above three difficulties, and modify the definition of dominance in order to solve constrained multi-objective problems efficiently.
Abstract: Multi-objective evolutionary algorithms (MOEAs) that use non-dominated sorting and sharing have been criticized mainly for: (1) their O(MN^3) computational complexity (where M is the number of objectives and N is the population size); (2) their non-elitism approach; and (3) the need to specify a sharing parameter. In this paper, we suggest a non-dominated sorting-based MOEA, called NSGA-II (Non-dominated Sorting Genetic Algorithm II), which alleviates all of the above three difficulties. Specifically, a fast non-dominated sorting approach with O(MN^2) computational complexity is presented. Also, a selection operator is presented that creates a mating pool by combining the parent and offspring populations and selecting the best N solutions (with respect to fitness and spread). Simulation results on difficult test problems show that NSGA-II is able, for most problems, to find a much better spread of solutions and better convergence near the true Pareto-optimal front compared to the Pareto-archived evolution strategy and the strength-Pareto evolutionary algorithm - two other elitist MOEAs that pay special attention to creating a diverse Pareto-optimal front. Moreover, we modify the definition of dominance in order to solve constrained multi-objective problems efficiently. Simulation results of the constrained NSGA-II on a number of test problems, including a five-objective, seven-constraint nonlinear problem, are compared with another constrained multi-objective optimizer, and the much better performance of NSGA-II is observed.

37,111 citations


"The Transferability Approach: Cross..." refers methods in this paper

  • ...As for the previous application, all the approaches have been implemented using the MOEA NSGA-II [14]....

  • ...All the approaches have been implemented using the state-of-the-art MOEA NSGA-II [14], based on non-dominated sorting and elitist tournament selection. For single-objective optimization schemes (Control approaches), NSGA-II is equivalent to an elitist tournament-based EA....


Journal ArticleDOI
TL;DR: This paper introduces the reader to a response surface methodology that is especially good at modeling the nonlinear, multimodal functions that often occur in engineering and shows how these approximating functions can be used to construct an efficient global optimization algorithm with a credible stopping rule.
Abstract: In many engineering optimization problems, the number of function evaluations is severely limited by time or cost. These problems pose a special challenge to the field of global optimization, since existing methods often require more function evaluations than can be comfortably afforded. One way to address this challenge is to fit response surfaces to data collected by evaluating the objective and constraint functions at a few points. These surfaces can then be used for visualization, tradeoff analysis, and optimization. In this paper, we introduce the reader to a response surface methodology that is especially good at modeling the nonlinear, multimodal functions that often occur in engineering. We then show how these approximating functions can be used to construct an efficient global optimization algorithm with a credible stopping rule. The key to using response surfaces for global optimization lies in balancing the need to exploit the approximating surface (by sampling where it is minimized) with the need to improve the approximation (by sampling where prediction error may be high). Striking this balance requires solving certain auxiliary problems which have previously been considered intractable, but we show how these computational obstacles can be overcome.
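The balance this abstract describes, between sampling where the surface is minimized and sampling where prediction error may be high, is often quantified with an expected improvement criterion. The abstract itself does not name the criterion, so the Python sketch below is an assumption on that point. It also assumes a minimization problem and a surrogate that returns a Gaussian prediction (mean `mu`, standard deviation `sigma`) at a candidate point; the function name and arguments are illustrative.

```python
import math


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Expected improvement over the best observed value (minimization).

    mu, sigma: surrogate prediction (mean and standard deviation) at a candidate.
    best: lowest objective value observed so far.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    # First term rewards a low predicted mean (exploitation),
    # second term rewards high uncertainty (exploration).
    return (best - mu) * cdf + sigma * pdf


if __name__ == "__main__":
    # Same predicted mean, different uncertainty: the uncertain point scores higher.
    print(expected_improvement(mu=2.0, sigma=1.0, best=2.0))
    print(expected_improvement(mu=2.0, sigma=2.0, best=2.0))
```

A candidate whose predicted mean is no better than the current best can still score well if its predicted uncertainty is large, which is the exploration side of the trade-off.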

6,914 citations


"The Transferability Approach: Cross..." refers methods in this paper

  • ...Some of the most used interpolation methods are: 1) Radial Basis Function [30]; 2) Inverse Distance Weighting model [54]; 3) Kriging model [32]....

    [...]

  • ...Kriging is a group of popular interpolation geostatistical methods [23], [32], somewhat similar to IDW interpolation....

    [...]

Proceedings ArticleDOI
01 Jan 1968
TL;DR: In many fields using empirical areal data there arises a need for interpolating from irregularly-spaced data to produce a continuous surface; a unique number (such as rainfall in meteorology, or altitude in geography) is assumed to be associated with each data point.
Abstract: In many fields using empirical areal data there arises a need for interpolating from irregularly-spaced data to produce a continuous surface. These irregularly-spaced locations, hence referred to as “data points,” may have diverse meanings: in meteorology, weather observation stations; in geography, surveyed locations; in city and regional planning, centers of data-collection zones; in biology, observation locations. It is assumed that a unique number (such as rainfall in meteorology, or altitude in geography) is associated with each data point. In order to display these data in some type of contour map or perspective view, to compare them with data for the same region based on other data points, or to analyze them for extremes, gradients, or other purposes, it is extremely useful, if not essential, to define a continuous function fitting the given values exactly. Interpolated values over a fine grid may then be evaluated. In using such a function it is assumed that the original data are without error, or that compensation for error will be made after interpolation.
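One simple way to build a continuous function that fits irregularly spaced values exactly is basic inverse distance weighting (IDW), the interpolation family named in the citing excerpts above. The Python sketch below is a minimal version under stated assumptions: two-dimensional data points, Euclidean distance, and a distance exponent `power` defaulting to 2. The function name and the sample values are illustrative, not taken from the paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def idw_interpolate(
    query: Point,
    samples: Sequence[Tuple[Point, float]],
    power: float = 2.0,
) -> float:
    """Inverse-distance-weighted estimate at `query` from (location, value) samples."""
    if not samples:
        raise ValueError("at least one data point is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in samples:
        distance = math.hypot(query[0] - x, query[1] - y)
        if distance == 0.0:
            # The interpolant passes exactly through every data point.
            return value
        weight = 1.0 / distance ** power  # closer points weigh more
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Hypothetical measurements at two locations.
    data = [((0.0, 0.0), 1.0), ((2.0, 0.0), 3.0)]
    print(idw_interpolate((0.0, 0.0), data))  # 1.0 (exact at a data point)
    print(idw_interpolate((1.0, 0.0), data))  # 2.0 (midpoint, equal weights)
```

Evaluating `idw_interpolate` over a fine grid of query points gives the continuous surface the abstract refers to.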

3,882 citations

Journal ArticleDOI
Yaochu Jin
01 Jan 2005
TL;DR: A comprehensive survey of the research on fitness approximation in evolutionary computation is presented; main issues such as approximation levels, approximate model management schemes, and model construction techniques are reviewed, and open questions and interesting issues in the field are discussed.
Abstract: Evolutionary algorithms (EAs) have received increasing interest in both academia and industry. One main difficulty in applying EAs to real-world applications is that EAs usually need a large number of fitness evaluations before a satisfying result can be obtained. However, fitness evaluations are not always straightforward in many real-world applications. Either an explicit fitness function does not exist, or the evaluation of the fitness is computationally very expensive. In both cases, it is necessary to estimate the fitness function by constructing an approximate model. In this paper, a comprehensive survey of the research on fitness approximation in evolutionary computation is presented. Main issues such as approximation levels, approximate model management schemes, and model construction techniques are reviewed. To conclude, open questions and interesting issues in the field are discussed.

1,228 citations


"The Transferability Approach: Cross..." refers background or methods in this paper

  • ...Consequently, we do not build a simulation model from scratch nor modify it, but we rather exploit an already available simulator where it mimics the reality at most....

    [...]

  • ...…if the surrogate model is accurate compared to the exact STR disparity measure, we conducted 10 runs of the Transferability approach as described in the section III with a diversity threshold τdiv = 0.05, that corresponds to 25 transfers by run on average from the simple simulator to the…...

    [...]

  • ...Our first application aims to reproduce one of Jakobi’s experiments on the reality gap problem [26], [27], notably to compare our approach with Jakobi’s one that nowadays remains the most formalized methodology dedicated to this problem....

    [...]

  • ...The approach then looks for the most efficient controllers whose behaviors are sufficiently based on the realistic parts of the simulation model to transfer well onto the real robot....

    [...]

01 Jan 2002

927 citations


"The Transferability Approach: Cross..." refers methods in this paper

  • ...The implementation and the use of the Kriging model rely on the DACE Matlab toolbox [36]....

    [...]

Frequently Asked Questions (11)
Q1. What are the contributions mentioned in the paper "The transferability approach: crossing the reality gap in evolutionary robotics"?

For both experimental set-ups, their approach successfully finds efficient and well-transferable controllers with only about ten experiments on the physical robot.

As the surrogate model builds the relation between the two control parameters and the covered distance in reality, five preliminary experiments are needed to initialize the Kriging model. 

A classic way to optimize controllers with expensive fitness functions is to directly build a surrogate model of the fitness in reality [31] instead of relying on a simulation model: the surrogate model tries to approximate the relation between the control parameters and the real fitness.

For the T-maze problem, the surrogate model can hardly rely on Kriging interpolation with their budget of evaluations on the robot: the controller depends on 35 parameters, which implies at least 71 experiments to initialize the Kriging model. 
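The two figures quoted in these answers (five preliminary experiments for two control parameters, at least 71 experiments for 35 parameters) are both consistent with an initialization budget of 2d + 1 experiments for d parameters. The excerpts do not state this rule explicitly, so the Python sketch below treats it as an assumption inferred from those two numbers; the function name is hypothetical.

```python
def min_kriging_init_experiments(n_params: int) -> int:
    """Minimum number of real experiments to initialize the Kriging model.

    Assumed rule (inferred from the two figures quoted in the text,
    not stated by the source): 2 * d + 1 experiments for d parameters.
    """
    if n_params < 1:
        raise ValueError("n_params must be a positive integer")
    return 2 * n_params + 1


if __name__ == "__main__":
    print(min_kriging_init_experiments(2))   # 5  (two control parameters)
    print(min_kriging_init_experiments(35))  # 71 (T-maze controller)
```

Under this assumption, the initialization cost grows linearly with the number of controller parameters, which is why a 35-parameter controller alone would exceed a budget of about ten experiments on the robot.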

For realistic landscapes, several transfer experiments will be needed for exploring and building a sufficiently accurate surrogate model to avoid local maxima without any prior knowledge. 

The robot-in-the-loop simulation-based optimization approaches also rely mostly on simulators but some transfer experiments are allowed during the optimization. 

Because this approach relies on many more experiments in reality than the other approaches, it has been repeated only 3 times so that the total number of experiments in reality is the same (about 60 experiments for each approach).

The Control approach leads to better results, possibly because it does not always find the true optimal solutions in simulation, which agrees with the antagonism the authors hypothesize between efficiency and transferability.

One could argue that such external measures require heavy and costly experimental set-ups that are hardly compatible with bigger robots, and that on-board sensorimotor information should be preferred for comparing simulated and real behaviors.

The optimization took more than 60 hours with about 8000 evaluations on the physical robot, while the task seems relatively simple. 

These simulators make it possible to speed up the evaluation of the controllers, and they are upgraded by conducting some meaningful transfer experiments on the real device.