
The Transferability Approach: Crossing the Reality Gap in Evolutionary Robotics

Sylvain Koos, J-B Mouret, Stéphane Doncieux

01 Feb 2013-IEEE Transactions on Evolutionary Computation (IEEE)-Vol. 17, Iss: 1, pp 122-145

TL;DR: The transferability approach is proposed, a multiobjective formulation of ER in which two main objectives are optimized via a Pareto-based multiobjective evolutionary algorithm: 1) the fitness; and 2) the transferability, estimated by a simulation-to-reality (STR) disparity measure.


Abstract: The reality gap, which often makes controllers evolved in simulation inefficient once transferred onto the physical robot, remains a critical issue in evolutionary robotics (ER). We hypothesize that this gap highlights a conflict between the efficiency of the solutions in simulation and their transferability from simulation to reality: the most efficient solutions in simulation often exploit badly modeled phenomena to achieve high fitness values with unrealistic behaviors. This hypothesis leads to the transferability approach, a multiobjective formulation of ER in which two main objectives are optimized via a Pareto-based multiobjective evolutionary algorithm: 1) the fitness; and 2) the transferability, estimated by a simulation-to-reality (STR) disparity measure. To evaluate this second objective, a surrogate model of the exact STR disparity is built during the optimization. This transferability approach has been compared to two reality-based optimization methods, a noise-based approach inspired by Jakobi's minimal simulation methodology and a local search approach. It has been validated on two robotic applications: 1) a navigation task with an e-puck robot; and 2) a walking task with an 8-DOF quadrupedal robot. For both experimental setups, our approach successfully finds efficient and well-transferable controllers with only about ten experiments on the physical robot.


Summary


Introduction

EVOLUTIONARY ROBOTICS (ER) [39], [46] deals with the use of Evolutionary Algorithms (EA) in robotics.
This fitness function links each evaluated solution to a value that reflects its efficiency on the task to achieve and, as ER concerns robots, it should theoretically be computed on the studied robot [16].
This transfer problem is called reality gap [29] and is arguably the most critical issue that currently prevents the use of ER for practical robotic applications.
The authors' first insight concerns the simulation models: even if a simulation model is somewhat inaccurate, it also contains realistic parts, as it is designed to accurately mimic some physical phenomena.
For a given controller, this transferability measure compares the corresponding real and simulated behaviors and becomes an objective to optimize during the optimization while looking for efficient controllers.

A. Reality-based optimization

As the reality gap results from inadequacies between the reality and the simulation, a first attempt to deal with this problem consists in evolving the solutions directly on the real device.
Pollack et al. [50] proposed an alternative that partly addresses the high computational cost of optimizing in reality.
First, the robot’s morphology and its controller were co-evolved with a realistic simulation.
Nolfi et al. reported a similar work regarding a navigation task addressed by a mobile Khepera robot with 30000 evaluations in simulation followed by 3000 evaluations on the physical robot [47].
Such approaches assume that the optimal solutions found with the simulation model are relatively close to the true optimal ones on the real robot, i.e. that the high values of the fitness function in simulation are not too misleading.

B. Simulation-based optimization

The prohibitive computational cost of direct optimization on physical robots has led some researchers to envisage full optimization processes in simulation [53].
Simulation models often are trade-offs between accuracy and computational cost: although the reality gap problem highlights the need of accurate simulators, accurate models can lead to very high computational costs, which are incompatible with optimization techniques.
The unwanted phenomena are hidden in an envelope of noise or not modeled at all so that the evolved solutions cannot exploit them and have to be robust enough to achieve high fitness values.
The robot can also explicitly build an approximate model of its environment to use it as a reference and then adapt to environment variations.
Whether the robustness is obtained by the optimization process in simulation or by some adaptive mechanisms, all these approaches rely on the following hypothesis: the level of robustness is sufficient to overcome the reality gap.

C. Robot-in-the-loop simulation-based optimization

The robot-in-the-loop simulation-based optimization approaches also rely mostly on simulators but some transfer experiments are allowed during the optimization.
This approach has been successfully implemented with a four-legged robot [8].
Also based on co-evolution between simulators and controllers, the Back-to-Reality algorithm [58] does not resort to a disagreement measure, but tries to reduce the fitness variation observed between simulation and reality.
The optimization process can itself directly rely on a so-called surrogate model by evaluating the individuals with a simple model of the fitness function instead of building an entire simulation model.
Abbeel et al. notably applied such techniques to aerobatic helicopter flight [2].

D. Concluding thoughts

This review of the state of the art on the reality gap problem leads to five main thoughts.
For all practical purposes, simulation models are often available when working on robotic applications and while a simulation model can lead to reality gap problems, it is also designed to properly describe the dynamics of a given system: it probably contains both accurate parts and inaccurate ones.
The authors' main idea is to base the optimization on a simulation model that remains fixed during the whole process.
The approach then looks for the most efficient controllers whose behaviors are sufficiently based on the realistic parts of the simulation model to transfer well onto the real robot.
The authors neither build a simulation model from scratch nor modify it; rather, they exploit an already available simulator in the regions where it best mimics reality.

A. Principles

The Transferability approach fits into the robot-in-the-loop simulation-based optimization approaches.
As the Transferability approach aims at finding solutions both efficient in simulation and transferable from simulation to reality, it does not always find the optimal solutions in reality, but rather good compromises between efficiency in simulation and transferability.
If the optimal solutions in reality indeed rely on unrealistic parts of the simulation, as illustrated in Fig. 3, the approach will consequently avoid them, because they are not transferable.
Based on this observation, this case was never encountered in the two experiments studied in this paper.

B. From an exact STR disparity to a surrogate model

D∗ links, for any possible controller c, the corresponding behavior in simulation b(c) in the behavior space B, to its exact STR disparity value D∗(b(c)).
The authors rely on a so-called surrogate model to approximate the STR disparity function during the optimization process.
Surrogate models [23], [31], [57] are typically used in real engineering problems when evaluating an individual on the target system incurs very high computational costs or overly long experiments.
Such interpolation methods rely on a distance function to compare solutions: the value predicted for a given solution mostly depends on the exact values of solutions that are close to it.
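As a rough illustration of such a distance-based interpolation, the following Python sketch implements Inverse Distance Weighting over behaviors whose exact STR disparity is already known. It is a minimal sketch, not the authors' implementation: the function name `idw_predict`, the Euclidean behavioral distance, and the exponent `power=2.0` are assumptions made for the example.

```python
import math

def idw_predict(query, known, power=2.0):
    """Inverse Distance Weighting estimate of the STR disparity at `query`.

    known: list of (behavior_features, exact_disparity) pairs obtained
    from transfer experiments (hypothetical data layout).
    """
    numerator, denominator = 0.0, 0.0
    for features, disparity in known:
        d = math.dist(query, features)  # Euclidean distance (assumption)
        if d == 0.0:
            # The query coincides with a known point: return its exact value.
            return disparity
        weight = 1.0 / d ** power  # closer points weigh more
        numerator += weight * disparity
        denominator += weight
    return numerator / denominator

# Two already-transferred behaviors with their exact disparities (made-up numbers).
transferred = [((0.0, 0.0), 0.1), ((1.0, 0.0), 0.9)]
print(idw_predict((0.5, 0.0), transferred))
```

Because the weights decay with distance, the prediction at a new behavior is dominated by the nearest transferred behaviors, which is the property the text relies on.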

C. Optimization scheme

Evaluation objectives: Each controller is evaluated by three objectives: 1) the task-dependent fitness, to find good controllers; 2) the corresponding approximated STR disparity computed with the surrogate model, to find transferable controllers; 3) the behavioral diversity objective.
This last objective maintains behavioral diversity among the population, which efficiently enhances exploration of the controller state space [15], [42].
In such a context, the update heuristic defined earlier boils down to randomly selecting one individual among those whose diversity value is higher than the diversity threshold τdiv .
It ensures that any new experiment selected by the update heuristic is meaningful.

D. Algorithm outline

To initialize the surrogate model of the STR disparity, the authors assume that a controller c0 has already been transferred onto the real system at the beginning of each optimization process.
The corresponding exact STR disparity value D∗(c0) and the behavioral features in simulation are computed.
To transfer behaviors that are sufficiently different from those of the already transferred controllers, and thus to limit the number of experiments, the update heuristic relies on a diversity threshold τdiv: one controller, randomly selected among those in the current population whose behavioral diversity value is greater than τdiv, is transferred.
The diversity threshold is designed by hand to achieve a given number of transfer experiments on average during the whole optimization process.
It is next used to update the surrogate model of the STR disparity function.
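The update heuristic described above can be sketched as follows, assuming each individual already carries a behavioral diversity value. The names `Individual` and `select_for_transfer`, as well as the example numbers, are hypothetical and only serve the illustration.

```python
import random
from typing import NamedTuple, Optional, Sequence

class Individual(NamedTuple):
    name: str
    diversity: float  # behavioral diversity w.r.t. already transferred controllers

def select_for_transfer(population: Sequence[Individual],
                        tau_div: float,
                        rng: random.Random) -> Optional[Individual]:
    """Pick one individual at random among those above the diversity threshold."""
    candidates = [ind for ind in population if ind.diversity > tau_div]
    if not candidates:
        return None  # no transfer experiment at this generation
    return rng.choice(candidates)

rng = random.Random(0)
population = [Individual("c1", 0.02), Individual("c2", 0.30), Individual("c3", 0.15)]
chosen = select_for_transfer(population, tau_div=0.1, rng=rng)
print(chosen)
```

Raising `tau_div` shrinks the candidate set, so fewer generations trigger a transfer; this is how a hand-tuned threshold can target an average number of experiments per run.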

E. Best solution of a run

The authors assume at first that a threshold D∗threshold on the STR disparity values can be empirically chosen in such a way that STR disparity values greater than D∗threshold indicate bad transfers.
For the class A1, the goal of the optimization process boils down to finding an optimal individual in reality, that is, a controller which solves the task.
The same criteria are used for applications of the class A2.
There are two possible cases: if the transferable non-dominated set is empty, the best solution of the run is the solution with the lowest STR disparity in the non-dominated set, although it should not transfer well; otherwise, the authors have to choose a best compromise solution in this transferable set.
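The two cases above can be sketched in a few lines. This is an illustrative sketch only: the dictionary layout and the function name `best_of_run` are hypothetical, and picking the highest fitness inside the transferable set is an assumption standing in for the "best compromise" choice, which this summary does not detail.

```python
def best_of_run(non_dominated, d_threshold):
    """Choose the best solution of a run from the final non-dominated set.

    non_dominated: list of dicts with 'fitness' and 'disparity' keys.
    """
    # Solutions at or below the threshold are considered transferable.
    transferable = [s for s in non_dominated if s["disparity"] <= d_threshold]
    if not transferable:
        # Empty transferable set: fall back to the lowest STR disparity.
        return min(non_dominated, key=lambda s: s["disparity"])
    # Assumption: best compromise = highest fitness among transferable solutions.
    return max(transferable, key=lambda s: s["fitness"])

front = [
    {"fitness": 1.0, "disparity": 0.50},
    {"fitness": 0.8, "disparity": 0.08},
    {"fitness": 0.6, "disparity": 0.02},
]
print(best_of_run(front, d_threshold=0.1))
```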
The authors' approach has been validated with an e-puck robot on one of Jakobi’s early experiments on the reality gap problem (class A1, [26]).

A. Experimental set-up

The authors' first application aims to reproduce one of Jakobi’s experiments on the reality gap problem [26], [27], notably to compare their approach with Jakobi’s, which remains the most formalized methodology dedicated to this problem.
Nevertheless, the light sensors of the e-puck robot appear not to be reliable enough and their experimental set-up is slightly different from the original one.
Detects this wall in simulation, its sensors have to be noised so that the optimal behaviors found in simulation cannot exploit it and fail when transferred onto the real device (see Fig. 9).
The genotype encodes 7 parameters for each of the 5 “left” neurons: 1) its threshold value (16 values regularly spaced from -1 to 1); 2) the destination neuron of its 3 possible outgoing connections (integer values from 1 to 16); 3) the 3 weights corresponding to these 3 connections (16 values regularly spaced from -2 to 2).
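To make the discretization concrete, the sketch below maps a gene to one of 16 regularly spaced levels. The 4-bit-per-parameter layout is an assumption for illustration (16 values fit in 4 bits); the exact bit layout is not given here, and `decode_level` is a hypothetical helper.

```python
def decode_level(bits: str, low: float, high: float) -> float:
    """Map a 4-bit string to one of 16 values regularly spaced in [low, high]."""
    if len(bits) != 4 or set(bits) - {"0", "1"}:
        raise ValueError("expected a 4-character bit string")
    index = int(bits, 2)  # integer in 0..15
    return low + index * (high - low) / 15

threshold = decode_level("0000", -1.0, 1.0)  # lowest threshold level
weight = decode_level("1111", -2.0, 2.0)     # highest weight level
print(threshold, weight)
```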
The fitness values are averaged on both test cases to compute the global fitness value of an individual.

B. Problems encountered when implementing Jakobi’s approach

In order to obtain controllers that transfer well from simulation to reality, Jakobi argues that if one looks for sufficiently robust individuals in simulation, they should transfer well onto the real device and also be robust in reality.
Here, the authors only consider the reality gap problem and the concerns on robustness in reality are not especially evaluated.
The infrared sensor values can indeed deviate dramatically from one experiment to another, as can the duration of the color pattern detection by the camera.
Preliminary experiments with such optimization schemes did not lead to individuals with high fitness or high robustness in simulation with the budget of evaluations fixed for the set-up.
Footnote 7: The maximal wheel speed in Jakobi’s original setup was 8 cm per second and the simulation was updated 10 times per second.

C. Approaches

1) Noise-based approach inspired by Jakobi’s:
The individuals are optimized in the simulation with the real parameter values.
Moreover, in order to transfer as few individuals as possible, only individuals that are optimal in simulation are transferred onto the e-puck robot during the optimization process.
The diversity objective is computed as the minimal Hamming distance based on the binary genotype to the already transferred controllers.
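The genotype-based diversity objective can be illustrated with a short sketch, assuming genotypes are stored as equal-length bit strings. The function names and the sample genotypes are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("genotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))

def genotype_diversity(genotype: str, transferred: list) -> int:
    """Minimal Hamming distance to the already transferred genotypes."""
    return min(hamming(genotype, t) for t in transferred)

already_transferred = ["1000", "0101"]
print(genotype_diversity("1010", already_transferred))
```

Taking the minimum means a candidate scores high only if it differs from every controller that has already been tested on the robot.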
The best solution of a run is the transferred controller with the highest fitness value.

D. Results

Table I sums up the location of the evaluation step (simulation, reality or both) along with the number of experiments done on the physical robot by run for each approach.
All the approaches have been implemented using the state-of-the-art MOEA NSGA-II [14], based on non-dominated sorting and elitist tournament selection.
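The Pareto dominance relation underlying non-dominated sorting can be sketched as below. This shows only the dominance test and the extraction of the first non-dominated front; NSGA-II additionally ranks the successive fronts and uses a crowding distance, which are omitted here. Encoding every objective as "larger is better" (with the STR disparity negated) is an assumption made for the example.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximized)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better

def non_dominated(points):
    """Return the points that no other point dominates (first Pareto front)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# (fitness, -disparity) pairs: made-up values for illustration.
points = [(1.0, -0.5), (0.8, -0.1), (0.7, -0.6)]
print(non_dominated(points))
```

The first two points survive because each is better on one objective, which is exactly the kind of efficiency versus transferability trade-off the approach keeps in its population.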
The two reality-based optimization approaches, evolution on the physical robot and surrogate modelling of the real fitness, achieve clearly worse results, with respective average fitness values of 469 mm (sd = 45 mm) and 466 mm (sd = 77 mm).
Concerning the noise-based approach, the original results obtained in [26] are not reproduced: only 3 runs out of 10 lead to optimal solutions in reality, while the method always worked in the original set-up.
A typical behavior obtained in reality with the Transferability approach is pictured on Fig. 12.

V. APPLICATION II: QUADRUPEDAL WALKING ROBOT

Locomotion problems have often been addressed in Evolutionary Robotics.
In particular, quadrupedal walking offers the advantage of various kinds of gaits: from static and easy to model walks to more dynamic and complex ones.
As these gaits do not need the same level of accuracy to be correctly modeled in simulation, they are expected to achieve different transferability performances on the real device.
The fitness is the distance covered by the robot during a fixed time.
Contrary to the previous application, the optimality of a given solution cannot be directly derived from the corresponding behavior whether in simulation or on the real robot (class A2 in Fig. 6), as the maximal robot speed is unknown.

A. Robot and experimental set-up

The physical robot is made from a Bioloid Kit and has been built after the wheeled-legged robot Hylos [20] designed for autonomous planetary/volcanic exploration.
Each leg then includes an upper leg motor and a lower leg motor, all controlled in position.
The authors also use a simulator relying on the Bullet Physics Library, an open source physics engine [5].
For their application, the following points have been carefully modeled: dimensions of the robot, masses of the different parts, mass asymmetry of the main body, contact areas of the wheels, servos’ built-in controller (according to the Dynamixel documentation).
The fitness landscape in simulation is complex as shown on Figure 14.

B. Approaches

The exact STR disparity measure is based on the real and simulated distances from the origin, $\sqrt{x^2 + y^2}$, of the robot’s geometric center, which are computed respectively from the recorded real and simulated trajectories for each sampled data point.
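A trajectory-based disparity of this kind could be computed along the following lines. The text above only states that the measure compares the real and simulated distances from the origin at each sampled point; aggregating them as a mean squared difference is an assumption of this sketch, not the authors' formula, and the function names are hypothetical.

```python
import math

def radial_distances(trajectory):
    """Distance from the origin, sqrt(x^2 + y^2), for each sampled (x, y) point."""
    return [math.hypot(x, y) for x, y in trajectory]

def str_disparity(sim_trajectory, real_trajectory):
    """Mean squared difference between simulated and real radial distances.

    The mean-squared aggregation is an assumption made for illustration.
    """
    if len(sim_trajectory) != len(real_trajectory):
        raise ValueError("trajectories must be sampled at the same instants")
    sim_r = radial_distances(sim_trajectory)
    real_r = radial_distances(real_trajectory)
    return sum((s - r) ** 2 for s, r in zip(sim_r, real_r)) / len(sim_r)

sim = [(0.0, 0.0), (3.0, 4.0)]
real = [(0.0, 0.0), (0.0, 0.0)]  # the real robot did not move
print(str_disparity(sim, real))
```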
To build the surrogate model, a given number of selected controllers have to be transferred onto the physical robot to record the corresponding fitness values in reality.
The diversity objective is computed as the minimal Euclidean distance based on the genotype to the already transferred controllers.
The best solution of a run is the transferred controller with the highest fitness value.
As the surrogate model builds the relation between the two control parameters and the covered distance in reality, five preliminary experiments are needed to initialize the Kriging model.

C. Results

Table III sums up the location of the evaluation step (simulation, reality or both) along with the number of experiments done on the physical robot by run for each approach.
The Control approach leads to better results, possibly because it does not always find the true optimal solutions in simulation, which is quite in agreement with the antagonism the authors hypothesize between efficiency and transferability.
In order to study more in details the reality gap problem for this application, the best individuals found with the Control approach +.
Such results show that the real fitness landscape of this application is likely to be simple.
Footnote 16: The same diversity threshold τdiv = 0.1 is selected as with the Transferability approach, which leads to archives CS of 11 individuals on average (sd = 1).
The best trade-off individual among the best solutions obtained with the Transferability approach achieves 1132 mm in the simulation and 1099 mm on the real robot with a 0.004 STR disparity value.

VI. FURTHER INVESTIGATIONS

In place of transfers from simulation to reality, the authors solve a fictive reality gap problem between a simplified simulator and the accurate simulator used in the previous section.
They only differ from each other by the modeling of the servos’ built-in controller.
The simple simulator relies on a proportional relation between the speed and the position error, while the accurate one is based on the Dynamixel documentation.
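A proportional servo model of the kind attributed to the simple simulator can be sketched as follows. The gain, the time step, and the function name `servo_step` are hypothetical values chosen for illustration; they are not taken from the paper or from the Dynamixel documentation.

```python
def servo_step(position: float, target: float, gain: float, dt: float) -> float:
    """Advance a position-controlled servo by one time step.

    The commanded speed is proportional to the position error.
    """
    speed = gain * (target - position)
    return position + speed * dt

# Drive the joint from 0 rad toward 1 rad (made-up gain and time step).
position = 0.0
for _ in range(100):
    position = servo_step(position, target=1.0, gain=2.0, dt=0.1)
print(position)
```

With these values the error shrinks by a constant factor at every step, so the joint converges smoothly to its target without any of the saturation or load effects a more detailed servo model would capture.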

A. Concerning the surrogate model

Fig. 19 shows, for all the individuals in the last non-dominated sets of each run, the approximated STR disparity values and the corresponding exact STR disparity values.
It is highly linked to the main drawback of the Inverse Distance Weighting interpolation technique: the predicted value always lies between the minimum and the maximum of the interpolated data points.
The Spearman’s rank correlation coefficients between the approximated STR disparity and the exact one are relatively high, with an average of 0.76 (sd = 0.11).
It indicates that there is a strong positive monotonic relation between the approximation and the exact function.
Such considerations often are sufficient to conclude that the surrogate model is of good quality [25]: the surrogate model seems to provide the evolutionary search with a good gradient.
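The bounded-prediction drawback of Inverse Distance Weighting mentioned above follows directly from its definition. As a sketch in standard notation (the symbols $b_i$ for the $n$ transferred behaviors, $d$ for the behavioral distance, and $p$ for the exponent are assumptions introduced here):

```latex
\hat{D}(b) \;=\; \frac{\sum_{i=1}^{n} w_i(b)\, D^{*}(b_i)}{\sum_{i=1}^{n} w_i(b)},
\qquad
w_i(b) \;=\; \frac{1}{d(b, b_i)^{p}} \;>\; 0 .

% The normalized weights are positive and sum to 1, so \hat{D}(b) is a
% convex combination of the known values and therefore
\min_{i} D^{*}(b_i) \;\le\; \hat{D}(b) \;\le\; \max_{i} D^{*}(b_i).
```

The surrogate can thus never predict a disparity outside the range already observed on the robot, which limits how well it extrapolates to very poorly or very well transferable behaviors.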

B. Concerning the behavioral distances

In the Transferability approach, two behavioral distances are used: (1) the transferability measure that compares simulated behaviors with real ones; (2) the in silico metric that only compares behaviors in simulation.
The bFeat+DTraj variant corresponds to the original approach.
In the application with the quadrupedal robot, the three behavioral features separate behaviors that are efficient or not (covered distance), that make the robot overturn or not (mean height), and that are more or less stable (final orientation).
Such results validate their original approach with a feature-based behavioral distance as in silico metric and a trajectory-based behavioral distance as transferability measure.

C. Concerning the update heuristic and the diversity objective

The authors now implement a second set of variants (table V) depending on: (1) the update heuristic used to choose the transfer experiments; (2) the presence or lack of the diversity objective.
The “random” update heuristic consists in choosing at random the controller to transfer among those whose diversity value is greater than the diversity threshold τdiv as used in the original approach.
All the results are shown in Fig. 21.
The variants RandomT & Div and MaxDivT & Div behave well, but the RandomT & Div variant shows the best results, as it finds better trade-off solutions.
It means, counter-intuitively, that transferring the most different controller from the already transferred ones is not ideal.

A. Antagonism between efficiency and transferability

In their approach, controllers are evaluated in simulation by a task-dependent fitness and a STR disparity value.
This antagonism has to be discussed according to the results obtained in both set-ups.
Diversity on the application with the quadrupedal walking robot.
The former approach behaves clearly better in reality than the latter one.
Using a soft constraint based on the STR disparity value could provide an alternative to multiobjective optimization.

B. Towards an on-board transferability measure

In both applications, the transferability measure relies on external information: the trajectory of the robot recorded with CODA cx1 scanners.
Nevertheless, it is only meaningful if the sensor values are accurately modeled in simulation, sometimes despite significant amounts of noise.
It is also argued in [7] that accurate quantitative comparisons between two sensor time series are difficult because small initial disparities can quickly lead to very different signals, which can lead one to prefer external measures for physical robots [8].
Another promising way is to exploit sensorimotor information to obtain an accurate estimation of the trajectory by sensor integration [4], [9].
It is sometimes pertinent to combine different types of sensors (short-range distance sensors with a camera for instance) by periodic repositioning of the estimation.

C. Modeling the fitness or the transferability

The use of surrogate models is increasing in robotics, most of the time to directly approximate the fitness function on the physical robot.
Such approximations try to map the relation between the parameters of the controller and the fitness in reality by interpolating a global function from few experimental data with Kriging-like methods.
Another issue concerns the number of parameters introduced by Kriging methods.
For realistic landscapes, several transfer experiments will be needed for exploring and building a sufficiently accurate surrogate model to avoid local maxima without any prior knowledge.
In fact, selecting one of these two approaches comes down to the availability of relevant simulation models and, in practice, simulation models are often available for robotic applications.

D. Upgrading the simulation from the STR disparity measure

At the end of a run performed with the Transferability approach, the obtained surrogate model of the STR disparity function gives a rough landscape of which parts of the simulation are not well-modeled and which parts are realistic.
It then is possible to use clustering methods to notably extract which kinds of behaviors are more or less linked to bad transferability values.
Depending on the complexity of the problem, the next step, which consists in understanding how the simulation model makes these behaviors non-transferable and finally improving the model, must be conducted by interacting with experts in robotics and mechanics.

VIII. CONCLUSION AND FURTHER WORK

This paper addressed the reality gap problem in the case of controller optimization, a critical issue in Evolutionary Robotics, which often happens when resorting to simulators.
Controllers are evaluated by two main objectives in a multi-objective manner: a task-dependent fitness and a simulation-to-reality disparity that estimates a controller’s transferability using a surrogate model.
Better results were achieved by the Transferability approach regarding both exact STR disparity and covered distance in reality with very few transfer experiments during the optimization.
A second application to an 8-DOF quadrupedal walking robot has also been investigated and their approach again finds controllers that are relevant regarding a walking task and that transfer well to reality.
Each simulation model is a compromise between accuracy and speed.


Figures (26)

Fig. 1. Illustration of the reality gap problem on a fictional 1-dimensional optimization problem. While the simulation model is locally reliable, optimizing the fitness objective leads to non-transferable solutions: there is a significant performance loss when they are transferred onto the robot.

Fig. 3. The optimal solution in reality may correspond to a non-transferable behavior in simulation. In this case, as the Transferability approach looks for transferable solutions, it may not find the optimal solutions in reality. We hypothesize that this situation does not happen for realistic robotic applications, because the mechanical models used as simulations are designed to model the most important phenomena regarding the task to solve.

Fig. 2. Illustration of the Transferability approach. If some points are known in reality, an approximation of the transferability function can be interpolated from them. According to this approximate function, well-modeled parts of the simulation can be distinguished from the unrealistic ones: the search then targets solutions that are efficient in simulation and transferable from simulation to reality.

Fig. 10. Left, valid connections encoded in the genotype. Right, full network developed from the genotype.

TABLE I LOCATION OF THE EVALUATION STEP: FULLY IN SIMULATION, FULLY IN REALITY OR PARTLY IN BOTH. AVERAGE NUMBER OF EXPERIMENTS ON THE PHYSICAL ROBOT DURING A RUN FOR EACH APPROACH.

Fig. 13. Quadrupedal walking robot used in our experimental set-up.

Fig. 19. Approximated STR disparity compared to exact STR disparity (right, with log-scale) for the individuals in the Pareto front obtained in 10 runs of the Transferability approach with a diversity threshold τdiv = 0.05 (584 individuals in all).

Fig. 17. Left, exhaustive fitness landscape in simulation of the application II (covered distance, mm). Right, interpolation of the fitness landscape in reality based on about 5500 transfer experiments. These experiments have mostly been conducted in the zone p1 < 0.6. For higher values of p1, the robot is not able to move efficiently, which systematically leads to low fitness values.

Fig. 4. The optimal solution in simulation may correspond to the optimal solution in reality, despite a significant performance loss when transferring it on the robot. The Transferability approach therefore may avoid the optimal zones in simulation and miss the optimal solutions in reality. However, this case can be detected, as the fitness value in reality obtained with the best solution found with the Transferability approach should be lower than the fitness value in reality of the most efficient solution in simulation. This case was never observed in the two experiments presented in the paper.

TABLE V SECOND SET OF VARIANTS: UPDATE HEURISTIC AND DIVERSITY OBJECTIVE.

Fig. 8. Left, diagram of the sensors used with the e-puck robot. Right, our experimental set-up pictured with the e-puck robot.

Fig. 7. Left, diagram of the sensors used with the Khepera mobile robot in the original set-up. Right, both test cases of the reproduced Jakobi’s experiment pictured with the Khepera mobile robot.

Fig. 9. Left, representation of the T-maze used as minimal simulation; right, photograph of the real T-maze with both color patterns.

TABLE II RUNNING TIME AND SUCCESS RATE ONTO THE PHYSICAL ROBOT FOR ALL THE APPROACHES COMPUTED ON 10 RUNS. THE PERCENTAGE OF SOLVED SIMULATIONS INDICATES HOW THE BEST SOLUTIONS FOUND WITH EACH APPROACH BEHAVE ON THE 180 POSSIBLE SIMULATIONS USED WITH THE NOISE-BASED SET-UP.

Fig. 12. Left, typical behavior evolved with the Control approach: the transferred behavior is inefficient and the robot does not solve the task. Right, typical behavior evolved with the Transferability approach: the robot solves the task in both test cases. For each approach, the left column (resp. right) of figures corresponds to the case with the color pattern on the left (resp. right).

Fig. 15. Fitness of the best individuals found with the Control approach + Diversity plotted every 5 generations in simulation (top) and in reality (bottom). Error bars indicate one unit of standard deviation.

Fig. 20. Comparison between the behavioral distances: covered distance (mm, left) in the simple simulation and in the accurate one, along with the exact STR disparity values computed on trajectories (right) of the best solutions obtained with each variant. The target numbers of transfers by run on average are written above the variant names. The bFeat+DTraj variant performs best. The TrajOnly variant also performs well with 25 transfers by run but finds less efficient individuals regarding the covered distance in reality (Welch’s t-test p-value = 2 × 10⁻⁴) with 45 transfers by run. All the other variants perform significantly worse. Error bars indicate one unit of standard deviation.

Fig. 11. Left, fitness of the best solutions found with all the approaches. Right, exact disparity values computed for these individuals. Means and standard deviations are computed on 10 runs, except for the evolution on the robot with 3 runs. Error bars indicate one unit of standard deviation. The number of experiments done on the physical robot by run for each approach is indicated in Table I. The Transferability approach performs best and always finds best solutions that are both transferable (disparity values lower than D∗threshold = 0.1) and optimal in reality on the e-puck robot.

Fig. 18. Left, typical gait evolved with the Control approach: inefficient behavior due to slippage effects. Right, typical gait evolved with Transferability approaches: the behavior in reality is similarly efficient as the behavior in simulation.

TABLE III LOCATION OF THE EVALUATION STEP: FULLY IN SIMULATION, FULLY IN REALITY OR PARTLY IN BOTH. AVERAGE NUMBER OF EXPERIMENTS ON THE PHYSICAL ROBOT DURING A RUN FOR EACH APPROACH.

Fig. 21. Comparison between two update heuristics and the lack/presence of the diversity objective: covered distance (mm, left) in the simple simulation and in the accurate one, along with the exact STR disparity values (right) of the best solutions obtained with each variant. The target numbers of transfers by run on average are written above the variant names. For the variants RandomT & Div and MaxDivT & Div, the disparities are lower than the threshold D∗threshold = 1: the best solutions found show good transferability properties. The RandomT & NoDiv variant behaves clearly worse. Error bars indicate one unit of standard deviation.


HAL Id: hal-00687617

https://hal.archives-ouvertes.fr/hal-00687617

Submitted on 15 Apr 2012


The Transferability Approach: Crossing the Reality Gap in Evolutionary Robotics

Sylvain Koos, Jean-Baptiste Mouret, Stéphane Doncieux

To cite this version:

Sylvain Koos, Jean-Baptiste Mouret, Stéphane Doncieux. The Transferability Approach: Crossing the Reality Gap in Evolutionary Robotics. IEEE Transactions on Evolutionary Computation, Institute of Electrical and Electronics Engineers, 2012, pp.1-25. ⟨10.1109/TEVC.2012.2185849⟩. ⟨hal-00687617⟩


Abstract: The reality gap, which often makes controllers evolved in simulation inefficient once transferred onto the physical robot, remains a critical issue in Evolutionary Robotics (ER). We hypothesize that this gap highlights a conflict between the efficiency of the solutions in simulation and their transferability from simulation to reality: the most efficient solutions in simulation often exploit badly modeled phenomena to achieve high fitness values with unrealistic behaviors. This hypothesis leads to the Transferability approach, a multi-objective formulation of ER in which two main objectives are optimized via a Pareto-based Multi-Objective Evolutionary Algorithm: (1) the fitness and (2) the transferability, estimated by a simulation-to-reality (STR) disparity measure. To evaluate this second objective, a surrogate model of the exact STR disparity is built during the optimization. This Transferability approach has been compared to two reality-based optimization methods, a noise-based approach inspired from Jakobi’s minimal simulation methodology and a local search approach. It has been validated on two robotic applications: 1) a navigation task with an e-puck robot; 2) a walking task with an 8-DOF quadrupedal robot. For both experimental set-ups, our approach successfully finds efficient and well-transferable controllers with only about ten experiments on the physical robot.

I. INTRODUCTION

EVOLUTIONARY ROBOTICS (ER) [39], [46] deals with the use of Evolutionary Algorithms (EA) in robotics. Such algorithms are indeed attractive black-box optimization methods that put only few constraints on the optimal behavior by relying on a fitness function to compare the potential solutions. This fitness function links each evaluated solution to a value that reflects its efficiency on the task to achieve and, as ER concerns robots, it should theoretically be computed on the studied robot [16]. In practice, each evaluation on a physical device can be very time-consuming. Besides, as the behavior that corresponds to a given solution is not known before its evaluation, harmful behaviors can be transferred onto the robot. Consequently, the few works in which controllers have been directly evolved on the robot often optimized few individuals during few generations, which reduces the efficiency of the evolutionary methods. For instance in [19], controllers for a small helicopter have been evolved with a population of 20 individuals during 30 generations, with a few minutes between generations to avoid over-heating, that is only 600 evaluations during the optimization process. In [52], optimization has directly been applied to a prototype ornithopter machine to maximize its lift with 3000 evaluations on the physical system during the optimization, which seems more consistent with evolutionary techniques, but other optimization tasks would require several tens of thousands of evaluations [15].

Sylvain Koos, Jean-Baptiste Mouret and Stéphane Doncieux are with the ISIR, CNRS UMR 7222, Université Pierre et Marie Curie, F-75005, Paris, France. Contact: koos@isir.upmc.fr

For these several reasons, simulation models are an appealing way to evaluate the fitness in a fully secure set-up, while significantly speeding up the optimization process [22]. Accurate simulators can, however, be even slower than experiments in reality, which leads to prohibitively long optimization processes. To obtain simulation models with lower computational costs, it is sometimes necessary to neglect some complex physical phenomena, which leads to simpler simulators, of course less accurate, but also faster. The dynamics of the robot may also not be fully known: for instance, bird-size UAVs or small helicopters bring into play little-known dynamics, which leads to approximate simulation models. Consequently, the dynamic model of the robot used to build a simulation model can itself be inaccurate. When the fitness is computed in simulation, the evolutionary process is likely to exploit such inaccuracies between the simulation model and the reality in an opportunistic manner to achieve high fitness values with unrealistic behaviors. In practice, even if many works in ER succeed in building non-trivial and efficient controllers that correspond to original and complex behaviors [51], [55], these attractive results are often locked in the simulated world because of bad transfers from simulation to reality. This transfer problem is called the reality gap [29] and is arguably the most critical issue that currently prevents the use of ER for practical robotic applications. For instance, numerous reality gap problems have been reported when applying an EA to a 12-DOF bipedal walking robot [48]. It should be noted that the reality gap problem is not specific to ER, as any optimization method based on a simulation model encounters reality gap issues (for instance in [3] when designing control structures for a quadrotor helicopter). A gap can even exist when the controllers are directly evolved on the real system, if the experimental set-up used to evaluate the individuals is too different from the real environment of the robot. This has notably been observed in [19] on a small helicopter.

The goal of this work is to introduce the Transferability approach, a general methodology to help cross the reality gap and to bring Evolutionary Robotics and simulators back together. This approach aims at:

• finding controllers that are both relevant for a given task in simulation and transferable from simulation to reality;

• conducting as few experiments as possible on the physical robot during the optimization process.

Our first insight concerns the simulation models: even if a simulation model is somehow inaccurate, it also contains realistic parts as it is designed to accurately mimic some physical phenomena. Efficient behaviors that mainly rely on these realistic parts of the simulation model should transfer pretty well onto the physical device and then achieve good performances in reality.

A controller is said to be transferable if the corresponding behaviors of the robot observed in simulation and in reality are similar. Our approach takes into account the transfer quality of the evaluated controllers in the form of a transferability measure. For a given controller, this transferability measure compares the corresponding real and simulated behaviors and becomes an objective to optimize while looking for efficient controllers. As solutions that behave best in simulation frequently exploit bugs or badly modeled phenomena, making them not transferable, transferability and efficiency appear to be conflicting objectives. In order to look for relevant trade-off solutions, we then propose to optimize solutions with a Pareto-based Multi-Objective Evolutionary Algorithm (MOEA) in which two objectives are defined: a task-dependent fitness computed in simulation only and a transferability objective.
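To make the notion of trade-off between the two objectives concrete, the following minimal sketch shows Pareto dominance on pairs (fitness in simulation, transferability). It is only an illustration: the tuple encoding, the choice to maximize both objectives (with transferability taken as a negated disparity estimate) and the numerical values are assumptions of this sketch, not the selection scheme of a particular MOEA.

```python
from typing import List, Tuple

# Each candidate is (fitness_in_simulation, transferability).
# Assumption of this sketch: both objectives are maximized, and the
# transferability is encoded as a negated disparity estimate.
Candidate = Tuple[float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b everywhere and strictly better once."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(population: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    return [p for p in population
            if not any(dominates(q, p) for q in population if q is not p)]


# Hypothetical values, chosen only to illustrate the conflict.
population = [
    (0.95, -0.80),  # very efficient in simulation, poorly transferable
    (0.70, -0.10),  # trade-off solution
    (0.40, -0.05),  # highly transferable, less efficient
    (0.35, -0.30),  # dominated by the two previous candidates
]
front = pareto_front(population)
```

With these values, the first three candidates are mutually non-dominated and form the front, while the last one is discarded.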

To estimate the transferability of a given controller from simulation to reality, we introduce a simulation-to-reality disparity (STR disparity) measure that evaluates the disparities between the corresponding simulated and real behaviors of the robot: the higher the STR disparity, the worse the transferability. However, as the number of transfers has to be minimized, the exact STR disparity value for each controller cannot be obtained. Consequently, we build during the optimization process a surrogate model that approximates the STR disparity function with function interpolation techniques.
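As an illustration of how a few exact disparity values can be extended to untested controllers, here is a minimal sketch based on inverse distance weighting. The interpolation technique, the use of a numerical descriptor for each controller and the names `idw_estimate`, `samples` and `power` are assumptions made for this example; this passage only states that function interpolation techniques are used.

```python
import math
from typing import List, Sequence, Tuple


def idw_estimate(query: Sequence[float],
                 samples: List[Tuple[Sequence[float], float]],
                 power: float = 2.0) -> float:
    """Estimate the disparity at `query` from (descriptor, exact_disparity) pairs.

    Each sample corresponds to one controller that was actually transferred
    onto the robot; closer samples receive a larger weight.
    """
    if not samples:
        raise ValueError("at least one transferred controller is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for descriptor, disparity in samples:
        dist = math.dist(query, descriptor)
        if dist == 0.0:
            return disparity  # the query was itself transferred
        weight = 1.0 / dist ** power
        weighted_sum += weight * disparity
        weight_total += weight
    return weighted_sum / weight_total


# Two hypothetical transfer experiments with their exact disparities.
samples = [((0.0, 0.0), 0.0), ((2.0, 0.0), 1.0)]
midpoint_estimate = idw_estimate((1.0, 0.0), samples)
```

At the midpoint both samples have the same weight, so the estimate is the average of the two measured disparities; closer to the first sample, the estimate moves towards its value.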

Preliminary results have been obtained with the Transferability approach with an 8-DOF wheeled-legged quadrupedal robot on a walking task [35]. In the current paper, the approach is additionally applied to a navigation task in a T-maze [26] and both applications allow systematic comparisons between the Transferability approach and state-of-the-art methods.

After presenting some previous work on the reality gap problem, we introduce the Transferability approach in a robotic context. The approach is next validated on two robotic applications (cf. above) and compared to two robot-based optimization methods, a noise-based approach inspired from Jakobi’s minimal simulation methodology and also a local search approach. Three main aspects are next investigated, notably regarding the quality of the approximated STR disparity, before discussing the underlying hypotheses of the approach.

II. PREVIOUS WORK ON THE REALITY GAP PROBLEM

Several works deal with the reality gap problem and one can distinguish three main types of approaches: (1) reality-based optimization approaches where optimization takes place, fully or partly, on the physical robot; (2) simulation-based optimization approaches with an entire optimization process in simulation; (3) robot-in-the-loop simulation-based optimization approaches, which fully optimize solutions in simulation, but also allow a few transfer experiments during the process.

A. Reality-based optimization

As the reality gap results from inadequacies between the reality and the simulation, a first attempt to deal with this problem consists in evolving the solutions directly on the real device. Such experiments have been done in [16] with a Khepera mobile robot, to find robust controllers that can adapt to the variations encountered by the robot during a navigation task in a maze: environment, battery lifetime, ... The optimization took more than 60 hours with about 8000 evaluations on the physical robot, while the task seems relatively simple. As a case in point, similar approaches have been implemented on Sony AIBO robots [24], [33] and on a nine-legged robot [59].

Pollack et al. [50] proposed an alternative that partly tackles the high computational cost of optimizing in reality. As part of the GOLEM project, one of whose goals consists in co-evolving morphologies and controllers, the solutions were mostly evolved in simulation and only the last generations of the optimization process were conducted on real robots. First, the robot’s morphology and its controller were co-evolved with a realistic simulation. Next, an embodied evolution took place on a population of physical robots with the best morphology to overcome the reality gap. Nolfi et al. reported a similar work regarding a navigation task addressed by a mobile Khepera robot with 30000 evaluations in simulation followed by 3000 evaluations on the physical robot [47].

Such approaches assume that the optimal solutions found with the simulation model are relatively close to the true optimal ones on the real robot, i.e. that the high values of the fitness function in simulation are not too misleading. Consequently, a local search around the controllers seen as optimal in simulation should be sufficient to retrieve near-optimal controllers in reality. This assumption is clearly debatable when the optimal solutions in simulation achieve high fitness values because of inaccuracies or badly modeled dynamical phenomena.

B. Simulation-based optimization

The prohibitive computational cost of direct optimization on physical robots has led some researchers to envisage full optimization processes in simulation [53]. A first attempt to deal with the reality gap is to build more accurate simulation models. If all physical phenomena are well-modeled, there should not be any significant gap between simulation and reality. However, simulation models often are trade-offs between accuracy and computational cost: although the reality gap problem highlights the need for accurate simulators, accurate models can lead to very high computational costs, which are incompatible with optimization techniques. This has notably been underlined in [11] with visually guided robots. Besides, some kinds of robots rely on little-known dynamics like bird-sized unmanned aerial vehicles [38] or small helicopters. For such devices, perfect simulations are still out of reach.

(Footnote to the Sony AIBO experiments of Section II-A: about 1000 experiments in 3 hours in [33].)

To cope with not fully accurate simulation models, some techniques have been developed in order to evolve controllers that exhibit robust enough behaviors in simulation or that are based on robust enough mechanisms to transfer well onto the real robot. Among such approaches, the most formalized one is Jakobi’s minimal simulation [26]. It consists in only modeling meaningful parts of the physical system in relation to the target behavior by dropping the complex physical phenomena that are only involved in bad or unstable behaviors. The unwanted phenomena are hidden in an envelope of noise or not modeled at all so that the evolved solutions cannot exploit them and have to be robust enough to achieve high fitness values. This approach has been successfully applied to design walking gaits for an octopod robot [28]. The robustness can also be achieved by optimizing the potential solutions with several minimal simulation models whose parameters are slightly varied from a generation to another. It has been applied to a navigation task with a Khepera mobile robot in a T-maze [27]. Similar robustness issues have been investigated in [40], also with Khepera mobile robots: by only choosing the most realistic amount of noise to add on the simulated sensor values, the transfer from simulation to reality did not lead to any performance loss. The robustness of the optimized behaviors can also be obtained by only evaluating the solutions with several simulation environments and initial conditions as in [56].
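The idea of hiding poorly modeled phenomena in an envelope of noise, and of varying the simulation parameters from one generation to another, can be sketched as follows. This is a loose illustration under stated assumptions, not Jakobi’s actual procedure: the Gaussian noise model, the parameter names `noise_std` and `wheel_gain` and their ranges are all hypothetical.

```python
import random
from typing import Dict, List, Tuple


def noisy_readings(true_readings: List[float], noise_std: float,
                   rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise to each simulated sensor reading."""
    return [r + rng.gauss(0.0, noise_std) for r in true_readings]


def sample_generation_parameters(
        rng: random.Random,
        noise_range: Tuple[float, float] = (0.02, 0.10),
        wheel_gain_range: Tuple[float, float] = (0.9, 1.1)) -> Dict[str, float]:
    """Draw one set of simulator parameters, to be redrawn at each generation.

    The parameter names and ranges are hypothetical placeholders.
    """
    return {
        "noise_std": rng.uniform(*noise_range),
        "wheel_gain": rng.uniform(*wheel_gain_range),
    }


rng = random.Random(0)
for generation in range(3):
    params = sample_generation_parameters(rng)
    # Every controller of this generation would be evaluated with `params`,
    # so that no controller can rely on one exact value of the parameters.
    readings = noisy_readings([0.2, 0.8, 0.5], params["noise_std"], rng)
```

As noted in the text, the difficulty in practice lies in choosing which parameters to vary and by how much, which this sketch leaves entirely to the ranges passed as arguments.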

Other approaches deal with the reality gap as a variation of the environment that can be overcome online by some adaptive mechanisms. In [17], plastic neural network controllers have been used to integrate several sub-behaviors and also to overcome a gap when a solution is transferred onto the real device by online adaptation to the “new” environment. There was no clear separation between simulation and reality. The robustness is then directly linked to the mechanism of plasticity. The robot can also explicitly build an approximate model of its environment to use it as a reference and then adapt to environment variations. For instance in [21], an anticipation module was used to build a model of the motor consequences in the simulated environment. If some differences are encountered once in reality between this model and the current environment, a correction module performs online adaptation to improve the behavior and overcome the gap.

Whether the robustness is obtained by the optimization process in simulation or by some adaptive mechanisms, all these approaches rely on the following hypothesis: the level of robustness is sufficient to overcome the reality gap. In Jakobi’s methodology, it can be quite tricky to find which parameters have to be changed and by how much to vary them. Besides, one can wonder if adaptive mechanisms are always able to retrieve the global optimum in reality from the global optimum in simulation. If the real behavior corresponding to the optimal controller in simulation differs a lot from its simulated counterpart, such mechanisms are rather likely to retrieve local optima in reality with significantly worse performances.

C. Robot-in-the-loop simulation-based optimization

The robot-in-the-loop simulation-based optimization approaches also rely mostly on simulators but some transfer experiments are allowed during the optimization. In [6], Bongard et al. introduced a co-evolutionary process, the Exploration-Estimation Algorithm, that evolves two populations: simulators and controllers. The simulators have to model the previously observed real data and the controller that discriminates the most between these simulators is transferred onto the real device to generate new meaningful learning data for the simulation part. This process is iterated until a good simulator is found and relevant controllers for a given task can next be built on it. These simulators speed up the evaluation of the controllers, while being upgraded by conducting some meaningful transfer experiments on the real device. Moreover, resorting to an update heuristic based on a disagreement measure reduces the number of experiments required to explore efficiently the solution space. This approach has been successfully implemented with a four-legged robot [8]. A similar method based on multi-objective evaluation of the solutions has been applied to a stabilization task with a simulated quadrotor helicopter [34].
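The selection of the controller that discriminates the most between candidate simulators can be illustrated with a small sketch. It assumes, purely for illustration, that each simulator predicts a single scalar outcome per controller and that the disagreement is the variance of these predictions; the measure actually used in [6] is not detailed in this section.

```python
from statistics import pvariance
from typing import Callable, List, Sequence

Controller = Sequence[float]
# Assumption: a simulator maps a controller to one predicted scalar outcome.
Simulator = Callable[[Controller], float]


def disagreement(controller: Controller, simulators: List[Simulator]) -> float:
    """Variance of the outcomes predicted by the candidate simulators."""
    return pvariance([sim(controller) for sim in simulators])


def most_discriminating(controllers: List[Controller],
                        simulators: List[Simulator]) -> Controller:
    """Controller on which the simulators disagree the most."""
    return max(controllers, key=lambda c: disagreement(c, simulators))


# Three hypothetical simulators that differ in how they model one effect.
simulators = [
    lambda c: c[0],
    lambda c: 2.0 * c[0],
    lambda c: c[0] + 1.0,
]
controllers = [(0.0,), (1.0,), (3.0,)]
to_transfer = most_discriminating(controllers, simulators)
```

Here the simulators agree almost perfectly on the first controller and disagree most on the last one, which is therefore the most informative experiment to run on the real device.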

Also based on co-evolution between simulators and controllers, the Back-to-Reality algorithm [58] does not resort to a disagreement measure, but tries to reduce the fitness variation observed between simulation and reality. Once the controllers have sufficiently converged to the best simulator, they are transferred onto the real robot and the fitness variations of the individuals that behave best in reality are used to evolve better simulators, and so on. As for the Exploration-Estimation Algorithm, the co-evolution process ends when a good simulator and a good controller are found. The approach has successfully been applied to a ball-kicking task with a Sony AIBO robot.

However, such co-evolutionary methods rely on the assumption that the simulation model can become accurate enough to allow perfect transfers with only few experiments. It is plausible when modeling simple dynamics or simply adjusting a few parameters, but debatable for optimizations on a wider search space.

The optimization process can itself directly rely on a so-called surrogate model by evaluating the individuals with a simple model of the fitness function instead of building an entire simulation model. The surrogate model has to be upgraded during the optimization process by conducting some test experiments depending on a given update heuristic. As a case in point, such an approach has successfully been applied to fast humanoid locomotion [23]. Without relying on EA, similar approaches have been applied to reality gap problems in the field of reinforcement learning. Abbeel et al. notably applied such techniques to aerobatic helicopter flight [2]. From several trajectories previously made by a pilot, they identified an approximate local model of the helicopter dynamics before learning the optimal flight policy which was next transferred onto the physical device. If the policy did not work in reality, the corresponding data obtained in reality were used to upgrade the local dynamic model and the policy optimization took place again. The process was iterated until the optimal policy worked in reality. A similar method was applied in [1] to the autorotation of a remote control helicopter in case of an engine failure. The results obtained for these applications are quite impressive. However, this method can only be applied when a human pilot/operator is able to mimic the task to solve in order to identify the dynamic model.

D. Concluding thoughts

This state of the art on the reality gap problem leads us to five main thoughts:

1. Optimizing on the physical robot is appealing, but it leads to slow optimization processes and some risky behaviors can be transferred;

2. Completing the optimization process by some evaluations on the real robot is only meaningful if the optimal solutions in simulation are close to the optimal ones in reality, that is if the reality gap is small enough;

3. There is no guarantee that a robust controller only optimized in simulation will be robust enough to transfer well in reality and such a robustness is hardly assessable;

4. Adaptive mechanisms inside the controller structure may not efficiently re-adapt to optimal behaviors in reality if the gap is too strong;

5. In our opinion, the robot-in-the-loop simulation-based optimization approaches are currently the most promising ones, but building a perfect simulator or a meaningful surrogate model is arguably difficult, especially if it is built from scratch or improved during the optimization process as is often the case.

One of the most pivotal points is the use of simulation models: is it necessary to build a model from scratch or to improve it, as is often the case in the robot-in-the-loop simulation-based optimization approaches? For all practical purposes, simulation models are often available when working on robotic applications and while a simulation model can lead to reality gap problems, it is also designed to properly describe the dynamics of a given system: it probably contains both accurate parts and inaccurate ones. Our main idea is to base the optimization on a simulation model that remains fixed during the whole process. The approach then looks for the most efficient controllers whose behaviors are sufficiently based on the realistic parts of the simulation model to transfer well onto the real robot. Consequently, we do not build a simulation model from scratch nor modify it, but we rather exploit an already available simulator where it mimics the reality best.

III. THE TRANSFERABILITY APPROACH

A. Principles

The Transferability approach fits into the robot-in-the-loop simulation-based optimization approaches. The optimization process relies on a simulator designed once and not improved afterwards. Our first hypothesis is that, despite reality gap problems, this simulation model is locally reliable with some parts accurate enough to ensure good transfers to reality as illustrated on Fig. 1. However, the gradient provided by the fitness in simulation does not guide the search in the same direction as the real gradient: the best solutions found in simulation are not transferable to reality and behave significantly worse on the robot. The Transferability approach aims at finding efficient solutions that mostly exploit these well-modeled parts of the simulator. To evaluate the quality of a given controller’s transfer from simulation to reality, we rely on a transferability measure that compares a simulated behavior with its counterpart in reality and quantitatively reflects their closeness. We secondly hypothesize that the reality gap mainly stems from a conflict between two aspects: the efficiency of solutions in simulation and the transferability of those solutions from simulation to reality. It leads to a multi-objective formulation of ER in which two main objectives are optimized via a Pareto-based MOEA: (1) the task-dependent fitness; (2) the transferability objective.

The transferability measure cannot be obtained for each solution as it would require many transfer experiments on the robot. We claim that if the value of this function is known for a few selected solutions, that is if a few solutions are transferred during the optimization, a transferability function can be approximated for all the other solutions by interpolation. This interpolated transferability objective guides the evolutionary search towards good compromise solutions, both efficient in simulation and transferable from simulation to reality. The whole process is pictured on Fig. 2.

In this work, the transferability of a given controller is assessed by a simulation-to-reality disparity (STR disparity) measure, which estimates the disparities between the corresponding behaviors respectively observed in simulation and on the physical robot: the higher the STR disparity, the worse the transferability. As the STR disparity cannot be computed for each potential solution, we rely on a surrogate model to approximate this second objective. The surrogate model is interpolated thanks to a few transfer experiments conducted during the optimization, according to an update heuristic that periodically selects which controllers are the most meaningful to transfer regarding the current surrogate model.
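The two ingredients named above, an exact disparity computed from one transfer experiment and a heuristic that chooses the next controller to transfer, can be sketched as follows. Both definitions are assumptions made for illustration: the disparity is taken here as a mean squared distance between time-aligned trajectories, and the heuristic picks the candidate whose descriptor is farthest from those already transferred. The precise measure and heuristic of the approach are not specified in this section.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def str_disparity(sim_traj: Sequence[Point], real_traj: Sequence[Point]) -> float:
    """Mean squared distance between time-aligned simulated and real positions."""
    if not sim_traj or len(sim_traj) != len(real_traj):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [math.dist(s, r) ** 2 for s, r in zip(sim_traj, real_traj)]
    return sum(squared) / len(squared)


def pick_next_transfer(candidates: List[Sequence[float]],
                       transferred: List[Sequence[float]]) -> int:
    """Index of the candidate farthest from every already transferred one.

    Hypothetical diversity-driven heuristic: it favors regions where the
    surrogate model has no nearby measurement yet.
    """
    if not transferred:
        return 0  # nothing measured yet: any candidate is equally informative
    return max(range(len(candidates)),
               key=lambda i: min(math.dist(candidates[i], t) for t in transferred))


# One hypothetical transfer: the real robot drifts sideways at the second step.
disparity = str_disparity([(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (1.0, 1.0)])

# Descriptors of three candidate controllers and of one transferred controller.
next_index = pick_next_transfer([(0.1, 0.0), (0.9, 0.9), (0.5, 0.5)], [(0.0, 0.0)])
```

Each measured disparity would then be added to the data from which the surrogate model is interpolated, so that the next selection takes the new measurement into account.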

As the Transferability approach aims at finding solutions both efficient in simulation and transferable from simulation to reality, it does not always find the optimal solutions in reality, but rather good compromises between efficiency in simulation and transferability. If the optimal solutions in reality indeed rely on unrealistic parts of the simulation, as illustrated on Fig. 3, the approach will consequently avoid them, because they are not transferable. The Transferability approach is therefore based on the hypothesis that the realistic parts of the simulation include behaviors which are sufficiently relevant to efficiently address the task. We hypothesize that the problematic situation of Fig. 3 is unlikely to occur for realistic robotic applications, because the mechanical models used as simulations are designed to model the most important phenomena regarding the task to solve.

Another case can arguably be problematic. As pictured on Fig. 4, the optimal solution in simulation may also be optimal in reality, while corresponding to a non-transferable behavior. As the Transferability approach looks for transferable zones in the simulation, this optimal solution is avoided. However, this case can easily be detected, as the fitness value in reality obtained with the best solution found with the Transferability approach should be lower than the fitness value in reality of the most efficient solution in simulation. Based on


Frequently Asked Questions (11)

Q1. What are the contributions mentioned in the paper "The transferability approach: crossing the reality gap in evolutionary robotics" ?

For both experimental set-ups, their approach successfully finds efficient and well-transferable controllers only with about ten experiments on the physical robot.

Q2. How many preliminary experiments are needed to initialize the Kriging model?

As the surrogate model builds the relation between the two control parameters and the covered distance in reality, five preliminary experiments are needed to initialize the Kriging model.

Q3. What is the way to optimize controllers with expensive fitness functions?

A classic way to optimize controllers with expensive fitness functions comes down to directly build a surrogate model of the fitness in reality [31], instead of relying on a simulation model: the surrogate model tries to approximate the relation between the control parameters and the real fitness.

Q4. How many experiments can be performed to initialize the Kriging model?

For the T-maze problem, the surrogate model can hardly rely on Kriging interpolation with their budget of evaluations on the robot: the controller depends on 35 parameters, which implies at least 71 experiments to initialize the Kriging model.

Q5. How many transfer experiments are needed for realistic landscapes?

For realistic landscapes, several transfer experiments will be needed for exploring and building a sufficiently accurate surrogate model to avoid local maxima without any prior knowledge.

Q6. What do the robot-in-the-loop simulation-based optimization approaches rely on?

The robot-in-the-loop simulation-based optimization approaches also rely mostly on simulators but some transfer experiments are allowed during the optimization.

Q7. Why was one of the approaches repeated only 3 times?

Because this approach relies on much more experiments in reality than the other approaches, it has only been repeated 3 times to have the same amount of experiments in reality in total (about 60 experiments for each approach).

Q8. What is the reason why the Control approach leads to better results?

The Control approach leads to better results, possibly because it does not always find the true optimal solutions in simulation, which is quite in agreement with the antagonism the authors hypothesize between efficiency and transferability.

Q9. What is the main argument for the argument that external measures are not compatible with bigger robots?

One could argue that such external measures require heavy/costly experimental set-up which are hardly compatible with bigger robots and that on-board sensorimotor informations should be preferred to compare simulated and real behaviors.

Q10. How many evaluations did the optimization take?

The optimization took more than 60 hours with about 8000 evaluations on the physical robot, while the task seems relatively simple.

Q11. How are the simulators upgraded during the optimization?

These simulators allow to speed up the evaluation of the controllers, while being upgraded by conducting some meaningful transfer experiments on the real device.
