The Transferability Approach: Crossing the Reality Gap in Evolutionary Robotics
Summary
Introduction
- EVOLUTIONARY ROBOTICS (ER) [39], [46] deals with the use of Evolutionary Algorithms (EA) in robotics.
- This fitness function links each evaluated solution to a value that reflects its efficiency on the task to achieve and, as ER concerns robots, it should theoretically be computed on the studied robot [16].
- This transfer problem is called reality gap [29] and is arguably the most critical issue that currently prevents the use of ER for practical robotic applications.
- The authors’ first insight concerns the simulation models: even if a simulation model is somehow inaccurate, it also contains realistic parts as it is designed to accurately mimic some physical phenomena.
- For a given controller, this transferability measure compares the corresponding real and simulated behaviors and becomes an objective to optimize during the optimization while looking for efficient controllers.
A. Reality-based optimization
- As the reality gap results from inadequacies between the reality and the simulation, a first attempt to deal with this problem consists in evolving the solutions directly on the real device.
- Pollack et al. [50] proposed an alternative that partly addresses the high computational cost of optimizing in reality.
- First, the robot’s morphology and its controller were co-evolved with a realistic simulation.
- Nolfi et al. reported a similar work regarding a navigation task addressed by a mobile Khepera robot with 30000 evaluations in simulation followed by 3000 evaluations on the physical robot [47].
- Such approaches assume that the optimal solutions found with the simulation model are relatively close to the true optimal ones on the real robot, i.e. that the high values of the fitness function in simulation are not too misleading.
B. Simulation-based optimization
- The prohibitive computational cost of direct optimization on physical robots has led some researchers to envisage full optimization processes in simulation [53].
- Simulation models are often trade-offs between accuracy and computational cost: although the reality gap problem highlights the need for accurate simulators, accurate models can lead to very high computational costs, which are incompatible with optimization techniques.
- The unwanted phenomena are hidden in an envelope of noise or not modeled at all so that the evolved solutions cannot exploit them and have to be robust enough to achieve high fitness values.
- The robot can also explicitly build an approximate model of its environment to use it as a reference and then adapt to environment variations.
- Whether the robustness is obtained by the optimization process in simulation or by some adaptive mechanisms, all these approaches rely on the following hypothesis: the level of robustness is sufficient to overcome the reality gap.
C. Robot-in-the-loop simulation-based optimization
- The robot-in-the-loop simulation-based optimization approaches also rely mostly on simulators but some transfer experiments are allowed during the optimization.
- This approach has been successfully implemented with a four-legged robot [8].
- Also based on co-evolution between simulators and controllers, the Back-to-Reality algorithm [58] does not resort to a disagreement measure, but tries to reduce the fitness variation observed between simulation and reality.
- The optimization process can itself directly rely on a so-called surrogate model by evaluating the individuals with a simple model of the fitness function instead of building an entire simulation model.
- Abbeel et al. notably applied such techniques to aerobatic helicopter flight [2].
D. Concluding thoughts
- This review of the state of the art on the reality gap problem leads the authors to five main thoughts.
- For all practical purposes, simulation models are often available when working on robotic applications and while a simulation model can lead to reality gap problems, it is also designed to properly describe the dynamics of a given system: it probably contains both accurate parts and inaccurate ones.
- The authors’ main idea is to base the optimization on a simulation model that remains fixed during the whole process.
- The approach then looks for the most efficient controllers whose behaviors are sufficiently based on the realistic parts of the simulation model to transfer well onto the real robot.
- The authors neither build a simulation model from scratch nor modify one; rather, they exploit an already available simulator in the regions where it best mimics reality.
A. Principles
- The Transferability approach fits into the robot-in-the-loop simulation-based optimization approaches.
- As the Transferability approach aims at finding solutions both efficient in simulation and transferable from simulation to reality, it does not always find the optimal solutions in reality, but rather good compromises between efficiency in simulation and transferability.
- If the optimal solutions in reality indeed rely on unrealistic parts of the simulation, as illustrated in Fig. 3, the approach will consequently avoid them, because they are not transferable.
- Based on this observation, this case was never encountered in the two experiments studied in this paper.
B. From an exact STR disparity to a surrogate model
- D∗ links, for any possible controller c, the corresponding behavior in simulation b(c) in the behavior space B, to its exact STR disparity value D∗(b(c)).
- The authors rely on a so-called surrogate model to approximate the STR disparity function during the optimization process.
- Surrogate models [23], [31], [57] are usually resorted to in real engineering problems when evaluating an individual on the target system means very high computation costs or too long experiments.
- Such interpolation methods rely on a distance function to compare solutions: the value predicted for a given solution mostly depends on the exact values of solutions that are close to it.
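The distance-based interpolation described above can be sketched as follows, assuming Inverse Distance Weighting (IDW), which the paper names as one such method. The function name `idw_predict`, the `power` exponent and the sample values are illustrative assumptions, not taken from the paper.

```python
import math


def idw_predict(query, samples, power=2.0):
    """Estimate the STR disparity at `query` by Inverse Distance Weighting.

    samples: list of (behavior_vector, exact_disparity) pairs coming from
    controllers that were already transferred (hypothetical data layout).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for point, value in samples:
        d = math.dist(query, point)
        if d == 0.0:
            # The query coincides with a known point: return its exact value.
            return value
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


# Made-up behavior descriptors and disparity values, for illustration only.
samples = [((0.0, 0.0), 0.1), ((1.0, 0.0), 0.5), ((0.0, 1.0), 0.9)]
estimate = idw_predict((0.5, 0.5), samples)
```

Because the prediction is a weighted average with positive weights, it can never fall outside the range of the known values, so nearby transferred behaviors dominate the estimate.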
C. Optimization scheme
- Evaluation objectives: Each controller is evaluated by three objectives: 1. the task-dependent fitness, to find good controllers; 2. the corresponding approximated STR disparity computed with the surrogate model, to find transferable controllers; 3. the behavioral diversity objective.
- This last objective helps maintain behavioral diversity among the population, which efficiently enhances exploration of the controller state space [15], [42].
- In such a context, the update heuristic defined earlier boils down to randomly selecting one individual among those whose diversity value is higher than the diversity threshold τdiv .
- It ensures that any new experiment selected by the update heuristic is meaningful.
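As a minimal sketch of how three such objectives can be compared in a Pareto sense, the snippet below assumes fitness and diversity are maximized while the approximated STR disparity is minimized. The helper names and the numeric values are hypothetical; this is not the authors' implementation.

```python
def objectives(fitness, approx_disparity, diversity):
    """Pack the three objectives so that every component is maximized."""
    # The disparity is minimized, so it is negated here.
    return (fitness, -approx_disparity, diversity)


def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(population):
    """Return the individuals that no other individual dominates."""
    return [p for p in population if not any(dominates(q, p) for q in population)]


# Made-up (fitness, approximated disparity, diversity) triples.
population = [
    objectives(1.0, 0.2, 0.3),
    objectives(0.8, 0.1, 0.3),
    objectives(0.7, 0.3, 0.2),
]
front = non_dominated(population)
```

In this toy population the third individual is worse on all three objectives than the first one, so only the first two remain as trade-offs between efficiency and transferability.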
D. Algorithm outline
- To initialize the surrogate model of the STR disparity, the authors assume that a controller c0 has already been transferred onto the real system at the beginning of each optimization process.
- The corresponding exact STR disparity value D∗(c0) and the behavioral features in simulation are computed.
- To transfer behaviors that differ enough from those of the already transferred controllers, and thus limit the number of experiments, the update heuristic relies on a diversity threshold τdiv: one controller, randomly selected among those in the current population whose behavioral diversity value is greater than τdiv, is transferred.
- The diversity threshold is designed by hand to achieve a given number of transfer experiments on average during the whole optimization process.
- It is next used to update the surrogate model of the STR disparity function.
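The update heuristic outlined above can be sketched as a small selection function. The name `select_for_transfer`, the `diversity_of` callback and the example values are assumptions made for illustration.

```python
import random


def select_for_transfer(population, diversity_of, tau_div, rng=random):
    """Pick one controller at random among those with diversity above tau_div.

    Returns None when no controller is different enough, in which case
    no transfer experiment is performed.
    """
    candidates = [c for c in population if diversity_of(c) > tau_div]
    if not candidates:
        return None
    return rng.choice(candidates)


# Hypothetical diversity values for three controllers.
diversity = {"c1": 0.05, "c2": 0.30, "c3": 0.02}
chosen = select_for_transfer(["c1", "c2", "c3"], diversity.get, tau_div=0.1)
```

After a transfer, the measured exact STR disparity of the chosen controller would be added to the data points used by the surrogate model.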
E. Best solution of a run
- The authors first assume that a threshold D∗threshold on the STR disparity values can be empirically chosen in such a way that STR disparity values greater than D∗threshold mean bad transfers.
- For the class A1, the goal of the optimization process boils down to finding an optimal individual in reality, that is, a controller which solves the task.
- The same criteria are used for applications of the class A2.
- There are two possible cases: if the transferable non-dominated set is empty, the best solution of the run is the solution with the lowest STR disparity in the non-dominated set, although it should not transfer well; otherwise, the authors have to choose a best compromise solution in this transferable set.
- The authors’ approach has been validated with an e-puck robot on one of Jakobi’s early experiments on the reality gap problem (class A1, [26]).
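The two cases for choosing the best solution of a run can be sketched as follows. The summary does not state how the compromise is picked inside the transferable set, so this sketch assumes the highest fitness is preferred; that rule, the function name and the tuple layout are assumptions.

```python
def best_of_run(non_dominated_set, disparity_threshold):
    """Choose the best solution from (fitness, str_disparity) pairs."""
    transferable = [s for s in non_dominated_set if s[1] <= disparity_threshold]
    if not transferable:
        # Empty transferable set: fall back to the lowest STR disparity,
        # even though this solution is not expected to transfer well.
        return min(non_dominated_set, key=lambda s: s[1])
    # Assumption: the best compromise is the fittest transferable solution.
    return max(transferable, key=lambda s: s[0])


# Made-up non-dominated set.
front = [(1.20, 0.50), (1.00, 0.08), (0.60, 0.02)]
picked = best_of_run(front, disparity_threshold=0.1)
fallback = best_of_run(front, disparity_threshold=0.01)
```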
A. Experimental set-up
- The authors’ first application aims to reproduce one of Jakobi’s experiments on the reality gap problem [26], [27], notably to compare their approach with Jakobi’s, which remains the most formalized methodology dedicated to this problem.
- Nevertheless, the light sensors of the e-puck robot appear not to be reliable enough and their experimental set-up is slightly different from the original one.
- Detects this wall in simulation, its sensors have to be noised so that the optimal behaviors found in simulation cannot exploit it and fail when transferred onto the real device (see Fig. 9).
- The genotype encodes 7 parameters for each of the 5 “left” neurons: 1) its threshold value (16 values regularly spaced from -1 to 1); 2) the destination neuron of its 3 possible outgoing connections (integer values from 1 to 16); 3) the 3 weights corresponding to these 3 connections (16 values regularly spaced from -2 to 2).
- The fitness values are averaged on both test cases to compute the global fitness value of an individual.
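To make the genotype layout concrete, here is a hedged decoding sketch. It assumes each 16-valued parameter is stored as 4 plain binary bits, giving 5 neurons × 7 parameters × 4 bits = 140 bits, and it assumes the field order threshold, destinations, weights. The summary confirms neither the bit encoding nor the ordering.

```python
BITS_PER_PARAM = 4      # assumption: 16 values need 4 bits
PARAMS_PER_NEURON = 7   # 1 threshold + 3 destinations + 3 weights
NEURONS = 5
NEURON_BITS = BITS_PER_PARAM * PARAMS_PER_NEURON
GENOME_BITS = NEURON_BITS * NEURONS


def bits_to_int(bits):
    """Read a list of 0/1 values as an unsigned integer, most significant first."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def level(k, low, high):
    """Map an index 0..15 to 16 regularly spaced values in [low, high]."""
    return low + (high - low) * k / 15


def decode_neuron(bits):
    fields = [bits_to_int(bits[i:i + BITS_PER_PARAM])
              for i in range(0, NEURON_BITS, BITS_PER_PARAM)]
    return {
        "threshold": level(fields[0], -1.0, 1.0),
        "destinations": [k + 1 for k in fields[1:4]],   # neurons 1..16
        "weights": [level(k, -2.0, 2.0) for k in fields[4:7]],
    }


def decode_genotype(bits):
    if len(bits) != GENOME_BITS:
        raise ValueError("expected %d bits" % GENOME_BITS)
    return [decode_neuron(bits[i:i + NEURON_BITS])
            for i in range(0, GENOME_BITS, NEURON_BITS)]


all_zero = decode_genotype([0] * GENOME_BITS)
all_one = decode_genotype([1] * GENOME_BITS)
```

Under these assumptions the controller has 35 parameters in total, which matches the parameter count quoted for the T-maze problem.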
B. Problems encountered when implementing Jakobi’s approach
- To obtain controllers that transfer well from simulation to reality, Jakobi argues that individuals that are robust enough in simulation should transfer well onto the real device and also be robust in reality.
- Here, the authors only consider the reality gap problem and the concerns on robustness in reality are not especially evaluated.
- The infrared sensor values can indeed deviate dramatically from one experiment to another, as can the duration of the color pattern detection by the camera.
- Preliminary experiments with such optimization schemes did not lead to individuals with high fitness or high robustness in simulation with the budget of evaluations fixed for the set-up.
- The maximal wheel speed in Jakobi’s original setup was 8 cm per second and the simulation was updated 10 times per second.
C. Approaches
- 1) Noise-based approach inspired by Jakobi’s.
- The individuals are optimized in the simulation with the real parameter values.
- Moreover, in order to transfer as few individuals as possible, only individuals that are optimal in simulation are transferred onto the e-puck robot during the optimization process.
- The diversity objective is computed as the minimal Hamming distance based on the binary genotype to the already transferred controllers.
- The best solution of a run is the transferred controller with the highest fitness value.
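The genotype-based diversity objective mentioned above (minimal Hamming distance to the already transferred controllers) can be sketched as follows. The function names and the short bit strings are illustrative.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    if len(a) != len(b):
        raise ValueError("genotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def diversity(genotype, transferred):
    """Minimal Hamming distance from `genotype` to any transferred genotype."""
    return min(hamming(genotype, t) for t in transferred)


# Toy 6-bit genotypes, for illustration only.
transferred = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0]]
score = diversity([1, 1, 0, 0, 0, 0], transferred)
```

A controller close to any previously transferred one gets a low score, so it is unlikely to be picked for a new transfer experiment.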
D. Results
- Table I sums up the location of the evaluation step (simulation, reality or both) along with the number of experiments done on the physical robot per run for each approach.
- All the approaches have been implemented using the state-of-the-art MOEA NSGA-II [14], based on non-dominated sorting and elitist tournament selection.
- The two reality-based optimization approaches, evolution on the physical robot and surrogate modelling of the real fitness, achieve clearly worse results, with respective average fitness values of 469 mm (sd = 45 mm) and 466 mm (sd = 77 mm).
- Concerning the noise-based approach, the original results obtained in [26] are not reproduced: only 3 runs out of 10 lead to optimal solutions in reality, while the method always worked in the original set-up.
- A typical behavior obtained in reality with the Transferability approach is shown in Fig. 12.
V. APPLICATION II: QUADRUPEDAL WALKING ROBOT
- Locomotion problems have often been addressed in Evolutionary Robotics.
- In particular, quadrupedal walking offers the advantage of various kinds of gaits: from static and easy to model walks to more dynamic and complex ones.
- As these gaits do not need the same level of accuracy to be correctly modeled in simulation, they are expected to achieve different transferability performances on the real device.
- The fitness is the distance covered by the robot during a fixed time.
- Contrary to the previous application, the optimality of a given solution cannot be directly derived from the corresponding behavior, whether in simulation or on the real robot (class A2 in Fig. 6), as the maximal robot speed is unknown.
A. Robot and experimental set-up
- The physical robot is made from a Bioloid Kit and has been built after the wheeled-legged robot Hylos [20] designed for autonomous planetary/volcanic exploration.
- Each leg then includes an upper leg motor and a lower leg motor, all controlled in position.
- The authors also use a simulator relying on the Bullet Physics Library, an open source physics engine [5].
- For their application, the following points have been carefully modeled: dimensions of the robot, masses of the different parts, mass asymmetry of the main body, contact areas of the wheels, servos’ built-in controller (according to the Dynamixel documentation).
- The fitness landscape in simulation is complex, as shown in Figure 14.
B. Approaches
- The exact STR disparity measure is based on the real and simulated distances from the origin $\sqrt{x^2 + y^2}$ of the robot’s geometric center, which are computed respectively from the recorded real and simulated trajectories for each sampled data point.
- To build the surrogate model, a given number of selected controllers have to be transferred onto the physical robot to record the corresponding fitness values in reality.
- The diversity objective is computed as the minimal Euclidean distance based on the genotype to the already transferred controllers.
- The best solution of a run is the transferred controller with the highest fitness value.
- As the surrogate model builds the relation between the two control parameters and the covered distance in reality, five preliminary experiments are needed to initialize the Kriging model.
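The trajectory-based STR disparity described in this section compares, at each sampled point, the distance from the origin in simulation and in reality. The summary does not give the aggregation formula, so the sketch below assumes a mean squared difference; that choice, the function names and the sample trajectories are assumptions.

```python
import math


def radial_distances(trajectory):
    """Distance from the origin, sqrt(x^2 + y^2), for each (x, y) sample."""
    return [math.hypot(x, y) for x, y in trajectory]


def str_disparity(sim_trajectory, real_trajectory):
    """Mean squared difference between simulated and real radial distances.

    Assumption: both trajectories are sampled at the same instants.
    """
    sim_d = radial_distances(sim_trajectory)
    real_d = radial_distances(real_trajectory)
    if len(sim_d) != len(real_d) or not sim_d:
        raise ValueError("trajectories must be non-empty and equally sampled")
    return sum((s - r) ** 2 for s, r in zip(sim_d, real_d)) / len(sim_d)


# Made-up trajectories of the robot's geometric center.
simulated = [(0.0, 0.0), (3.0, 4.0)]
recorded = [(0.0, 0.0), (0.0, 0.0)]
value = str_disparity(simulated, recorded)
```

A value of zero means the simulated and real robots stayed at the same distance from the origin at every sample; larger values indicate a worse transfer.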
C. Results
- Table III sums up the location of the evaluation step (simulation, reality or both) along with the number of experiments done on the physical robot per run for each approach.
- The Control approach leads to better results, possibly because it does not always find the true optimal solutions in simulation, which is quite in agreement with the antagonism the authors hypothesize between efficiency and transferability.
- To study the reality gap problem in more detail for this application, the best individuals found with the Control approach +.
- Such results show that the real fitness landscape of this application is likely to be simple.
- The authors select the same diversity threshold τdiv = 0.1 as with the Transferability approach, which leads to archives CS of 11 individuals on average (sd = 1).
- The best trade-off individual among the best solutions obtained with the Transferability approach achieves 1132 mm in the simulation and 1099 mm on the real robot with a 0.004 STR disparity value.
VI. FURTHER INVESTIGATIONS
- In place of transfers from simulation to reality, the authors solve a fictive reality gap problem between a simplified simulator and the accurate simulator used in the previous section.
- They only differ from each other by the modeling of the servos’ built-in controller.
- The simple simulator relies on a proportional relation between the speed and the position error, while the accurate one is based on the Dynamixel documentation.
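The simple servo model described above, where the speed is proportional to the position error, can be sketched as one integration step. The gain, time step and speed limit are hypothetical parameters, and the clamping to a maximum speed is an added assumption rather than something stated in the summary.

```python
def step_servo(position, target, gain, dt, max_speed):
    """Advance a position-controlled servo by one time step.

    The commanded speed is proportional to the position error and is
    clamped to +/- max_speed (the clamp is an assumption of this sketch).
    """
    speed = gain * (target - position)
    speed = max(-max_speed, min(max_speed, speed))
    return position + speed * dt


# Hypothetical values: angles in radians, time in seconds.
angle = 0.0
for _ in range(50):
    angle = step_servo(angle, target=1.0, gain=5.0, dt=0.02, max_speed=3.0)
```

With a positive gain and a small enough time step, the angle converges toward the target, which is the qualitative behavior a position-controlled servo is expected to show.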
A. Concerning the surrogate model
- The graph in Fig. 19 shows, for all the individuals in the last non-dominated sets of each run, the approximated STR disparity values and the corresponding exact STR disparity values.
- It is highly linked to the main drawback of the Inverse Distance Weighting interpolation technique: the predicted value always lies between the minimum and the maximum of the interpolated data points.
- The Pearson’s correlation coefficients between the approximated STR disparity and the exact one are relatively high, with an average of 0.76 (sd = 0.11).
- It indicates that there is a strong positive monotonic relation between the approximation and the exact function.
- Such considerations often are sufficient to conclude that the surrogate model is of good quality [25]: the surrogate model seems to provide the evolutionary search with a good gradient.
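As a small worked example of the quality check discussed above, the snippet below computes Pearson's correlation coefficient between approximated and exact disparity values. Pearson's coefficient measures linear association. The numbers are made up and do not come from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only: surrogate predictions vs. exact measurements.
approximated = [0.10, 0.20, 0.30, 0.40]
exact = [0.05, 0.25, 0.28, 0.50]
r = pearson(approximated, exact)
```

A coefficient close to 1 suggests that ranking controllers by the surrogate broadly agrees with ranking them by the exact measure, which is what the evolutionary search needs.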
B. Concerning the behavioral distances
- In the Transferability approach, two behavioral distances are used: (1) the transferability measure that compares simulated behaviors with real ones; (2) the in silico metric that only compares behaviors in simulation.
- The bFeat+DTraj variant corresponds to the original approach.
- In the application with the quadrupedal robot, the three behavioral features separate behaviors that are efficient or not (covered distance), that make the robot overturn or not (mean height), and that are more or less stable (final orientation).
- Such results validate their original approach with a feature-based behavioral distance as in silico metric and a trajectory-based behavioral distance as transferability measure.
C. Concerning the update heuristic and the diversity objective
- The authors now implement a second set of variants (table V) depending on: (1) the update heuristic used to choose the transfer experiments; (2) the presence or lack of the diversity objective.
- The “random” update heuristic consists in choosing at random the controller to transfer among those whose diversity value is greater than the diversity threshold τdiv as used in the original approach.
- All the results are shown in Fig. 21.
- The variants RandomT & Div and MaxDivT & Div behave well, but the RandomT & Div variant shows the best results, as it looks for better trade-off solutions.
- It means, counter-intuitively, that transferring the most different controller from the already transferred ones is not ideal.
A. Antagonism between efficiency and transferability
- In their approach, controllers are evaluated in simulation by a task-dependent fitness and an STR disparity value.
- This antagonism has to be discussed according to the results obtained in both set-ups.
- Diversity on the application with the quadrupedal walking robot.
- The former approach behaves clearly better in reality than the latter one.
- Using a soft constraint based on the STR disparity value could provide an alternative to multiobjective optimization.
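One possible reading of the soft-constraint alternative is a single penalized fitness. The sketch below assumes a linear penalty applied only above a disparity threshold. The penalty form, its weight and the function name are assumptions, since the summary only mentions the idea.

```python
def penalized_fitness(fitness, disparity, threshold, penalty_weight):
    """Single-objective score with a soft constraint on the STR disparity.

    Controllers at or below the threshold keep their fitness; above it,
    the fitness is reduced in proportion to the excess disparity.
    """
    excess = max(0.0, disparity - threshold)
    return fitness - penalty_weight * excess


# Made-up values for two controllers with the same fitness.
transferable_score = penalized_fitness(10.0, 0.1, threshold=0.2, penalty_weight=5.0)
risky_score = penalized_fitness(10.0, 0.4, threshold=0.2, penalty_weight=5.0)
```

Unlike the multi-objective formulation, this collapses the trade-off into one number, so the result depends on how the penalty weight is tuned.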
B. Towards an on-board transferability measure
- In both applications, the transferability measure relies on external information: the trajectory of the robot recorded with CODA cx1 scanners.
- Nevertheless, it is only meaningful if the sensor values are accurately modeled in simulation, sometimes despite significant amounts of noise.
- It is also argued in [7] that accurate quantitative comparisons between two sensor time series are difficult because small initial disparities can quickly lead to very different signals, which can lead to a preference for external measures for physical robots [8].
- Another promising way is to exploit sensorimotor information to obtain an accurate estimation of the trajectory by sensor integration [4], [9].
- It is sometimes pertinent to combine different types of sensors (short-range distance sensors with a camera for instance) by periodic repositioning of the estimation.
C. Modeling the fitness or the transferability
- The use of surrogate models is increasing in robotics, most of the time to directly approximate the fitness function on the physical robot.
- Such approximations try to map the relation between the parameters of the controller and the fitness in reality by interpolating a global function from few experimental data with Kriging-like methods.
- Another issue concerns the number of parameters introduced by Kriging methods.
- For realistic landscapes, several transfer experiments will be needed for exploring and building a sufficiently accurate surrogate model to avoid local maxima without any prior knowledge.
- In fact, selecting one of these two approaches comes down to the availability of relevant simulation models and, in practice, simulation models are often available for robotic applications.
D. Upgrading the simulation from the STR disparity measure
- At the end of a run performed with the Transferability approach, the obtained surrogate model of the STR disparity function gives a rough landscape of which parts of the simulation are not well-modeled and which parts are realistic.
- It is then possible to use clustering methods to extract which kinds of behaviors are more or less linked to bad transferability values.
- Depending on the complexity of the problem, the next step, which consists of understanding how the simulation model makes these behaviors non-transferable and finally improving the model, must be conducted by interacting with experts in robotics and mechanics.
VIII. CONCLUSION AND FURTHER WORK
- This paper addressed the reality gap problem in the case of controller optimization, a critical issue in Evolutionary Robotics, which often happens when resorting to simulators.
- Controllers are evaluated by two main objectives in a multi-objective manner: a task-dependent fitness and a simulation-to-reality disparity that estimates the controller’s transferability using a surrogate model.
- Better results were achieved by the Transferability approach regarding both exact STR disparity and covered distance in reality with very few transfer experiments during the optimization.
- A second application to an 8-DOF quadrupedal walking robot has also been investigated and their approach again finds controllers that are relevant regarding a walking task and that transfer well to reality.
- Each simulation model is a compromise between accuracy and speed.
Additional excerpts
- As for the previous application, all the approaches have been implemented using the MOEA NSGA-II [14].
- For single-objective optimization schemes (Control approaches), NSGA-II is equivalent to an elitist tournament-based EA.
- Some of the most used interpolation methods are: 1) Radial Basis Function [30]; 2) Inverse Distance Weighting model [54]; 3) Kriging model [32].
- Kriging is a group of popular geostatistical interpolation methods [23], [32], somewhat similar to IDW interpolation.
- …if the surrogate model is accurate compared to the exact STR disparity measure, the authors conducted 10 runs of the Transferability approach as described in the section III with a diversity threshold τdiv = 0.05, that corresponds to 25 transfers by run on average from the simple simulator to the…
- The implementation and the use of the Kriging model rely on the DACE Matlab toolbox [36].
Frequently Asked Questions (11)
Q2. How many preliminary experiments are needed to initialize the Kriging model?
As the surrogate model builds the relation between the two control parameters and the covered distance in reality, five preliminary experiments are needed to initialize the Kriging model.
Q3. What is a classic way to optimize controllers with expensive fitness functions?
A classic way to optimize controllers with expensive fitness functions is to directly build a surrogate model of the fitness in reality [31], instead of relying on a simulation model: the surrogate model tries to approximate the relation between the control parameters and the real fitness.
Q4. How many experiments are needed to initialize the Kriging model for the T-maze problem?
For the T-maze problem, the surrogate model can hardly rely on Kriging interpolation with the authors’ budget of evaluations on the robot: the controller depends on 35 parameters, which implies at least 71 experiments to initialize the Kriging model.
Q5. How many transfer experiments are needed for realistic landscapes?
For realistic landscapes, several transfer experiments will be needed for exploring and building a sufficiently accurate surrogate model to avoid local maxima without any prior knowledge.
Q6. What do the robot-in-the-loop simulation-based optimization approaches rely on?
The robot-in-the-loop simulation-based optimization approaches also rely mostly on simulators but some transfer experiments are allowed during the optimization.
Q7. How many times was this approach repeated to keep the same total number of experiments in reality?
Because this approach relies on many more experiments in reality than the other approaches, it has only been repeated 3 times to have the same number of experiments in reality in total (about 60 experiments for each approach).
Q8. Why does the Control approach lead to better results?
The Control approach leads to better results, possibly because it does not always find the true optimal solutions in simulation, which is quite in agreement with the antagonism the authors hypothesize between efficiency and transferability.
Q9. What is the argument against external measures for bigger robots?
One could argue that such external measures require heavy or costly experimental set-ups that are hardly compatible with bigger robots and that on-board sensorimotor information should be preferred to compare simulated and real behaviors.
Q10. How many evaluations did the optimization take?
The optimization took more than 60 hours with about 8000 evaluations on the physical robot, while the task seems relatively simple.
Q11. How are these simulators used and upgraded?
These simulators speed up the evaluation of the controllers, while being upgraded by conducting some meaningful transfer experiments on the real device.