Proceedings ArticleDOI

MuJoCo: A physics engine for model-based control

24 Dec 2012-pp 5026-5033
TL;DR: A new physics engine tailored to model-based control, based on the modern velocity-stepping approach that avoids the difficulties with spring-dampers; the engine can compute both forward and inverse dynamics.
Abstract: We describe a new physics engine tailored to model-based control. Multi-joint dynamics are represented in generalized coordinates and computed via recursive algorithms. Contact responses are computed via efficient new algorithms we have developed, based on the modern velocity-stepping approach which avoids the difficulties with spring-dampers. Models are specified using either a high-level C++ API or an intuitive XML file format. A built-in compiler transforms the user model into an optimized data structure used for runtime computation. The engine can compute both forward and inverse dynamics. The latter are well-defined even in the presence of contacts and equality constraints. The model can include tendon wrapping as well as actuator activation states (e.g. pneumatic cylinders or muscles). To facilitate optimal control applications and in particular sampling and finite differencing, the dynamics can be evaluated for different states and controls in parallel. Around 400,000 dynamics evaluations per second are possible on a 12-core machine, for a 3D humanoid with 18 dofs and 6 active contacts. We have already used the engine in a number of control applications. It will soon be made publicly available.
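The XML modeling path mentioned in the abstract can be illustrated with a minimal model file. The element names follow MuJoCo's MJCF format; this toy pendulum is our own sketch, not an example from the paper:

```xml
<mujoco model="pendulum">
  <option timestep="0.002"/>
  <worldbody>
    <body pos="0 0 1">
      <!-- one hinge joint about the y-axis -->
      <joint name="hinge" type="hinge" axis="0 1 0"/>
      <!-- capsule geom; mass is derived from the default density -->
      <geom type="capsule" fromto="0 0 0 0 0 -0.5" size="0.02"/>
    </body>
  </worldbody>
</mujoco>
```

The built-in compiler described above turns such a file into the optimized runtime data structure used for simulation.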


Citations
Posted Content
TL;DR: A new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent, are proposed.
Abstract: We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
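The clipped "surrogate" objective this abstract refers to can be sketched in a few lines of NumPy. The function name and toy inputs are our own, but the formula is PPO's pessimistic bound L = E[min(r·A, clip(r, 1−ε, 1+ε)·A)]:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective: the minimum of the unclipped and
    clipped terms, so the policy gains nothing from moving the
    probability ratio outside [1 - eps, 1 + eps]."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped).mean()
```

Because the objective is an ordinary expectation, it supports the multiple epochs of minibatch SGD updates described above.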

9,020 citations


Cites methods from "MuJoCo: A physics engine for model-..."

  • ...for each algorithm variant, we chose a computationally cheap benchmark to test the algorithms on. Namely, we used 7 simulated robotics tasks implemented in OpenAI Gym [Bro+16], which use the MuJoCo [TET12] physics engine. We do one million timesteps of training on each one. Besides the hyperparameters used for clipping (ε) and the KL penalty (β; d_targ), which we search over, the other hyperparameters...

    [...]

Proceedings Article
06 Aug 2017
TL;DR: An algorithm for meta-learning is proposed that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of learning problems, including classification, regression, and reinforcement learning.
Abstract: We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on two few-shot image classification benchmarks, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies.
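The inner/outer loop structure described here can be sketched with a first-order variant of the method (FOMAML) on toy scalar tasks; the quadratic per-task loss, step sizes, and function name are our own assumptions for illustration:

```python
import numpy as np

def fomaml_step(theta, task_targets, alpha=0.1, beta=0.1):
    """One first-order meta-update on toy tasks with loss_i(theta) =
    0.5 * (theta - t_i)**2, so grad_i(theta) = theta - t_i.
    Inner loop: one gradient step per task.
    Outer loop: average the post-adaptation gradients (first-order
    approximation that drops second derivatives)."""
    outer_grads = []
    for t in task_targets:
        theta_i = theta - alpha * (theta - t)   # inner adaptation step
        outer_grads.append(theta_i - t)         # task gradient at adapted params
    return theta - beta * np.mean(outer_grads)  # meta-update
```

Iterating this drives theta toward an initialization from which each task is reachable in one cheap gradient step, i.e. a model that is "easy to fine-tune".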

7,027 citations


Cites background or methods from "MuJoCo: A physics engine for model-..."

  • ...To study how well MAML can scale to more complex deep RL problems, we also study adaptation on high-dimensional locomotion tasks with the MuJoCo simulator (Todorov et al., 2012)....

    [...]

Posted Content
TL;DR: This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Abstract: We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
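The deterministic policy gradient underlying this actor-critic method can be sketched on a toy problem where the critic is known in closed form; the linear actor, quadratic Q, and names here are our own illustration, not the paper's networks:

```python
import numpy as np

def dpg_update(theta, states, lr=0.05):
    """Deterministic policy-gradient step on a toy problem: actor
    mu(s) = theta * s, known critic Q(s, a) = -(a - 2*s)**2, so the
    optimal action is a = 2*s. DPG chain rule:
    grad_theta J = E[ d mu/d theta * dQ/da evaluated at a = mu(s) ]."""
    a = theta * states
    dq_da = -2.0 * (a - 2.0 * states)   # critic gradient w.r.t. the action
    grad = np.mean(states * dq_da)      # d mu/d theta = s
    return theta + lr * grad            # gradient ascent on J
```

In the full algorithm the critic is itself learned and stabilized with replay buffers and slowly tracking target networks; the toy update above isolates just the actor's chain-rule step.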

4,225 citations


Additional excerpts

  • ...These environments were simulated using MuJoCo [13]....

    [...]

Proceedings Article
06 Jul 2015
TL;DR: A method for optimizing control policies with guaranteed monotonic improvement; making several approximations to the theoretically-justified scheme yields a practical algorithm called Trust Region Policy Optimization (TRPO).
Abstract: In this article, we describe a method for optimizing control policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified scheme, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is effective for optimizing large nonlinear policies such as neural networks. Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input. Despite its approximations that deviate from the theory, TRPO tends to give monotonic improvement, with little tuning of hyperparameters.
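The trust-region step at the heart of this method can be sketched with a diagonal Fisher approximation (a simplification we assume for illustration; the paper uses conjugate gradients on the full Fisher-vector product): maximize gᵀΔθ subject to ½ΔθᵀFΔθ ≤ δ, whose solution is a scaled natural gradient.

```python
import numpy as np

def trpo_step(grad, fisher_diag, delta=0.01):
    """Trust-region step with a diagonal Fisher matrix:
    dtheta = sqrt(2*delta / (g^T F^-1 g)) * F^-1 g,
    which satisfies the KL-surrogate constraint with equality."""
    nat_grad = grad / fisher_diag            # F^-1 g (diagonal case)
    quad = grad @ nat_grad                   # g^T F^-1 g
    return np.sqrt(2.0 * delta / quad) * nat_grad
```

The step length adapts to curvature: directions the Fisher matrix says are sensitive get smaller moves, which is what makes large nonlinear policies tractable.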

3,479 citations


Cites methods from "MuJoCo: A physics engine for model-..."

  • ...We conducted the robotic locomotion experiments using the MuJoCo simulator (Todorov et al., 2012)....

    [...]

  • ...games from images using convolutional neural networks with tens of thousands of parameters. 8.1Simulated Robotic Locomotion We conducted the robotic locomotion experiments using the MuJoCo simulator (Todorov et al., 2012). The three simulated robots are shown in Figure 2. The states of the robots are their generalized positions and velocities, and the controls are joint torques. Underactuation, high dimensionality, an...

    [...]

Posted Content
TL;DR: Trust Region Policy Optimization (TRPO) as mentioned in this paper is an iterative procedure for optimizing policies, with guaranteed monotonic improvement, which is similar to natural policy gradient methods and is effective for optimizing large nonlinear policies such as neural networks.
Abstract: We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is similar to natural policy gradient methods and is effective for optimizing large nonlinear policies such as neural networks. Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input. Despite its approximations that deviate from the theory, TRPO tends to give monotonic improvement, with little tuning of hyperparameters.

3,171 citations

References
Proceedings ArticleDOI
24 Jul 1994
TL;DR: A genetic language is presented that uses nodes and connections as its primitive elements to represent directed graphs, which are used to describe both the morphology and the neural circuitry of creatures that move and behave in simulated three-dimensional physical worlds.
Abstract: This paper describes a novel system for creating virtual creatures that move and behave in simulated three-dimensional physical worlds. The morphologies of creatures and the neural systems for controlling their muscle forces are both generated automatically using genetic algorithms. Different fitness evaluation functions are used to direct simulated evolutions towards specific behaviors such as swimming, walking, jumping, and following. A genetic language is presented that uses nodes and connections as its primitive elements to represent directed graphs, which are used to describe both the morphology and the neural circuitry of these creatures. This genetic language defines a hyperspace containing an indefinite number of possible creatures with behaviors, and when it is searched using optimization techniques, a variety of successful and interesting locomotion strategies emerge, some of which would be difficult to invent or build by design.
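The evolutionary search described here can be sketched with a minimal genetic algorithm; we substitute a real-valued genome for the paper's graph genotypes, and all names, population sizes, and the toy fitness are our own assumptions:

```python
import random

def evolve(fitness, genome_len=3, pop_size=20, generations=60, seed=0):
    """Minimal elitist GA: mutate around the fittest half of the
    population each generation (truncation selection)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the fittest half
        children = [[g + rng.gauss(0, 0.1) for g in rng.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        pop = parents + children                # next generation
    return max(pop, key=fitness)

best = evolve(lambda g: -sum(x * x for x in g))  # maximize -> genome near 0
```

Because parents are retained (elitism), the best fitness found never decreases from one generation to the next.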

1,127 citations


"MuJoCo: A physics engine for model-..." refers background in this paper

  • ...In the context of control optimization, however, the controller is being "tuned" to the engine and not the other way around....

    [...]

Book
26 Nov 2007
TL;DR: Rigid Body Dynamics Algorithms presents the subject of computational rigid-body dynamics through the medium of spatial 6D vector notation to facilitate the implementation of dynamics algorithms on a computer: shorter, simpler code that is easier to write, understand and debug, with no loss of efficiency.
Abstract: Rigid Body Dynamics Algorithms presents the subject of computational rigid-body dynamics through the medium of spatial 6D vector notation. It explains how to model a rigid-body system and how to analyze it, and it presents the most comprehensive collection of the best rigid-body dynamics algorithms to be found in a single source. The use of spatial vector notation greatly reduces the volume of algebra, which allows systems to be described using fewer equations and fewer quantities. It also allows problems to be solved in fewer steps, and solutions to be expressed more succinctly. In addition, algorithms are explained simply and clearly, and are expressed in a compact form. The use of spatial vector notation facilitates the implementation of dynamics algorithms on a computer: shorter, simpler code that is easier to write, understand and debug, with no loss of efficiency.
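A small taste of the spatial 6D notation the book builds on: the 6×6 cross-product operator for a spatial motion vector v = [ω; v_lin], which replaces pairs of 3D cross products in the recursive algorithms. This is our own illustrative sketch of the operator, not code from the book:

```python
import numpy as np

def skew(w):
    """3x3 skew-symmetric matrix so that skew(w) @ x == cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def spatial_cross_motion(v):
    """6x6 cross operator for a spatial motion vector v = [omega; v_lin]:
    [[wx, 0], [vx, wx]] in block form (Featherstone's crm)."""
    w, vl = v[:3], v[3:]
    top = np.hstack([skew(w), np.zeros((3, 3))])
    bot = np.hstack([skew(vl), skew(w)])
    return np.vstack([top, bot])
```

As with ordinary cross products, the operator annihilates its own argument: crm(v) v = 0.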

1,057 citations


"MuJoCo: A physics engine for model-..." refers background in this paper

  • ...Note that contact simulation is an area of active research, unlike simulation of smooth multi-joint dynamics where the book has basically been written [4]....

    [...]

  • ...A regularization term can again be included....

    [...]

Proceedings ArticleDOI
24 Dec 2012
TL;DR: An online trajectory optimization method and software platform applicable to complex humanoid robots performing challenging tasks such as getting up from an arbitrary pose on the ground and recovering from large disturbances using dexterous acrobatic maneuvers is presented.
Abstract: We present an online trajectory optimization method and software platform applicable to complex humanoid robots performing challenging tasks such as getting up from an arbitrary pose on the ground and recovering from large disturbances using dexterous acrobatic maneuvers. The resulting behaviors, illustrated in the attached video, are computed only 7× slower than real time, on a standard PC. The video also shows results on the acrobot problem, planar swimming and one-legged hopping. These simpler problems can already be solved in real time, without pre-computing anything.

778 citations


"MuJoCo: A physics engine for model-..." refers background in this paper

  • ...Non-convex meshes can be rendered but are not used in collision detection; instead the user should decompose them into convex meshes. e) Site: Sites are points of interest (along with 3D frames) defined in the local frames of the bodies, and thus moving with the bodies....

    [...]

Journal ArticleDOI
TL;DR: In this paper, a new time-stepping method for simulating systems of rigid bodies is given which incorporates Coulomb friction and inelastic impacts and shocks, which does not need to identify explicitly impulsive forces.
Abstract: In this paper a new time-stepping method for simulating systems of rigid bodies is given which incorporates Coulomb friction and inelastic impacts and shocks. Unlike other methods which take an instantaneous point of view, this method does not need to identify explicitly impulsive forces. Instead, the treatment is similar to that of J. J. Moreau and Monteiro-Marques, except that the numerical formulation used here ensures that there is no inter-penetration of rigid bodies, unlike their velocity-based formulation. Numerical results are given for the method presented here for a spinning rod impacting a table in two dimensions, and a system of four balls colliding on a table in a fully three-dimensional way. These numerical results also show the practicality of the method, and convergence of the method as the step size becomes small.
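The flavor of such velocity-level time-stepping (and why it needs no spring-dampers or explicit impulse detection) can be shown on the simplest possible case, a point mass falling onto a floor at q = 0; this 1D sketch and its names are our own, far simpler than the paper's LCP formulation:

```python
def step_ball(q, v, h=0.01, g=9.81):
    """One velocity-level time step for a point mass above a floor at q = 0:
    integrate the velocity first, then apply a normal impulse only if the
    candidate position would penetrate (inelastic impact, position-level
    non-penetration as in the paper, no spring-damper)."""
    v_new = v - h * g                 # unconstrained velocity update
    if q + h * v_new < 0.0:           # candidate position would penetrate
        v_new = -q / h                # impulse so that q + h*v_new == 0
    return q + h * v_new, v_new
```

The contact impulse appears implicitly as the velocity correction; the body lands and rests exactly on the floor without ever inter-penetrating.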

644 citations


"MuJoCo: A physics engine for model-..." refers background or methods in this paper

  • ...Second, the branch-induced sparsity of makes sparse factorization a lot faster than ¡ 3 ¢ as shown in [4]....

    [...]

  • ...Another issue with game engines lies in the contact dynamics, formulated as (approximations to) linear complementarity problems or LCPs [8]....

    [...]

  • ...This is done by computing the components of f independently for each contact (the diagonal solver ignores contact interactions by definition) and enforcing the friction-cone constraints, with the same projection method as above....

    [...]

  • ...G. Inverse dynamics We now describe the computation of inverse dynamics, which is a unique feature of MuJoCo....

    [...]

Journal ArticleDOI
01 Dec 2008
TL;DR: A new discrete velocity-level formulation of frictional contact dynamics that reduces to a pair of coupled projections; a simple fixed-point property of this coupled system allows a novel algorithm for accurate frictional contact resolution to be constructed from a staggered sequence of projections.
Abstract: We present a new discrete velocity-level formulation of frictional contact dynamics that reduces to a pair of coupled projections and introduce a simple fixed-point property of this coupled system. This allows us to construct a novel algorithm for accurate frictional contact resolution based on a simple staggered sequence of projections. The algorithm accelerates performance using warm starts to leverage the potentially high temporal coherence between contact states and provides users with direct control over frictional accuracy. Applying this algorithm to rigid and deformable systems, we obtain robust and accurate simulations of frictional contact behavior not previously possible, at rates suitable for interactive haptic simulations, as well as large-scale animations. By construction, the proposed algorithm guarantees exact, velocity-level contact constraint enforcement and obtains long-term stable and robust integration. Examples are given to illustrate the performance, plausibility and accuracy of the obtained solutions.
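The per-contact projections such staggered schemes alternate between include projecting a contact force onto the Coulomb friction cone ||f_t|| ≤ μ·f_n. The sketch below uses a simple tangential-scaling projection, which is our own simplification, not the paper's exact solver:

```python
import numpy as np

def project_friction_cone(f, mu):
    """Project a contact force f = [fx, fy, fn] toward the Coulomb cone
    ||f_t|| <= mu * f_n by scaling the tangential part; tensile normal
    forces (fn < 0) are clamped to zero."""
    ft, fn = f[:2], f[2]
    if fn < 0.0:                       # contact cannot pull
        return np.zeros(3)
    t = np.linalg.norm(ft)
    if t <= mu * fn:                   # already inside the cone
        return np.asarray(f, dtype=float)
    return np.concatenate([ft * (mu * fn / t), [fn]])
```

Alternating a projection like this with a velocity-level contact projection, and warm-starting from the previous step's forces, is the staggered fixed-point iteration the abstract describes.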

193 citations