Proceedings ArticleDOI

MuJoCo: A physics engine for model-based control

24 Dec 2012-pp 5026-5033
TL;DR: A new physics engine tailored to model-based control, based on the modern velocity-stepping approach that avoids the difficulties with spring-dampers; the engine can compute both forward and inverse dynamics.
Abstract: We describe a new physics engine tailored to model-based control. Multi-joint dynamics are represented in generalized coordinates and computed via recursive algorithms. Contact responses are computed via efficient new algorithms we have developed, based on the modern velocity-stepping approach which avoids the difficulties with spring-dampers. Models are specified using either a high-level C++ API or an intuitive XML file format. A built-in compiler transforms the user model into an optimized data structure used for runtime computation. The engine can compute both forward and inverse dynamics. The latter are well-defined even in the presence of contacts and equality constraints. The model can include tendon wrapping as well as actuator activation states (e.g. pneumatic cylinders or muscles). To facilitate optimal control applications and in particular sampling and finite differencing, the dynamics can be evaluated for different states and controls in parallel. Around 400,000 dynamics evaluations per second are possible on a 12-core machine, for a 3D humanoid with 18 dofs and 6 active contacts. We have already used the engine in a number of control applications. It will soon be made publicly available.
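The XML modeling path mentioned in the abstract can be illustrated with a minimal model file. The element names follow MuJoCo's MJCF format; this toy pendulum is our own sketch, not an example from the paper:

```xml
<mujoco model="pendulum">
  <option timestep="0.002"/>
  <worldbody>
    <body pos="0 0 1">
      <!-- one hinge joint about the y-axis -->
      <joint name="hinge" type="hinge" axis="0 1 0"/>
      <!-- capsule geom; mass is derived from the default density -->
      <geom type="capsule" fromto="0 0 0 0 0 -0.5" size="0.02"/>
    </body>
  </worldbody>
</mujoco>
```

The built-in compiler described above turns such a file into the optimized runtime data structure used for simulation.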


Citations
Posted Content
TL;DR: A new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent, are proposed.
Abstract: We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
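The clipped "surrogate" objective this abstract refers to can be sketched in a few lines of NumPy. The function name and toy inputs are our own, but the formula is PPO's pessimistic bound L = E[min(r·A, clip(r, 1−ε, 1+ε)·A)]:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective: the minimum of the unclipped and
    clipped terms, so the policy gains nothing from moving the
    probability ratio outside [1 - eps, 1 + eps]."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped).mean()
```

Because the objective is an ordinary expectation, it supports the multiple epochs of minibatch SGD updates described above.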

9,020 citations


Cites methods from "MuJoCo: A physics engine for model-..."

  • ...for each algorithm variant, we chose a computationally cheap benchmark to test the algorithms on. Namely, we used 7 simulated robotics tasks implemented in OpenAI Gym [Bro+16], which use the MuJoCo [TET12] physics engine. We do one million timesteps of training on each one. Besides the hyperparameters used for clipping (ε) and the KL penalty (β; d_targ), which we search over, the other hyperparameters...

    [...]

Proceedings Article
06 Aug 2017
TL;DR: An algorithm for meta-learning is proposed that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of learning problems, including classification, regression, and reinforcement learning.
Abstract: We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on two few-shot image classification benchmarks, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies.
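The inner/outer loop structure described here can be sketched with a first-order variant of the method (FOMAML) on toy scalar tasks; the quadratic per-task loss, step sizes, and function name are our own assumptions for illustration:

```python
import numpy as np

def fomaml_step(theta, task_targets, alpha=0.1, beta=0.1):
    """One first-order meta-update on toy tasks with loss_i(theta) =
    0.5 * (theta - t_i)**2, so grad_i(theta) = theta - t_i.
    Inner loop: one gradient step per task.
    Outer loop: average the post-adaptation gradients (first-order
    approximation that drops second derivatives)."""
    outer_grads = []
    for t in task_targets:
        theta_i = theta - alpha * (theta - t)   # inner adaptation step
        outer_grads.append(theta_i - t)         # task gradient at adapted params
    return theta - beta * np.mean(outer_grads)  # meta-update
```

Iterating this drives theta toward an initialization from which each task is reachable in one cheap gradient step, i.e. a model that is "easy to fine-tune".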

7,027 citations


Cites background or methods from "MuJoCo: A physics engine for model-..."

  • ...To study how well MAML can scale to more complex deep RL problems, we also study adaptation on high-dimensional locomotion tasks with the MuJoCo simulator (Todorov et al., 2012)....

    [...]

Posted Content
TL;DR: This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Abstract: We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
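The deterministic policy gradient underlying this actor-critic method can be sketched on a toy problem where the critic is known in closed form; the linear actor, quadratic Q, and names here are our own illustration, not the paper's networks:

```python
import numpy as np

def dpg_update(theta, states, lr=0.05):
    """Deterministic policy-gradient step on a toy problem: actor
    mu(s) = theta * s, known critic Q(s, a) = -(a - 2*s)**2, so the
    optimal action is a = 2*s. DPG chain rule:
    grad_theta J = E[ d mu/d theta * dQ/da evaluated at a = mu(s) ]."""
    a = theta * states
    dq_da = -2.0 * (a - 2.0 * states)   # critic gradient w.r.t. the action
    grad = np.mean(states * dq_da)      # d mu/d theta = s
    return theta + lr * grad            # gradient ascent on J
```

In the full algorithm the critic is itself learned and stabilized with replay buffers and slowly tracking target networks; the toy update above isolates just the actor's chain-rule step.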

4,225 citations


Additional excerpts

  • ...These environments were simulated using MuJoCo [13]....

    [...]

Proceedings Article
06 Jul 2015
TL;DR: A method for optimizing control policies with guaranteed monotonic improvement; making several approximations to the theoretically-justified scheme yields a practical algorithm called Trust Region Policy Optimization (TRPO).
Abstract: In this article, we describe a method for optimizing control policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified scheme, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is effective for optimizing large nonlinear policies such as neural networks. Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input. Despite its approximations that deviate from the theory, TRPO tends to give monotonic improvement, with little tuning of hyperparameters.
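The trust-region step at the heart of this method can be sketched with a diagonal Fisher approximation (a simplification we assume for illustration; the paper uses conjugate gradients on the full Fisher-vector product): maximize gᵀΔθ subject to ½ΔθᵀFΔθ ≤ δ, whose solution is a scaled natural gradient.

```python
import numpy as np

def trpo_step(grad, fisher_diag, delta=0.01):
    """Trust-region step with a diagonal Fisher matrix:
    dtheta = sqrt(2*delta / (g^T F^-1 g)) * F^-1 g,
    which satisfies the KL-surrogate constraint with equality."""
    nat_grad = grad / fisher_diag            # F^-1 g (diagonal case)
    quad = grad @ nat_grad                   # g^T F^-1 g
    return np.sqrt(2.0 * delta / quad) * nat_grad
```

The step length adapts to curvature: directions the Fisher matrix says are sensitive get smaller moves, which is what makes large nonlinear policies tractable.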

3,479 citations


Cites methods from "MuJoCo: A physics engine for model-..."

  • ...We conducted the robotic locomotion experiments using the MuJoCo simulator (Todorov et al., 2012)....

    [...]

  • ...games from images using convolutional neural networks with tens of thousands of parameters. 8.1Simulated Robotic Locomotion We conducted the robotic locomotion experiments using the MuJoCo simulator (Todorov et al., 2012). The three simulated robots are shown in Figure 2. The states of the robots are their generalized positions and velocities, and the controls are joint torques. Underactuation, high dimensionality, an...

    [...]

Posted Content
TL;DR: Trust Region Policy Optimization (TRPO) as mentioned in this paper is an iterative procedure for optimizing policies, with guaranteed monotonic improvement, which is similar to natural policy gradient methods and is effective for optimizing large nonlinear policies such as neural networks.
Abstract: We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is similar to natural policy gradient methods and is effective for optimizing large nonlinear policies such as neural networks. Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input. Despite its approximations that deviate from the theory, TRPO tends to give monotonic improvement, with little tuning of hyperparameters.

3,171 citations

References
Proceedings ArticleDOI
24 Jul 1994
TL;DR: A genetic language is presented that uses nodes and connections as its primitive elements to represent directed graphs, which are used to describe both the morphology and the neural circuitry of creatures that move and behave in simulated three-dimensional physical worlds.
Abstract: This paper describes a novel system for creating virtual creatures that move and behave in simulated three-dimensional physical worlds. The morphologies of creatures and the neural systems for controlling their muscle forces are both generated automatically using genetic algorithms. Different fitness evaluation functions are used to direct simulated evolutions towards specific behaviors such as swimming, walking, jumping, and following. A genetic language is presented that uses nodes and connections as its primitive elements to represent directed graphs, which are used to describe both the morphology and the neural circuitry of these creatures. This genetic language defines a hyperspace containing an indefinite number of possible creatures with behaviors, and when it is searched using optimization techniques, a variety of successful and interesting locomotion strategies emerge, some of which would be difficult to invent or build by design.
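The evolutionary search described here can be sketched with a minimal genetic algorithm; we substitute a real-valued genome for the paper's graph genotypes, and all names, population sizes, and the toy fitness are our own assumptions:

```python
import random

def evolve(fitness, genome_len=3, pop_size=20, generations=60, seed=0):
    """Minimal elitist GA: mutate around the fittest half of the
    population each generation (truncation selection)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the fittest half
        children = [[g + rng.gauss(0, 0.1) for g in rng.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        pop = parents + children                # next generation
    return max(pop, key=fitness)

best = evolve(lambda g: -sum(x * x for x in g))  # maximize -> genome near 0
```

Because parents are retained (elitism), the best fitness found never decreases from one generation to the next.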

1,127 citations


"MuJoCo: A physics engine for model-..." refers background in this paper

  • ...In the context of control optimization, however, the controller is being "tuned" to the engine and not the other way around....

    [...]

Book
26 Nov 2007
TL;DR: Rigid Body Dynamics Algorithms presents the subject of computational rigid-body dynamics through the medium of spatial 6D vector notation to facilitate the implementation of dynamics algorithms on a computer: shorter, simpler code that is easier to write, understand and debug, with no loss of efficiency.
Abstract: Rigid Body Dynamics Algorithms presents the subject of computational rigid-body dynamics through the medium of spatial 6D vector notation. It explains how to model a rigid-body system and how to analyze it, and it presents the most comprehensive collection of the best rigid-body dynamics algorithms to be found in a single source. The use of spatial vector notation greatly reduces the volume of algebra, which allows systems to be described using fewer equations and fewer quantities. It also allows problems to be solved in fewer steps, and solutions to be expressed more succinctly. In addition, algorithms are explained simply and clearly, and are expressed in a compact form. The use of spatial vector notation facilitates the implementation of dynamics algorithms on a computer: shorter, simpler code that is easier to write, understand and debug, with no loss of efficiency.
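A small taste of the spatial 6D notation the book builds on: the 6×6 cross-product operator for a spatial motion vector v = [ω; v_lin], which replaces pairs of 3D cross products in the recursive algorithms. This is our own illustrative sketch of the operator, not code from the book:

```python
import numpy as np

def skew(w):
    """3x3 skew-symmetric matrix so that skew(w) @ x == cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def spatial_cross_motion(v):
    """6x6 cross operator for a spatial motion vector v = [omega; v_lin]:
    [[wx, 0], [vx, wx]] in block form (Featherstone's crm)."""
    w, vl = v[:3], v[3:]
    top = np.hstack([skew(w), np.zeros((3, 3))])
    bot = np.hstack([skew(vl), skew(w)])
    return np.vstack([top, bot])
```

As with ordinary cross products, the operator annihilates its own argument: crm(v) v = 0.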

1,057 citations


"MuJoCo: A physics engine for model-..." refers background in this paper

  • ...Note that contact simulation is an area of active research, unlike simulation of smooth multi-joint dynamics where the book has basically been written [4]....

    [...]

  • ...A regularization term can again be included....

    [...]

Proceedings ArticleDOI
24 Dec 2012
TL;DR: An online trajectory optimization method and software platform applicable to complex humanoid robots performing challenging tasks such as getting up from an arbitrary pose on the ground and recovering from large disturbances using dexterous acrobatic maneuvers is presented.
Abstract: We present an online trajectory optimization method and software platform applicable to complex humanoid robots performing challenging tasks such as getting up from an arbitrary pose on the ground and recovering from large disturbances using dexterous acrobatic maneuvers. The resulting behaviors, illustrated in the attached video, are computed only 7× slower than real time, on a standard PC. The video also shows results on the acrobot problem, planar swimming and one-legged hopping. These simpler problems can already be solved in real time, without pre-computing anything.

778 citations


"MuJoCo: A physics engine for model-..." refers background in this paper

  • ...Non-convex meshes can be rendered but are not used in collision detection; instead the user should decompose them into convex meshes. e) Site: Sites are points of interest (along with 3D frames) defined in the local frames of the bodies, and thus moving with the bodies....

    [...]

Journal ArticleDOI
TL;DR: In this paper, a new time-stepping method for simulating systems of rigid bodies is given which incorporates Coulomb friction and inelastic impacts and shocks, which does not need to identify explicitly impulsive forces.
Abstract: In this paper a new time-stepping method for simulating systems of rigid bodies is given which incorporates Coulomb friction and inelastic impacts and shocks. Unlike other methods which take an instantaneous point of view, this method does not need to identify explicitly impulsive forces. Instead, the treatment is similar to that of J. J. Moreau and Monteiro-Marques, except that the numerical formulation used here ensures that there is no inter-penetration of rigid bodies, unlike their velocity-based formulation. Numerical results are given for the method presented here for a spinning rod impacting a table in two dimensions, and a system of four balls colliding on a table in a fully three-dimensional way. These numerical results also show the practicality of the method, and convergence of the method as the step size becomes small.
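The flavor of such velocity-level time-stepping (and why it needs no spring-dampers or explicit impulse detection) can be shown on the simplest possible case, a point mass falling onto a floor at q = 0; this 1D sketch and its names are our own, far simpler than the paper's LCP formulation:

```python
def step_ball(q, v, h=0.01, g=9.81):
    """One velocity-level time step for a point mass above a floor at q = 0:
    integrate the velocity first, then apply a normal impulse only if the
    candidate position would penetrate (inelastic impact, position-level
    non-penetration as in the paper, no spring-damper)."""
    v_new = v - h * g                 # unconstrained velocity update
    if q + h * v_new < 0.0:           # candidate position would penetrate
        v_new = -q / h                # impulse so that q + h*v_new == 0
    return q + h * v_new, v_new
```

The contact impulse appears implicitly as the velocity correction; the body lands and rests exactly on the floor without ever inter-penetrating.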

644 citations


"MuJoCo: A physics engine for model-..." refers background or methods in this paper

  • ...Second, the branch-induced sparsity of makes sparse factorization a lot faster than ¡ 3 ¢ as shown in [4]....

    [...]

  • ...Another issue with game engines lies in the contact dynamics, formulated as (approximations to) linear complementarity problems or LCPs [8]....

    [...]

  • ...This is done by computing the components of f independently for each contact (the diagonal solver ignores contact interactions by definition) and enforcing the friction-cone constraints, with the same projection method as above....

    [...]

  • ...G. Inverse dynamics We now describe the computation of inverse dynamics, which is a unique feature of MuJoCo....

    [...]

Journal ArticleDOI
01 Dec 2008
TL;DR: A new discrete velocity-level formulation of frictional contact dynamics that reduces to a pair of coupled projections; a simple fixed-point property of this coupled system allows a novel algorithm for accurate frictional contact resolution to be constructed from a staggered sequence of projections.
Abstract: We present a new discrete velocity-level formulation of frictional contact dynamics that reduces to a pair of coupled projections and introduce a simple fixed-point property of this coupled system. This allows us to construct a novel algorithm for accurate frictional contact resolution based on a simple staggered sequence of projections. The algorithm accelerates performance using warm starts to leverage the potentially high temporal coherence between contact states and provides users with direct control over frictional accuracy. Applying this algorithm to rigid and deformable systems, we obtain robust and accurate simulations of frictional contact behavior not previously possible, at rates suitable for interactive haptic simulations, as well as large-scale animations. By construction, the proposed algorithm guarantees exact, velocity-level contact constraint enforcement and obtains long-term stable and robust integration. Examples are given to illustrate the performance, plausibility and accuracy of the obtained solutions.
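The per-contact projections such staggered schemes alternate between include projecting a contact force onto the Coulomb friction cone ||f_t|| ≤ μ·f_n. The sketch below uses a simple tangential-scaling projection, which is our own simplification, not the paper's exact solver:

```python
import numpy as np

def project_friction_cone(f, mu):
    """Project a contact force f = [fx, fy, fn] toward the Coulomb cone
    ||f_t|| <= mu * f_n by scaling the tangential part; tensile normal
    forces (fn < 0) are clamped to zero."""
    ft, fn = f[:2], f[2]
    if fn < 0.0:                       # contact cannot pull
        return np.zeros(3)
    t = np.linalg.norm(ft)
    if t <= mu * fn:                   # already inside the cone
        return np.asarray(f, dtype=float)
    return np.concatenate([ft * (mu * fn / t), [fn]])
```

Alternating a projection like this with a velocity-level contact projection, and warm-starting from the previous step's forces, is the staggered fixed-point iteration the abstract describes.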

193 citations