Open Access Journal Article

Reinforcement Learning for Improving Agent Design.

David Ha
20 Nov 2019, Vol. 25, Iss. 4, pp. 352-365
Abstract
In many reinforcement learning tasks, the goal is to learn a policy to manipulate an agent, whose design is fixed, to maximize some notion of cumulative reward. The design of the agent's physical s...


Citations
Posted Content

Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions

TL;DR: The Paired Open-Ended Trailblazer (POET) algorithm is introduced, which pairs the generation of environmental challenges and the optimization of agents to solve those challenges and allows these stepping-stone solutions to transfer between problems if better, catalyzing innovation.
Proceedings Article

Weight Agnostic Neural Networks

TL;DR: The authors question how important a neural network's weight parameters are compared to its architecture, asking to what extent the architecture alone, without learning any weight parameters, can encode solutions for a given task. They propose a search method for neural network architectures that can already perform a task without any explicit weight training.
Posted Content

AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence

TL;DR: It is argued that the pursuit of AI-GAs should be considered a new grand challenge of computer science research and the ML community should increase its research investment in the AI-GA approach.

Teacher algorithms for curriculum learning of Deep RL in continuously parameterized environments

TL;DR: This paper considers the problem of how a teacher algorithm can enable an unknown Deep Reinforcement Learning (DRL) student to become good at a skill over a wide range of diverse environments.
Journal Article

Shape Changing Robots: Bioinspiration, Simulation, and Physical Realization.

TL;DR: This review surveys the literature on robots that change shape to enhance and expand their functionality, and discusses related grand challenges, including shape sensing, finding, and changing, which rely on innovations in multifunctional materials, distributed actuation and sensing, and somatic control to enable next-generation shape-changing robots.
References
Journal Article

Long short-term memory

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Posted Content

Proximal Policy Optimization Algorithms

TL;DR: A new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent, are proposed.
Journal Article

Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning

TL;DR: This article presents a general class of associative reinforcement learning algorithms for connectionist networks containing stochastic units. These algorithms are shown to make weight adjustments in a direction that lies along the gradient of expected reinforcement, in both immediate-reinforcement tasks and certain limited forms of delayed-reinforcement tasks, and they do this without explicitly computing gradient estimates.
Proceedings Article

Asynchronous methods for deep reinforcement learning

TL;DR: A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.
Proceedings Article

MuJoCo: A physics engine for model-based control

TL;DR: A new physics engine tailored to model-based control. It is based on the modern velocity-stepping approach, which avoids the difficulties with spring-dampers, and it can compute both forward and inverse dynamics.