scispace - formally typeset
Search or ask a question
Author

Gregory Kuhlmann

Bio: Gregory Kuhlmann is an academic researcher from University of Texas at Austin. The author has contributed to research in topics: General game playing & Simulations and games in economics education. The author has an hindex of 12, co-authored 16 publications receiving 1088 citations.

Papers
More filters
Journal ArticleDOI
TL;DR: The application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer results in agents that significantly outperform a range of benchmark policies.
Abstract: RoboCup simulated soccer presents many challenges to reinforcement learning methods, including a large state space, hidden and uncertain state, multiple independent agents learning simultaneously, and long and variable delays in the effects of actions. We describe our application of episodic SMDP Sarsa(λ) with linear tile-coding function approximation and variable λ to learning higher-level decisions in a keepaway subtask of RoboCup soccer. In keepaway, one team, “the keepers,” tries to keep control of the ball for as long as possible despite the efforts of “the takers.” The keepers learn individually when to hold the ball and when to pass to a teammate. Our agents learned policies that significantly outperform a range of benchmark policies. We demonstrate the generality of our approach by applying it to a number of task variations including different field sizes and different numbers of players on each team.

430 citations

Book ChapterDOI
01 Jan 2006
TL;DR: In this paper, a set of programs, tools, and resources designed to make the keepaway soccer domain easily usable for experimentation without any prior knowledge of RoboCup or the Soccer Server are presented.
Abstract: Keepaway soccer has been previously put forth as a testbed for machine learning. Although multiple researchers have used it successfully for machine learning experiments, doing so has required a good deal of domain expertise. This paper introduces a set of programs, tools, and resources designed to make the domain easily usable for experimentation without any prior knowledge of RoboCup or the Soccer Server. In addition, we report on new experiments in the Keepaway domain, along with performance results designed to be directly comparable with future experimental results. Combined, the new infrastructure and our concrete demonstration of its use in comparative experiments elevate the domain to a machine learning benchmark, suitable for use by researchers across the field.

156 citations

Proceedings ArticleDOI
12 May 2008
TL;DR: This paper introduces Modeling Approximate State Transitions by Exploiting Regression (MASTER), a method for automatically learning a mapping from one task to another through an agent's experience and demonstrates that such learned relationships can significantly improve the speed of a reinforcement learning algorithm in a series of Mountain Car tasks.
Abstract: Recent work in transfer learning has succeeded in making reinforcement learning algorithms more efficient by incorporating knowledge from previous tasks. However, such methods typically must be provided either a full model of the tasks or an explicit relation mapping one task into the other. An autonomous agent may not have access to such high-level information, but would be able to analyze its experience to find similarities between tasks. In this paper we introduce Modeling Approximate State Transitions by Exploiting Regression (MASTER), a method for automatically learning a mapping from one task to another through an agent's experience. We empirically demonstrate that such learned relationships can significantly improve the speed of a reinforcement learning algorithm in a series of Mountain Car tasks. Additionally, we demonstrate that our method may also assist with the difficult problem of task selection for transfer.

135 citations

Proceedings Article
16 Jul 2006
TL;DR: The main feature of the approach is a novel method for automatically constructing effective search heuristics based on the formal game description that is fully implemented and tested in a range of different games.
Abstract: Computer game players are typically designed to play a single game: today's best chess-playing programs cannot play checkers, or even tic-tac-toe. General Game Playing is the problem of designing an agent capable of playing many different previously unseen games. The first AAAI General Game Playing Competition was held at AAAI 2005 in order to promote research in this area. In this article, we survey some of the issues involved in creating a general game playing system and introduce our entry to that event. The main feature of our approach is a novel method for automatically constructing effective search heuristics based on the formal game description. Our agent is fully implemented and tested in a range of different games.

74 citations

Book ChapterDOI
17 Sep 2007
TL;DR: This work introduces a graph-based method for identifying previously encountered games and proves its robustness formally, and describes how the same basic approach can be used to identify similar but non-identical games.
Abstract: A general game player is an agent capable of taking as input a description of a game's rules in a formal language and proceeding to play without any subsequent human input. To do well, an agent should learn from experience with past games and transfer the learned knowledge to new problems. We introduce a graph-based method for identifying previously encountered games and prove its robustness formally. We then describe how the same basic approach can be used to identify similar but non-identical games. We apply this technique to automate domain mapping for value function transfer and speed up reinforcement learning on variants of previously played games. Our approach is fully implemented with empirical results in the general game playing system.

72 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: The relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as covariate shift are discussed.
Abstract: A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding much expensive data-labeling efforts. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as covariate shift. We also explore some potential future issues in transfer learning research.

18,616 citations

Journal ArticleDOI
TL;DR: This article presents a framework that classifies transfer learning methods in terms of their capabilities and goals, and then uses it to survey the existing literature, as well as to suggest future directions for transfer learning work.
Abstract: The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework that classifies transfer learning methods in terms of their capabilities and goals, and then use it to survey the existing literature, as well as to suggest future directions for transfer learning work.

1,634 citations

Proceedings Article
19 Jun 2016
TL;DR: In this paper, the authors present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with high state and action dimensionality such as 3D humanoid locomotion, and tasks with partial observations.
Abstract: Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning. Some notable examples include training agents to play Atari games based on raw pixel data and to acquire advanced manipulation skills using raw sensory inputs. However, it has been difficult to quantify progress in the domain of continuous control due to the lack of a commonly adopted benchmark. In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. We report novel findings based on the systematic evaluation of a range of implemented reinforcement learning algorithms. Both the benchmark and reference implementations are released at https://github.com/rllab/rllab in order to facilitate experimental reproducibility and to encourage adoption by other researchers.

1,038 citations

Posted Content
TL;DR: In this article, the authors propose a value-based method that can train decentralised policies in a centralised end-to-end fashion in simulated or laboratory settings, where global state information is available and communication constraints are lifted.
Abstract: In many real-world settings, a team of agents must coordinate their behaviour while acting in a decentralised way. At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations. We structurally enforce that the joint-action value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning, and guarantees consistency between the centralised and decentralised policies. We evaluate QMIX on a challenging set of StarCraft II micromanagement tasks, and show that QMIX significantly outperforms existing value-based multi-agent reinforcement learning methods.

693 citations

Journal ArticleDOI
TL;DR: This article surveys imitation learning methods and presents design options in different steps of the learning process, and extensively discusses combining imitation learning approaches using different sources and methods, as well as incorporating other motion learning methods to enhance imitation.
Abstract: Imitation learning techniques aim to mimic human behavior in a given task. An agent (a learning machine) is trained to perform a task from demonstrations by learning a mapping between observations and actions. The idea of teaching by imitation has been around for many years; however, the field is gaining attention recently due to advances in computing and sensing as well as rising demand for intelligent applications. The paradigm of learning by imitation is gaining popularity because it facilitates teaching complex tasks with minimal expert knowledge of the tasks. Generic imitation learning methods could potentially reduce the problem of teaching a task to that of providing demonstrations, without the need for explicit programming or designing reward functions specific to the task. Modern sensors are able to collect and transmit high volumes of data rapidly, and processors with high computational power allow fast processing that maps the sensory data to actions in a timely manner. This opens the door for many potential AI applications that require real-time perception and reaction such as humanoid robots, self-driving vehicles, human computer interaction, and computer games, to name a few. However, specialized algorithms are needed to effectively and robustly learn models as learning by imitation poses its own set of challenges. In this article, we survey imitation learning methods and present design options in different steps of the learning process. We introduce a background and motivation for the field as well as highlight challenges specific to the imitation problem. Methods for designing and evaluating imitation learning tasks are categorized and reviewed. Special attention is given to learning methods in robotics and games as these domains are the most popular in the literature and provide a wide array of problems and methodologies. We extensively discuss combining imitation learning approaches using different sources and methods, as well as incorporating other motion learning methods to enhance imitation. We also discuss the potential impact on industry, present major applications, and highlight current and future research directions.

535 citations