
Showing papers by "Michael L. Littman published in 2013"


Posted Content
TL;DR: In this article, the authors examined variations of the "incremental pruning" method for solving POMDPs and compared them to earlier algorithms from theoretical and empirical perspectives, concluding that incremental pruning is presently the most efficient exact method.
Abstract: Most exact algorithms for general partially observable Markov decision processes (POMDPs) use a form of dynamic programming in which a piecewise-linear and convex representation of one value function is transformed into another. We examine variations of the "incremental pruning" method for solving this problem and compare them to earlier algorithms from theoretical and empirical perspectives. We find that incremental pruning is presently the most efficient exact method for solving POMDPs.
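
To make the pruning operation concrete: each piecewise-linear convex value function is stored as a set of alpha-vectors, and pruning discards vectors that are not the maximizer at any belief state. Below is a minimal Python sketch of that dominance test via a linear program; the function names and the scipy-based formulation are assumptions for illustration, not the implementation studied in the paper.

```python
# Sketch: prune dominated alpha-vectors from a piecewise-linear convex
# value function, the core operation behind incremental pruning (illustrative only).
import numpy as np
from scipy.optimize import linprog

def dominates_somewhere(w, others):
    """Return a belief state where alpha-vector w beats every vector in
    `others` by a positive margin, or None if no such belief exists."""
    if not others:
        return np.full(len(w), 1.0 / len(w))
    n = len(w)
    # Variables: belief b (n entries) and margin d; maximize d.
    c = np.zeros(n + 1)
    c[-1] = -1.0                                   # linprog minimizes, so minimize -d
    A_ub, b_ub = [], []
    for u in others:
        # Require d - b.(w - u) <= 0 for every competing vector u.
        A_ub.append(np.concatenate([-(np.asarray(w) - np.asarray(u)), [1.0]]))
        b_ub.append(0.0)
    A_eq = [np.concatenate([np.ones(n), [0.0]])]   # belief entries sum to one
    b_eq = [1.0]
    bounds = [(0.0, 1.0)] * n + [(None, None)]
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=bounds)
    if res.success and -res.fun > 1e-9:            # positive margin d found
        return res.x[:n]
    return None

def prune(vectors):
    """Keep only alpha-vectors that are strictly best at some belief
    (a simplified filter; the paper's algorithms use a more careful variant)."""
    kept = []
    for i, w in enumerate(vectors):
        if dominates_somewhere(w, kept + vectors[i + 1:]) is not None:
            kept.append(w)
    return kept
```

The "incremental" aspect of incremental pruning is that this filtering is interleaved with the intermediate cross-sums of the dynamic-programming backup rather than applied only to the final set, which keeps the intermediate vector sets small.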

52 citations


Posted Content
TL;DR: In this paper, the authors summarize results regarding the complexity of solving MDPs and the running time of MDP solution algorithms, and argue that, although MDPs can be solved efficiently in theory, more study is needed to reveal practical algorithms for solving large problems quickly.
Abstract: Markov decision problems (MDPs) provide the foundations for a number of problems of interest to AI researchers studying automated planning and reinforcement learning. In this paper, we summarize results regarding the complexity of solving MDPs and the running time of MDP solution algorithms. We argue that, although MDPs can be solved efficiently in theory, more study is needed to reveal practical algorithms for solving large problems quickly. To encourage future research, we sketch some alternative methods of analysis that rely on the structure of MDPs.
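
As a point of reference for the kind of solution algorithm whose running time such an analysis concerns, here is a minimal tabular value-iteration sketch in Python; the data layout and stopping rule are illustrative assumptions, not drawn from the paper.

```python
# Sketch: tabular value iteration for a finite MDP (illustrative only).
# P[s][a] is a list of (probability, next_state, reward) triples.
def value_iteration(P, num_states, num_actions, gamma=0.95, tol=1e-6):
    V = [0.0] * num_states
    while True:
        delta = 0.0
        for s in range(num_states):
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in range(num_actions)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:        # stop when a full Bellman sweep barely changes V
            return V
```

Each sweep touches every transition once, while the number of sweeps needed for a given accuracy grows as the discount factor approaches one; relating such counts to the problem size is the kind of analysis the paper surveys.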

44 citations


Proceedings Article
14 Jul 2013
TL;DR: This work presents a receding-horizon open-loop planner that employs cross-entropy optimization for policy construction, satisfying all of the stated requirements, and empirically demonstrates near-optimal decisions in a small domain and effective locomotion in several challenging humanoid control tasks.
Abstract: We focus on effective sample-based planning in the face of underactuation, high-dimensionality, drift, discrete system changes, and stochasticity. These are hallmark challenges for important problems, such as humanoid locomotion. In order to ensure broad applicability, we assume domain expertise is minimal and limited to a generative model. In order to make the method responsive, computational costs that scale linearly with the number of samples taken from the generative model are required. We bring to bear a concrete method that satisfies all these requirements; it is a receding-horizon open-loop planner that employs cross-entropy optimization for policy construction. In simulation, we empirically demonstrate near-optimal decisions in a small domain and effective locomotion in several challenging humanoid control tasks.
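
A minimal sketch of the general pattern described above, receding-horizon planning with cross-entropy optimization over open-loop action sequences, assuming nothing but a generative model simulate(state, plan) that returns a sampled return for the plan (the function names, Gaussian action distribution, and parameter values are illustrative assumptions, not the paper's implementation):

```python
# Sketch: receding-horizon open-loop planning with cross-entropy optimization
# over action sequences, using only a generative model (illustrative only).
import numpy as np

def ce_plan(state, simulate, horizon, action_dim,
            iterations=5, samples=64, elite_frac=0.125):
    """Return the first action of the best open-loop plan found by CE."""
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    n_elite = max(1, int(samples * elite_frac))
    for _ in range(iterations):
        plans = np.random.normal(mean, std,
                                 size=(samples, horizon, action_dim))
        returns = np.array([simulate(state, plan) for plan in plans])
        elite = plans[np.argsort(returns)[-n_elite:]]    # top-scoring plans
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean[0]    # execute the first action, then replan (receding horizon)
```

The planner calls the generative model iterations × samples times per decision, so its cost grows linearly with the number of samples drawn, in line with the responsiveness requirement above; after executing mean[0], the agent replans from the next state.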

42 citations


Proceedings Article
16 Jun 2013
TL;DR: This work provides a variant of CE (Proportional CE) that effectively optimizes the expected value, and shows on variants of established noisy environments that it can improve solution quality.
Abstract: Cross-entropy optimization (CE) has proven to be a powerful tool for search in control environments. In the basic scheme, a distribution over proposed solutions is repeatedly adapted by evaluating a sample of solutions and refocusing the distribution on a percentage of those with the highest scores. We show that, in the kind of noisy evaluation environments that are common in decision-making domains, this percentage-based refocusing does not optimize the expected utility of solutions, but instead a quantile metric. We provide a variant of CE (Proportional CE) that effectively optimizes the expected value. We show using variants of established noisy environments that Proportional CE can be used in place of CE and can improve solution quality.
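
The difference can be sketched as a change to the refocusing step of a single CE iteration: standard CE refits the distribution to a fixed top fraction of the sampled solutions (a quantile), while a proportional variant reweights all samples by their observed scores. The proportional weighting used below is an assumption made for illustration; the paper defines the exact Proportional CE update.

```python
# Sketch: quantile-based vs. value-proportional refocusing in one CE iteration
# for a 1-D Gaussian search distribution (illustrative only).
import numpy as np

def ce_step(mean, std, score, n=100, elite_frac=0.1, proportional=False):
    xs = np.random.normal(mean, std, size=n)
    scores = np.array([score(x) for x in xs])        # noisy evaluations
    if proportional:
        # Assumed scheme: weight every sample by its score above the worst one.
        w = scores - scores.min()
        w = w / w.sum() if w.sum() > 0 else np.full(n, 1.0 / n)
    else:
        # Standard CE: keep only the top elite_frac of samples, equally weighted.
        elite = np.argsort(scores)[-max(1, int(n * elite_frac)):]
        w = np.zeros(n)
        w[elite] = 1.0 / len(elite)
    new_mean = np.sum(w * xs)
    new_std = np.sqrt(np.sum(w * (xs - new_mean) ** 2)) + 1e-6
    return new_mean, new_std
```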

19 citations


Proceedings Article
16 Jun 2013
TL;DR: It is shown that COCO values can also be defined for stochastic games and can be learned using a simple variant of Q-learning that is provably convergent.
Abstract: COCO ("cooperative/competitive") values are a solution concept for two-player normal-form games with transferable utility, when binding agreements and side payments between players are possible. In this paper, we show that COCO values can also be defined for stochastic games and can be learned using a simple variant of Q-learning that is provably convergent. We provide a set of examples showing how the strategies learned by the COCO-Q algorithm relate to those learned by existing multiagent Q-learning algorithms.
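
For the underlying two-player normal-form case, the COCO value splits the game into a fully cooperative component and a fully competitive (zero-sum) component: player 1 receives the best joint payoff of (R1 + R2)/2 plus the minimax value of (R1 - R2)/2, and player 2 receives the same cooperative term minus that minimax value, with side payments settling the difference. A small sketch of that computation follows; the scipy linear-programming formulation of the zero-sum value is an illustrative choice, and this covers only the one-shot definition, not the paper's COCO-Q learning rule.

```python
# Sketch: COCO values of a two-player normal-form game with transferable utility
# (illustrative only; not the paper's COCO-Q algorithm for stochastic games).
import numpy as np
from scipy.optimize import linprog

def zero_sum_value(M):
    """Minimax value of the zero-sum game with row-player payoff matrix M."""
    m, n = M.shape
    # Variables: row player's mixed strategy x (m entries) and game value v.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                  # maximize v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])     # v - x.M[:, j] <= 0 for all j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)   # x sums to one
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return -res.fun

def coco_values(R1, R2):
    """COCO values for both players of the bimatrix game (R1, R2)."""
    team = 0.5 * (R1 + R2)          # fully cooperative component
    advantage = 0.5 * (R1 - R2)     # fully competitive (zero-sum) component
    coop = team.max()               # players jointly pick the best cell
    adv1 = zero_sum_value(advantage)
    return coop + adv1, coop - adv1
```

The COCO-Q algorithm described in the abstract learns such values for stochastic games with a Q-learning-style update; the sketch above only covers the static definition it builds on.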

17 citations


Proceedings Article
01 Jan 2013
TL;DR: This work shows how more robust learning across environments is possible by adopting an ensemble approach to reinforcement learning, which learns a weighted linear combination of Q-values from multiple independent learning algorithms.
Abstract: Reinforcement-learning (RL) algorithms are often tweaked and tuned to specific environments when applied, calling into question whether learning can truly be considered autonomous in these cases. In this work, we show how more robust learning across environments is possible by adopting an ensemble approach to reinforcement learning. Our approach learns a weighted linear combination of Q-values from multiple independent learning algorithms. In our evaluations in generalized RL environments, we find that the algorithm compares favorably to the best tuned algorithm. Our work provides a promising basis for further study into the use of ensemble methods in RL.
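
A minimal sketch of the combination step, a few independent tabular Q-learners whose estimates are mixed with a weight vector when choosing actions; the abstract does not spell out how the weights are learned, so the fixed weights and all class and parameter names below are illustrative assumptions rather than the paper's method.

```python
# Sketch: acting with a weighted linear combination of Q-values from several
# independent learners (illustrative only; weight learning is omitted).
import random
from collections import defaultdict

class QLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.99):
        self.Q = defaultdict(float)
        self.actions, self.alpha, self.gamma = actions, alpha, gamma

    def update(self, s, a, r, s2):
        target = r + self.gamma * max(self.Q[(s2, b)] for b in self.actions)
        self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])

class EnsembleAgent:
    def __init__(self, learners, weights):
        self.learners, self.weights = learners, weights   # one weight per learner

    def act(self, s, epsilon=0.1):
        actions = self.learners[0].actions
        if random.random() < epsilon:
            return random.choice(actions)
        # Greedy with respect to the weighted sum of the learners' Q-values.
        return max(actions, key=lambda a: sum(
            w * l.Q[(s, a)] for w, l in zip(self.weights, self.learners)))

    def update(self, s, a, r, s2):
        for l in self.learners:        # each learner updates independently
            l.update(s, a, r, s2)
```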

15 citations


Posted Content
TL;DR: In this article, an n-player game is given by an undirected graph on n nodes and a set of n local matrices, where the payoff to player i is determined entirely by the actions of player i and his neighbors in the graph.
Abstract: In this work, we introduce graphical models for multi-player game theory, and give powerful algorithms for computing their Nash equilibria in certain cases. An n-player game is given by an undirected graph on n nodes and a set of n local matrices. The interpretation is that the payoff to player i is determined entirely by the actions of player i and his neighbors in the graph, and thus the payoff matrix to player i is indexed only by these players. We thus view the global n-player game as being composed of interacting local games, each involving many fewer players. Each player's action may have global impact, but it occurs through the propagation of local influences. Our main technical result is an efficient algorithm for computing Nash equilibria when the underlying graph is a tree (or can be turned into a tree with few node mergings). The algorithm runs in time polynomial in the size of the representation (the graph and the associated local game matrices), and comes in two related but distinct flavors. The first version involves an approximation step, and computes a representation of all approximate Nash equilibria (of which there may be an exponential number in general). The second version allows the exact computation of Nash equilibria at the expense of weakened complexity bounds. The algorithm requires only local message-passing between nodes (and thus can be implemented by the players themselves in a distributed manner). Despite an analogy to inference in Bayes nets that we develop, the analysis of our algorithm is more involved than that for the polytree algorithm in, owing partially to the fact that we must either compute, or select from, an exponential number of potential solutions. We discuss a number of extensions, such as the computation of equilibria with desirable global properties (e.g. maximizing global return), and directions for further research.
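
To make the representation concrete (this sketches the graphical-game data structure, not the tree-based equilibrium algorithm): each player stores a local payoff table indexed only by its own and its neighbors' actions, so even a global property such as "is this pure profile a Nash equilibrium?" can be checked with purely local lookups. The data layout and function names below are assumptions for illustration.

```python
# Sketch: a graphical game as local payoff tables over neighborhoods, with a
# local best-response check for pure strategy profiles (illustrative only).
import itertools

def local_payoff(game, i, profile):
    """Payoff to player i; it depends only on i's and its neighbors' actions."""
    key = (profile[i],) + tuple(profile[j] for j in game["neighbors"][i])
    return game["payoffs"][i][key]

def is_pure_nash(game, profile, actions):
    """No player can gain by unilaterally deviating, checked locally."""
    for i in range(len(profile)):
        current = local_payoff(game, i, profile)
        for a in actions:
            alt = list(profile)
            alt[i] = a
            if local_payoff(game, i, alt) > current:
                return False
    return True

# Example: three players on a path 0-1-2 with binary actions; each player
# earns 1 for matching every neighbor and 0 otherwise (a coordination game).
neighbors = {0: [1], 1: [0, 2], 2: [1]}
payoffs = {
    i: {key: float(all(x == key[0] for x in key[1:]))
        for key in itertools.product([0, 1], repeat=1 + len(neighbors[i]))}
    for i in neighbors
}
game = {"neighbors": neighbors, "payoffs": payoffs}
print(is_pure_nash(game, [1, 1, 1], [0, 1]))   # True: everyone matches
print(is_pure_nash(game, [1, 0, 1], [0, 1]))   # False: player 1 wants to switch
```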

12 citations