
Showing papers by "Michael L. Littman published in 2013"


Posted Content
TL;DR: In this article, the authors examined variations of the "incremental pruning" method for solving POMDPs and compared them to earlier algorithms from theoretical and empirical perspectives, concluding that incremental pruning is presently the most efficient exact method.
Abstract: Most exact algorithms for general partially observable Markov decision processes (POMDPs) use a form of dynamic programming in which a piecewise-linear and convex representation of one value function is transformed into another. We examine variations of the "incremental pruning" method for solving this problem and compare them to earlier algorithms from theoretical and empirical perspectives. We find that incremental pruning is presently the most efficient exact method for solving POMDPs.
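
To make the pruning operation concrete: each piecewise-linear convex value function is stored as a set of alpha-vectors, and pruning discards vectors that are not the maximizer at any belief state. Below is a minimal Python sketch of that dominance test via a linear program; the function names and the scipy-based formulation are assumptions for illustration, not the implementation studied in the paper.

```python
# Sketch: prune dominated alpha-vectors from a piecewise-linear convex
# value function, the core operation behind incremental pruning (illustrative only).
import numpy as np
from scipy.optimize import linprog

def dominates_somewhere(w, others):
    """Return a belief state where alpha-vector w beats every vector in
    `others` by a positive margin, or None if no such belief exists."""
    if not others:
        return np.full(len(w), 1.0 / len(w))
    n = len(w)
    # Variables: belief b (n entries) and margin d; maximize d.
    c = np.zeros(n + 1)
    c[-1] = -1.0                                   # linprog minimizes, so minimize -d
    A_ub, b_ub = [], []
    for u in others:
        # Require d - b.(w - u) <= 0 for every competing vector u.
        A_ub.append(np.concatenate([-(np.asarray(w) - np.asarray(u)), [1.0]]))
        b_ub.append(0.0)
    A_eq = [np.concatenate([np.ones(n), [0.0]])]   # belief entries sum to one
    b_eq = [1.0]
    bounds = [(0.0, 1.0)] * n + [(None, None)]
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=bounds)
    if res.success and -res.fun > 1e-9:            # positive margin d found
        return res.x[:n]
    return None

def prune(vectors):
    """Keep only alpha-vectors that are strictly best at some belief
    (a simplified filter; the paper's algorithms use a more careful variant)."""
    kept = []
    for i, w in enumerate(vectors):
        if dominates_somewhere(w, kept + vectors[i + 1:]) is not None:
            kept.append(w)
    return kept
```

The "incremental" aspect of incremental pruning is that this filtering is interleaved with the intermediate cross-sums of the dynamic-programming backup rather than applied only to the final set, which keeps the intermediate vector sets small.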

52 citations


Posted Content
TL;DR: In this paper, the authors summarize results regarding the complexity of solving MDPs and the running time of MDP solution algorithms, and argue that, although MDPs can be solved efficiently in theory, more study is needed to reveal practical algorithms for solving large problems quickly.
Abstract: Markov decision problems (MDPs) provide the foundations for a number of problems of interest to AI researchers studying automated planning and reinforcement learning. In this paper, we summarize results regarding the complexity of solving MDPs and the running time of MDP solution algorithms. We argue that, although MDPs can be solved efficiently in theory, more study is needed to reveal practical algorithms for solving large problems quickly. To encourage future research, we sketch some alternative methods of analysis that rely on the structure of MDPs.
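
As a point of reference for the kind of solution algorithm whose running time such an analysis concerns, here is a minimal tabular value-iteration sketch in Python; the data layout and stopping rule are illustrative assumptions, not drawn from the paper.

```python
# Sketch: tabular value iteration for a finite MDP (illustrative only).
# P[s][a] is a list of (probability, next_state, reward) triples.
def value_iteration(P, num_states, num_actions, gamma=0.95, tol=1e-6):
    V = [0.0] * num_states
    while True:
        delta = 0.0
        for s in range(num_states):
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in range(num_actions)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:        # stop when a full Bellman sweep barely changes V
            return V
```

Each sweep touches every transition once, while the number of sweeps needed for a given accuracy grows as the discount factor approaches one; relating such counts to the problem size is the kind of analysis the paper surveys.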

44 citations


Proceedings Article
14 Jul 2013
TL;DR: This work presents a receding-horizon open-loop planner that employs cross-entropy optimization for policy construction, satisfying all of the stated requirements, and empirically demonstrates near-optimal decisions in a small domain and effective locomotion in several challenging humanoid control tasks.
Abstract: We focus on effective sample-based planning in the face of underactuation, high-dimensionality, drift, discrete system changes, and stochasticity. These are hallmark challenges for important problems, such as humanoid locomotion. In order to ensure broad applicability, we assume domain expertise is minimal and limited to a generative model. In order to make the method responsive, computational costs that scale linearly with the number of samples taken from the generative model are required. We bring to bear a concrete method that satisfies all these requirements; it is a receding-horizon open-loop planner that employs cross-entropy optimization for policy construction. In simulation, we empirically demonstrate near-optimal decisions in a small domain and effective locomotion in several challenging humanoid control tasks.
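
A minimal sketch of the general pattern described above, receding-horizon planning with cross-entropy optimization over open-loop action sequences, assuming nothing but a generative model simulate(state, plan) that returns a sampled return for the plan (the function names, Gaussian action distribution, and parameter values are illustrative assumptions, not the paper's implementation):

```python
# Sketch: receding-horizon open-loop planning with cross-entropy optimization
# over action sequences, using only a generative model (illustrative only).
import numpy as np

def ce_plan(state, simulate, horizon, action_dim,
            iterations=5, samples=64, elite_frac=0.125):
    """Return the first action of the best open-loop plan found by CE."""
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    n_elite = max(1, int(samples * elite_frac))
    for _ in range(iterations):
        plans = np.random.normal(mean, std,
                                 size=(samples, horizon, action_dim))
        returns = np.array([simulate(state, plan) for plan in plans])
        elite = plans[np.argsort(returns)[-n_elite:]]    # top-scoring plans
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean[0]    # execute the first action, then replan (receding horizon)
```

The planner calls the generative model iterations × samples times per decision, so its cost grows linearly with the number of samples drawn, in line with the responsiveness requirement above; after executing mean[0], the agent replans from the next state.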

42 citations


Proceedings Article
16 Jun 2013
TL;DR: This work provides a variant of CE (Proportional CE) that effectively optimizes the expected value, and shows on variants of established noisy environments that it can improve solution quality.
Abstract: Cross-entropy optimization (CE) has proven to be a powerful tool for search in control environments. In the basic scheme, a distribution over proposed solutions is repeatedly adapted by evaluating a sample of solutions and refocusing the distribution on a percentage of those with the highest scores. We show that, in the kind of noisy evaluation environments that are common in decision-making domains, this percentage-based refocusing does not optimize the expected utility of solutions, but instead a quantile metric. We provide a variant of CE (Proportional CE) that effectively optimizes the expected value. We show using variants of established noisy environments that Proportional CE can be used in place of CE and can improve solution quality.
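
The difference can be sketched as a change to the refocusing step of a single CE iteration: standard CE refits the distribution to a fixed top fraction of the sampled solutions (a quantile), while a proportional variant reweights all samples by their observed scores. The proportional weighting used below is an assumption made for illustration; the paper defines the exact Proportional CE update.

```python
# Sketch: quantile-based vs. value-proportional refocusing in one CE iteration
# for a 1-D Gaussian search distribution (illustrative only).
import numpy as np

def ce_step(mean, std, score, n=100, elite_frac=0.1, proportional=False):
    xs = np.random.normal(mean, std, size=n)
    scores = np.array([score(x) for x in xs])        # noisy evaluations
    if proportional:
        # Assumed scheme: weight every sample by its score above the worst one.
        w = scores - scores.min()
        w = w / w.sum() if w.sum() > 0 else np.full(n, 1.0 / n)
    else:
        # Standard CE: keep only the top elite_frac of samples, equally weighted.
        elite = np.argsort(scores)[-max(1, int(n * elite_frac)):]
        w = np.zeros(n)
        w[elite] = 1.0 / len(elite)
    new_mean = np.sum(w * xs)
    new_std = np.sqrt(np.sum(w * (xs - new_mean) ** 2)) + 1e-6
    return new_mean, new_std
```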

19 citations


Proceedings Article
16 Jun 2013
TL;DR: It is shown that COCO values can also be defined for stochastic games and can be learned using a simple variant of Q-learning that is provably convergent.
Abstract: COCO ("cooperative/competitive") values are a solution concept for two-player normal-form games with transferable utility, when binding agreements and side payments between players are possible. In this paper, we show that COCO values can also be defined for stochastic games and can be learned using a simple variant of Q-learning that is provably convergent. We provide a set of examples showing how the strategies learned by the COCO-Q algorithm relate to those learned by existing multiagent Q-learning algorithms.
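
For the underlying two-player normal-form case, the COCO value splits the game into a fully cooperative component and a fully competitive (zero-sum) component: player 1 receives the best joint payoff of (R1 + R2)/2 plus the minimax value of (R1 - R2)/2, and player 2 receives the same cooperative term minus that minimax value, with side payments settling the difference. A small sketch of that computation follows; the scipy linear-programming formulation of the zero-sum value is an illustrative choice, and this covers only the one-shot definition, not the paper's COCO-Q learning rule.

```python
# Sketch: COCO values of a two-player normal-form game with transferable utility
# (illustrative only; not the paper's COCO-Q algorithm for stochastic games).
import numpy as np
from scipy.optimize import linprog

def zero_sum_value(M):
    """Minimax value of the zero-sum game with row-player payoff matrix M."""
    m, n = M.shape
    # Variables: row player's mixed strategy x (m entries) and game value v.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                  # maximize v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])     # v - x.M[:, j] <= 0 for all j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)   # x sums to one
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return -res.fun

def coco_values(R1, R2):
    """COCO values for both players of the bimatrix game (R1, R2)."""
    team = 0.5 * (R1 + R2)          # fully cooperative component
    advantage = 0.5 * (R1 - R2)     # fully competitive (zero-sum) component
    coop = team.max()               # players jointly pick the best cell
    adv1 = zero_sum_value(advantage)
    return coop + adv1, coop - adv1
```

The COCO-Q algorithm described in the abstract learns such values for stochastic games with a Q-learning-style update; the sketch above only covers the static definition it builds on.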

17 citations


Proceedings Article
01 Jan 2013
TL;DR: This work shows how more robust learning across environments is possible by adopting an ensemble approach to reinforcement learning, which learns a weighted linear combination of Q-values from multiple independent learning algorithms.
Abstract: Reinforcement-learning (RL) algorithms are often tweaked and tuned to specific environments when applied, calling into question whether learning can truly be considered autonomous in these cases. In this work, we show how more robust learning across environments is possible by adopting an ensemble approach to reinforcement learning. Our approach learns a weighted linear combination of Q-values from multiple independent learning algorithms. In our evaluations in generalized RL environments, we find that the algorithm compares favorably to the best tuned algorithm. Our work provides a promising basis for further study into the use of ensemble methods in RL.
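
A minimal sketch of the combination step, a few independent tabular Q-learners whose estimates are mixed with a weight vector when choosing actions; the abstract does not spell out how the weights are learned, so the fixed weights and all class and parameter names below are illustrative assumptions rather than the paper's method.

```python
# Sketch: acting with a weighted linear combination of Q-values from several
# independent learners (illustrative only; weight learning is omitted).
import random
from collections import defaultdict

class QLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.99):
        self.Q = defaultdict(float)
        self.actions, self.alpha, self.gamma = actions, alpha, gamma

    def update(self, s, a, r, s2):
        target = r + self.gamma * max(self.Q[(s2, b)] for b in self.actions)
        self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])

class EnsembleAgent:
    def __init__(self, learners, weights):
        self.learners, self.weights = learners, weights   # one weight per learner

    def act(self, s, epsilon=0.1):
        actions = self.learners[0].actions
        if random.random() < epsilon:
            return random.choice(actions)
        # Greedy with respect to the weighted sum of the learners' Q-values.
        return max(actions, key=lambda a: sum(
            w * l.Q[(s, a)] for w, l in zip(self.weights, self.learners)))

    def update(self, s, a, r, s2):
        for l in self.learners:        # each learner updates independently
            l.update(s, a, r, s2)
```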

15 citations


Posted Content
TL;DR: In this article, an n-player game is given by an undirected graph on n nodes and a set of n local matrices, where the payoff to player i is determined entirely by the actions of player i and his neighbors in the graph.
Abstract: In this work, we introduce graphical models for multi-player game theory, and give powerful algorithms for computing their Nash equilibria in certain cases. An n-player game is given by an undirected graph on n nodes and a set of n local matrices. The interpretation is that the payoff to player i is determined entirely by the actions of player i and his neighbors in the graph, and thus the payoff matrix to player i is indexed only by these players. We thus view the global n-player game as being composed of interacting local games, each involving many fewer players. Each player's action may have global impact, but it occurs through the propagation of local influences. Our main technical result is an efficient algorithm for computing Nash equilibria when the underlying graph is a tree (or can be turned into a tree with few node mergings). The algorithm runs in time polynomial in the size of the representation (the graph and the associated local game matrices), and comes in two related but distinct flavors. The first version involves an approximation step, and computes a representation of all approximate Nash equilibria (of which there may be an exponential number in general). The second version allows the exact computation of Nash equilibria at the expense of weakened complexity bounds. The algorithm requires only local message-passing between nodes (and thus can be implemented by the players themselves in a distributed manner). Despite an analogy to inference in Bayes nets that we develop, the analysis of our algorithm is more involved than that for the polytree algorithm in, owing partially to the fact that we must either compute, or select from, an exponential number of potential solutions. We discuss a number of extensions, such as the computation of equilibria with desirable global properties (e.g. maximizing global return), and directions for further research.
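
To make the representation concrete (this sketches the graphical-game data structure, not the tree-based equilibrium algorithm): each player stores a local payoff table indexed only by its own and its neighbors' actions, so even a global property such as "is this pure profile a Nash equilibrium?" can be checked with purely local lookups. The data layout and function names below are assumptions for illustration.

```python
# Sketch: a graphical game as local payoff tables over neighborhoods, with a
# local best-response check for pure strategy profiles (illustrative only).
import itertools

def local_payoff(game, i, profile):
    """Payoff to player i; it depends only on i's and its neighbors' actions."""
    key = (profile[i],) + tuple(profile[j] for j in game["neighbors"][i])
    return game["payoffs"][i][key]

def is_pure_nash(game, profile, actions):
    """No player can gain by unilaterally deviating, checked locally."""
    for i in range(len(profile)):
        current = local_payoff(game, i, profile)
        for a in actions:
            alt = list(profile)
            alt[i] = a
            if local_payoff(game, i, alt) > current:
                return False
    return True

# Example: three players on a path 0-1-2 with binary actions; each player
# earns 1 for matching every neighbor and 0 otherwise (a coordination game).
neighbors = {0: [1], 1: [0, 2], 2: [1]}
payoffs = {
    i: {key: float(all(x == key[0] for x in key[1:]))
        for key in itertools.product([0, 1], repeat=1 + len(neighbors[i]))}
    for i in neighbors
}
game = {"neighbors": neighbors, "payoffs": payoffs}
print(is_pure_nash(game, [1, 1, 1], [0, 1]))   # True: everyone matches
print(is_pure_nash(game, [1, 0, 1], [0, 1]))   # False: player 1 wants to switch
```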

12 citations