Decentralized Q-Learning for Stochastic Teams and Games

doi:10.1109/TAC.2016.2598476

Citations

PDF

Open Access

More filters

Book Chapter•DOI•

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

[...]

Kaiqing Zhang¹, Zhuoran Yang², Tamer Basar¹•Institutions (2)

University of Illinois at Urbana–Champaign¹, Princeton University²

29 Apr 2021-arXiv: Learning

TL;DR: This chapter reviews the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two.

...read moreread less

Abstract: Recent years have witnessed significant advances in reinforcement learning (RL), which has registered tremendous success in solving various sequential decision-making problems in machine learning. Most of the successful RL applications, e.g., the games of Go and Poker, robotics, and autonomous driving, involve the participation of more than one single agent, which naturally fall into the realm of multi-agent RL (MARL), a domain with a relatively long history, and has recently re-emerged due to advances in single-agent RL techniques. Though empirically successful, theoretical foundations for MARL are relatively lacking in the literature. In this chapter, we provide a selective overview of MARL, with focus on algorithms backed by theoretical analysis. More specifically, we review the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two. We also introduce several significant but challenging applications of these algorithms. Orthogonal to the existing reviews on MARL, we highlight several new angles and taxonomies of MARL theory, including learning in extensive-form games, decentralized MARL with networked agents, MARL in the mean-field regime, (non-)convergence of policy-based methods for learning in games, etc. Some of the new angles extrapolate from our own research endeavors and interests. Our overall goal with this chapter is, beyond providing an assessment of the current state of the field on the mark, to identify fruitful future research directions on theoretical studies of MARL. We expect this chapter to serve as continuing stimulus for researchers interested in working on this exciting while challenging topic.

...read moreread less

692 citations

Posted Content•

An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective

[...]

Yaodong Yang, Jun Wang

01 Nov 2020-arXiv: Multiagent Systems

TL;DR: This work provides a self-contained assessment of the current state-of-the-art MARL techniques from a game theoretical perspective and expects this work to serve as a stepping stone for both new researchers who are about to enter this fast-growing domain and existing domain experts who want to obtain a panoramic view and identify new directions based on recent advances.

...read moreread less

Abstract: Following the remarkable success of the AlphaGO series, 2019 was a booming year that witnessed significant advances in multi-agent reinforcement learning (MARL) techniques. MARL corresponds to the learning problem in a multi-agent system in which multiple agents learn simultaneously. It is an interdisciplinary domain with a long history that includes game theory, machine learning, stochastic control, psychology, and optimisation. Although MARL has achieved considerable empirical success in solving real-world games, there is a lack of a self-contained overview in the literature that elaborates the game theoretical foundations of modern MARL methods and summarises the recent advances. In fact, the majority of existing surveys are outdated and do not fully cover the recent developments since 2010. In this work, we provide a monograph on MARL that covers both the fundamentals and the latest developments in the research frontier. The goal of our monograph is to provide a self-contained assessment of the current state-of-the-art MARL techniques from a game theoretical perspective. We expect this work to serve as a stepping stone for both new researchers who are about to enter this fast-growing domain and existing domain experts who want to obtain a panoramic view and identify new directions based on recent advances.

...read moreread less

103 citations

Proceedings Article•DOI•

Networked Multi-Agent Reinforcement Learning in Continuous Spaces

[...]

Kaiqing Zhang¹, Zhuoran Yang², Tamer Basar¹•Institutions (2)

University of Illinois at Urbana–Champaign¹, Princeton University²

01 Dec 2018

TL;DR: This paper proposes a fully decentralized actor-critic algorithm that only relies on neighbor-to-neighbor communications among agents in a networked multi-agent reinforcement learning setting, and adopts the newly proposed expected policy gradient to reduce the variance of the gradient estimate.

...read moreread less

Abstract: Many real-world tasks on practical control systems involve the learning and decision-making of multiple agents, under limited communications and observations. In this paper, we study the problem of networked multi-agent reinforcement learning (MARL), where multiple agents perform reinforcement learning in a common environment, and are able to exchange information via a possibly time-varying communication network. In particular, we focus on a collaborative MARL setting where each agent has individual reward functions, and the objective of all the agents is to maximize the network-wide averaged long-term return. To this end, we propose a fully decentralized actor-critic algorithm that only relies on neighbor-to-neighbor communications among agents. To promote the use of the algorithm on practical control systems, we focus on the setting with continuous state and action spaces, and adopt the newly proposed expected policy gradient to reduce the variance of the gradient estimate. We provide convergence guarantees for the algorithm when linear function approximation is employed, and corroborate our theoretical results via simulations.

...read moreread less

93 citations

Proceedings Article•

Independent Policy Gradient Methods for Competitive Reinforcement Learning

[...]

Constantinos Daskalakis¹, Dylan J. Foster¹, Noah Golowich¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 2020

TL;DR: It is shown that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule.

...read moreread less

Abstract: We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent policy gradient methods in competitive RL; prior work has largely focused on centralized, coordinated procedures for equilibrium computation.

...read moreread less

85 citations

Posted Content•

Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents

[...]

Kaiqing Zhang¹, Zhuoran Yang², Han Liu², Tong Zhang³, Tamer Basar¹ - Show less +1 more•Institutions (3)

University of Illinois at Urbana–Champaign¹, Princeton University², Tencent³

23 Feb 2018-arXiv: Learning

TL;DR: In this paper, the authors consider the problem of fully decentralized multi-agent reinforcement learning (MARL), where the agents are located at the nodes of a time-varying communication network.

...read moreread less

Abstract: We consider the problem of \emph{fully decentralized} multi-agent reinforcement learning (MARL), where the agents are located at the nodes of a time-varying communication network. Specifically, we assume that the reward functions of the agents might correspond to different tasks, and are only known to the corresponding agent. Moreover, each agent makes individual decisions based on both the information observed locally and the messages received from its neighbors over the network. Within this setting, the collective goal of the agents is to maximize the globally averaged return over the network through exchanging information with their neighbors. To this end, we propose two decentralized actor-critic algorithms with function approximation, which are applicable to large-scale MARL problems where both the number of states and the number of agents are massively large. Under the decentralized structure, the actor step is performed individually by each agent with no need to infer the policies of others. For the critic step, we propose a consensus update via communication over the network. Our algorithms are fully incremental and can be implemented in an online fashion. Convergence analyses of the algorithms are provided when the value functions are approximated within the class of linear functions. Extensive simulation results with both linear and nonlinear function approximations are presented to validate the proposed algorithms. Our work appears to be the first study of fully decentralized MARL algorithms for networked agents with function approximation, with provable convergence guarantees.

...read moreread less

77 citations

Collapse

Decentralized Q-Learning for Stochastic Teams and Games

Citations

References

Related Papers (5)

Trending Questions (1)