Peter Sunehag
Researcher at Google
Publications - 65
Citations - 1924
Peter Sunehag is an academic researcher at Google. His research focuses on reinforcement learning and Markov decision processes. He has an h-index of 15 and has co-authored 62 publications receiving 1348 citations. Previous affiliations of Peter Sunehag include NICTA and the Australian National University.
Papers
Proceedings Article
Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward
Peter Sunehag,Guy Lever,Audrunas Gruslys,Wojciech Marian Czarnecki,Vinicius Zambaldi,Max Jaderberg,Marc Lanctot,Nicolas Sonnerat,Joel Z. Leibo,Karl Tuyls,Thore Graepel +10 more
TL;DR: This work addresses the problem of cooperative multi-agent reinforcement learning with a single joint reward signal by training individual agents with a novel value decomposition network architecture, which learns to decompose the team value function into agent-wise value functions.
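The core idea summarized above — decomposing a team value function into a sum of per-agent value functions, so that a single shared reward can train every agent — can be illustrated with a minimal sketch. This is not the authors' code: the linear per-agent Q-functions and the two-agent setup below are hypothetical stand-ins for the deep networks in the paper, chosen only to show why the additive decomposition lets each agent act greedily on its own utility.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, n_obs, n_actions = 2, 4, 3

# Per-agent linear Q-functions (illustrative stand-ins for deep networks).
weights = [rng.standard_normal((n_obs, n_actions)) for _ in range(n_agents)]

def agent_q(i, obs):
    """Q_i(o_i, .) — one value per action for agent i."""
    return obs @ weights[i]

def team_q(observations, actions):
    """Additive decomposition: Q_tot = sum_i Q_i(o_i, a_i)."""
    return sum(agent_q(i, o)[a]
               for i, (o, a) in enumerate(zip(observations, actions)))

# Decentralised execution: each agent maximises its own Q_i, which,
# because Q_tot is a sum, also maximises the joint team value.
obs = [rng.standard_normal(n_obs) for _ in range(n_agents)]
greedy = [int(np.argmax(agent_q(i, o))) for i, o in enumerate(obs)]
assert team_q(obs, greedy) == max(
    team_q(obs, (a0, a1))
    for a0 in range(n_actions) for a1 in range(n_actions)
)
```

The final assertion checks the key property of the additive form: per-agent greedy action selection recovers the joint maximiser, which is what makes decentralised execution possible after centralised training.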
Posted Content
Reinforcement Learning in Large Discrete Action Spaces
TL;DR: This paper leverages prior information about the actions to embed them in a continuous space upon which it can generalize, and uses approximate nearest-neighbor methods to allow reinforcement learning methods to be applied to large-scale learning problems previously intractable with current methods.
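The mechanism described here — embedding discrete actions in a continuous space, proposing a continuous "proto-action", and generalising via nearest-neighbour lookup — can be sketched briefly. This is an assumed illustration, not the paper's implementation: the paper uses approximate nearest-neighbour search at scale, whereas the sketch below does an exact brute-force lookup, and the critic `q_value` is a hypothetical placeholder for a learned Q-function.

```python
import numpy as np

rng = np.random.default_rng(1)

n_actions, dim = 10_000, 8

# Each discrete action gets a vector embedding (here random; in practice
# these would come from prior information about the actions).
action_embeddings = rng.standard_normal((n_actions, dim))

def k_nearest_actions(proto_action, k=10):
    """Map a continuous proto-action to its k nearest discrete actions.

    Brute-force distances for clarity; a large-scale system would use an
    approximate nearest-neighbour index instead.
    """
    d = np.linalg.norm(action_embeddings - proto_action, axis=1)
    return np.argpartition(d, k)[:k]

def q_value(state, action_id):
    # Hypothetical critic: any learned Q(s, a) estimator would slot in here.
    return float(state @ action_embeddings[action_id])

state = rng.standard_normal(dim)
proto = state  # pretend the actor network emitted this point in action space
candidates = k_nearest_actions(proto, k=10)
best = max(candidates, key=lambda a: q_value(state, a))
```

Restricting the critic's evaluation to the few nearest candidates is what keeps the per-step cost sub-linear in the number of actions, which is the point of the embedding-plus-lookup design.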
Posted Content
Value-Decomposition Networks For Cooperative Multi-Agent Learning
Peter Sunehag,Guy Lever,Audrunas Gruslys,Wojciech Marian Czarnecki,Vinicius Zambaldi,Max Jaderberg,Marc Lanctot,Nicolas Sonnerat,Joel Z. Leibo,Karl Tuyls,Thore Graepel +10 more
TL;DR: In this paper, a value decomposition network is proposed to decompose the team value function into agent-wise value functions, which leads to superior results when combined with weight sharing, role information and information channels.
Proceedings Article
The Sample-Complexity of General Reinforcement Learning
TL;DR: A new algorithm for general reinforcement learning where the true environment is known to belong to a finite class of N arbitrary models and compactness is a key criterion for determining the existence of uniform sample-complexity bounds is presented.
Posted Content
Deep Reinforcement Learning with Attention for Slate Markov Decision Processes with High-Dimensional States and Actions
TL;DR: The new agent's superiority over agents that either ignore the combinatorial or sequential long-term value aspect is demonstrated on a range of environments with dynamics from a real-world recommendation system.