Marc Lanctot

Researcher at Google

Publications -  113
Citations -  27573

Marc Lanctot is an academic researcher at Google. The author has contributed to research on Nash equilibrium and reinforcement learning. The author has an h-index of 36 and has co-authored 100 publications receiving 20154 citations. Previous affiliations of Marc Lanctot include the University of Alberta and Maastricht University.

Papers
Journal ArticleDOI

Mastering the game of Go with deep neural networks and tree search

TL;DR: Using a search algorithm that combines Monte Carlo tree search with deep value and policy networks, the program AlphaGo achieved a 99.8% winning rate against other Go programs and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.
Journal ArticleDOI

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

TL;DR: This paper generalizes the AlphaGo Zero approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games; AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.
Posted Content

Dueling Network Architectures for Deep Reinforcement Learning

TL;DR: This paper presents a new neural network architecture for model-free reinforcement learning that leads to better policy evaluation in the presence of many similar-valued actions and enables the RL agent to outperform the state-of-the-art on the Atari 2600 domain.
Proceedings Article

Dueling network architectures for deep reinforcement learning

TL;DR: This paper proposes a dueling network that represents two separate estimators, one for the state value function and one for the state-dependent action advantage function, which leads to better policy evaluation in the presence of many similar-valued actions.
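
The summary above does not say how the two estimators are recombined into action values. As a minimal sketch, assuming the mean-subtracted aggregation Q(s, a) = V(s) + (A(s, a) - mean of A(s, ·)), the combination step might look like the following. The function name `dueling_q_values` and the plain-list inputs are hypothetical, chosen only for illustration; in a real network both inputs would be outputs of learned layers.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine a state value V(s) and per-action advantages A(s, a) into Q(s, a).

    Assumption: advantages are centred by subtracting their mean, so that the
    average of the resulting Q-values equals V(s).
    """
    if not advantages:
        raise ValueError("advantages must contain at least one action")
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + (a - mean_advantage) for a in advantages]


# Hypothetical numbers: one state with three actions.
q_values = dueling_q_values(2.0, [1.0, 0.0, -1.0])
print(q_values)
```

Centring the advantages keeps the split between value and advantage identifiable: adding a constant to every advantage leaves the Q-values unchanged.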
Posted Content

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

TL;DR: This paper generalises the AlphaGo Zero approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains; AlphaZero convincingly defeated a world-champion program in each case.