Mohammad Gheshlaghi Azar
Researcher at Northwestern University
Publications - 53
Citations - 7837
Mohammad Gheshlaghi Azar is an academic researcher from Northwestern University. His research focuses on reinforcement learning and, more broadly, computer science. He has an h-index of 26 and has co-authored 48 publications receiving 4182 citations. Previous affiliations of Mohammad Gheshlaghi Azar include Radboud University Nijmegen and Google.
Papers
Posted Content
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
Jean-Bastien Grill,Florian Strub,Florent Altché,Corentin Tallec,Pierre H. Richemond,Elena Buchatskaya,Carl Doersch,Bernardo Avila Pires,Zhaohan Daniel Guo,Mohammad Gheshlaghi Azar,Bilal Piot,Koray Kavukcuoglu,Rémi Munos,Michal Valko +13 more
TL;DR: This work introduces Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning that performs on par with or better than the current state of the art on both transfer and semi-supervised benchmarks.
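The core mechanism behind BYOL is that an online network is trained to predict a target network's projection of the same image under a different augmentation, while the target network's weights track the online weights as an exponential moving average. Below is a minimal NumPy sketch of those two pieces (the normalized regression loss and the EMA update), not the authors' implementation; all shapes, parameter names, and the `tau` value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def byol_loss(online_pred, target_proj):
    # BYOL regresses the online network's prediction onto the target
    # network's projection; with L2-normalized vectors the squared error
    # is equivalent (up to a constant) to a negative cosine similarity.
    p, z = l2_normalize(online_pred), l2_normalize(target_proj)
    return np.mean(np.sum((p - z) ** 2, axis=-1))

def ema_update(target_params, online_params, tau=0.99):
    # The target network is never updated by gradient descent: its
    # parameters are an exponential moving average of the online ones.
    return {k: tau * target_params[k] + (1 - tau) * online_params[k]
            for k in target_params}

# Toy batch: 4 embeddings of dimension 8 standing in for the two views.
pred = rng.normal(size=(4, 8))
proj = rng.normal(size=(4, 8))
loss = byol_loss(pred, proj)  # 0 when pred and proj are aligned
```

Because the target never receives gradients, the EMA update is what prevents the representation from collapsing to a constant without needing negative pairs.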
Posted Content
Rainbow: Combining Improvements in Deep Reinforcement Learning
Matteo Hessel,Joseph Modayil,Hado van Hasselt,Tom Schaul,Georg Ostrovski,Will Dabney,Dan Horgan,Bilal Piot,Mohammad Gheshlaghi Azar,David Silver +9 more
TL;DR: This paper examines six extensions to the DQN algorithm and empirically studies their combination, showing that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance.
Proceedings Article
Rainbow: Combining Improvements in Deep Reinforcement Learning
Matteo Hessel,Joseph Modayil,Hado van Hasselt,Tom Schaul,Georg Ostrovski,Will Dabney,Dan Horgan,Bilal Piot,Mohammad Gheshlaghi Azar,David Silver +9 more
TL;DR: In this article, the authors examined six extensions to the DQN algorithm and empirically studied their combination, showing that the combination provided state-of-the-art performance on the Atari 2600 benchmark.
Proceedings Article
Minimax regret bounds for reinforcement learning
TL;DR: The problem of provably optimal exploration in reinforcement learning for finite-horizon MDPs is considered, and an optimistic modification to value iteration achieves a regret bound of $\tilde{O}(\sqrt{HSAT} + H^2S^2A + H\sqrt{T})$, where $H$ is the time horizon, $S$ the number of states, $A$ the number of actions, and $T$ the number of time-steps.
Proceedings Article
Noisy Networks For Exploration
Meire Fortunato,Mohammad Gheshlaghi Azar,Bilal Piot,Jacob Menick,Ian Osband,Alex Graves,Vlad Mnih,Rémi Munos,Demis Hassabis,Olivier Pietquin,Charles Blundell,Shane Legg +11 more
TL;DR: It is found that replacing the conventional exploration heuristics for A3C, DQN and dueling agents with NoisyNet yields substantially higher scores for a wide range of Atari games, in some cases advancing the agent from sub-human to super-human performance.
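NoisyNet replaces an agent's deterministic linear layers with noisy ones, $y = (\mu_w + \sigma_w \odot \varepsilon_w)x + \mu_b + \sigma_b \odot \varepsilon_b$, where the $\sigma$ parameters are learned, so the network itself tunes how much exploration noise to inject. A minimal NumPy sketch of such a layer with factorised Gaussian noise follows; it is an illustrative toy, not the paper's implementation, and the dimensions and initial $\sigma$ values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def factorised_noise(size):
    # Factorised Gaussian noise f(x) = sign(x) * sqrt(|x|), which lets a
    # layer reuse one noise vector per input and one per output instead
    # of sampling an independent value for every weight.
    x = rng.normal(size=size)
    return np.sign(x) * np.sqrt(np.abs(x))

def noisy_linear(x, mu_w, sigma_w, mu_b, sigma_b):
    # y = (mu_w + sigma_w * eps_w) x + mu_b + sigma_b * eps_b.
    # mu_* are the usual weights; sigma_* scale the injected noise and
    # are trained by gradient descent alongside them.
    in_dim, out_dim = mu_w.shape
    eps_in, eps_out = factorised_noise(in_dim), factorised_noise(out_dim)
    eps_w = np.outer(eps_in, eps_out)  # rank-1 factorised weight noise
    return x @ (mu_w + sigma_w * eps_w) + mu_b + sigma_b * eps_out

in_dim, out_dim = 4, 2
mu_w = rng.normal(size=(in_dim, out_dim)) / np.sqrt(in_dim)
sigma_w = np.full((in_dim, out_dim), 0.5 / np.sqrt(in_dim))
mu_b = np.zeros(out_dim)
sigma_b = np.full(out_dim, 0.5 / np.sqrt(in_dim))
y = noisy_linear(np.ones(in_dim), mu_w, sigma_w, mu_b, sigma_b)
```

With all `sigma_*` set to zero the layer reduces to an ordinary linear layer, which is why NoisyNet can simply drop in for the epsilon-greedy or entropy-bonus exploration used by DQN, dueling, and A3C agents.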