
Volodymyr Mnih

Researcher at Google

Publications - 62
Citations - 51,796

Volodymyr Mnih is an academic researcher at Google. His research focuses on Reinforcement learning and Artificial neural networks. He has an h-index of 37 and has co-authored 60 publications receiving 38,272 citations. His previous affiliations include the University of Toronto and the University of Alberta.

Papers
Posted Content

Learning by Playing - Solving Sparse Reward Tasks from Scratch

TL;DR: The key idea behind the method is that active (learned) scheduling and execution of auxiliary policies allow the agent to explore its environment efficiently, enabling it to excel at sparse-reward RL.
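
To make the scheduling idea concrete, here is a minimal Python sketch of one simplified reading of it: an epsilon-greedy bandit that chooses which auxiliary intention to execute and reinforces intentions whose rollouts yield sparse main-task reward. The task names, the bandit scheduler, and the stubbed rollout are illustrative assumptions, not the paper's architecture.

```python
import random
from collections import defaultdict

# Hypothetical auxiliary intentions plus the sparse main task; in the
# paper one policy is learned per auxiliary reward, and a learned
# scheduler decides which to execute. Here the scheduler is simplified
# to an epsilon-greedy bandit over intentions.
TASKS = ["reach", "grasp", "lift", "main"]

class BanditScheduler:
    """Picks the next intention to execute, favouring intentions whose
    rollouts have recently led to main-task reward."""

    def __init__(self, tasks, eps=0.2):
        self.tasks = tasks
        self.eps = eps
        self.value = defaultdict(float)  # running return estimate per task
        self.count = defaultdict(int)

    def choose(self):
        if random.random() < self.eps:
            return random.choice(self.tasks)
        return max(self.tasks, key=lambda t: self.value[t])

    def update(self, task, main_task_return):
        # Reinforce intentions whose execution produced sparse reward.
        self.count[task] += 1
        self.value[task] += (main_task_return - self.value[task]) / self.count[task]

scheduler = BanditScheduler(TASKS)
for episode in range(100):
    task = scheduler.choose()
    # Stub for rolling out the chosen intention's policy; pretend that
    # executing "grasp" sometimes stumbles onto main-task reward.
    main_return = random.choice([0.0, 0.0, 1.0]) if task == "grasp" else 0.0
    scheduler.update(task, main_return)
print(max(scheduler.value, key=scheduler.value.get))
```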
Posted Content

Strategic Attentive Writer for Learning Macro-Actions

TL;DR: In this article, a deep recurrent neural network architecture is presented that learns to build implicit plans, in an end-to-end manner, purely by interacting with an environment in a reinforcement learning setting.
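
The macro-action idea can be illustrated with a toy plan-and-commit loop: the agent writes a short plan of actions, then follows it without re-planning until the commitment horizon expires. In the paper both the plan and the commitment are outputs of a learned recurrent network; the random planner and fixed horizon below are placeholders.

```python
import random

# Toy illustration of plan-and-commit control; action names and the
# fixed horizon are illustrative, not the paper's.
ACTIONS = ["left", "right", "up", "down"]

def write_plan(observation, horizon=4):
    # Stand-in for the network's attentive "write" into the plan.
    return [random.choice(ACTIONS) for _ in range(horizon)]

plan, step = [], 0
for t in range(20):
    if step >= len(plan):              # commitment exhausted: re-plan
        plan, step = write_plan(observation=t), 0
    action = plan[step]                # execute plan without re-planning
    step += 1
    print(t, action)
```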
Posted Content

Unsupervised Control Through Non-Parametric Discriminative Rewards

TL;DR: An unsupervised learning algorithm is proposed that trains agents to achieve perceptually specified goals using only a stream of observations and actions. It leads to a cooperative game and a learned reward function that reflects similarity in controllable aspects of the environment rather than distance in the space of observations.
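
A hedged sketch of the discriminative-reward idea: score the achieved observation by how much probability a softmax over learned-embedding similarities assigns to the true goal versus distractor goals, rather than by raw observation distance. The linear embedding and all shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(obs, W):
    # Stand-in for the learned embedding; in the paper it is trained so
    # that matching the goal in controllable features is what scores well.
    return W @ obs

def discriminative_reward(achieved, goal, distractors, W):
    """Reward = probability mass a softmax over embedding similarities
    assigns to the true goal versus distractor goals, i.e. a
    non-parametric, discriminative comparison rather than a pixel
    distance."""
    e = embed(achieved, W)
    sims = np.array([e @ embed(g, W) for g in [goal] + distractors])
    probs = np.exp(sims - sims.max())
    probs /= probs.sum()
    return probs[0]

W = rng.normal(size=(8, 16))               # illustrative shapes
goal = rng.normal(size=16)
achieved = goal + 0.1 * rng.normal(size=16)
distractors = [rng.normal(size=16) for _ in range(4)]
print(discriminative_reward(achieved, goal, distractors, W))
```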
Proceedings Article

Conditional restricted Boltzmann machines for structured output prediction

TL;DR: In this article, the authors argue that Contrastive Divergence-based learning may not be suitable for training conditional restricted Boltzmann machines (CRBMs) for structured output prediction.
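
For context, a minimal sketch of the model in question: a tiny conditional RBM whose free energy is exact once the binary hidden units are summed out, so p(y|x) can be evaluated by enumeration at toy sizes. That enumeration is what becomes intractable at realistic output sizes, which is where sampling-based training such as CD enters; everything below (shapes, parameters) is illustrative.

```python
import numpy as np

# A tiny conditional RBM p(y | x) with energy
#   E(y, h; x) = -h.T @ W @ y - h.T @ (U @ x + c) - b @ y.
rng = np.random.default_rng(1)
n_x, n_y, n_h = 5, 3, 4
W = rng.normal(scale=0.1, size=(n_h, n_y))
U = rng.normal(scale=0.1, size=(n_h, n_x))
b = np.zeros(n_y)
c = np.zeros(n_h)

def free_energy(y, x):
    act = W @ y + U @ x + c
    # Summing out binary h factorises per hidden unit: log(1 + e^act).
    return -(b @ y) - np.sum(np.logaddexp(0.0, act))

def exact_posterior(x):
    # Enumerate all 2^n_y binary outputs; feasible only for tiny y.
    ys = [np.array([(i >> k) & 1 for k in range(n_y)], dtype=float)
          for i in range(2 ** n_y)]
    logits = np.array([-free_energy(y, x) for y in ys])
    p = np.exp(logits - logits.max())
    return ys, p / p.sum()

x = rng.integers(0, 2, size=n_x).astype(float)
ys, p = exact_posterior(x)
print(ys[int(np.argmax(p))], float(p.max()))
```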
Posted Content

Combining policy gradient and Q-learning

TL;DR: A new technique is described that combines policy gradient with off-policy Q-learning, drawing experience from a replay buffer. It establishes an equivalence between action-value fitting techniques and actor-critic algorithms, showing that regularized policy gradient techniques can be interpreted as advantage function learning algorithms.
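
A rough sketch of the combination, assuming (per the summary above) that a regularized policy implies an action-value estimate of roughly the form alpha * log pi(a|s) + V(s): an actor-critic term is augmented with a Bellman error on that implicit Q, computed from replayed transitions. The toy numbers and the loss weighting are made up for illustration.

```python
import math

# Illustrative constants: entropy-regularisation weight, discount, and
# the weight on the auxiliary Q-learning loss.
alpha, gamma, eta = 0.1, 0.99, 0.5

def implicit_q(log_pi, v):
    """Action value implied by a regularized policy and a state value,
    Q(s, a) ~ alpha * log pi(a|s) + V(s) (the entropy term from the full
    derivation is omitted for brevity)."""
    return alpha * log_pi + v

def pgq_style_loss(log_pi_sa, v_s, r, v_next, log_pi_next_greedy):
    # Actor-critic part: advantage-weighted negative log-probability.
    advantage = r + gamma * v_next - v_s
    pg_loss = -log_pi_sa * advantage
    # Q-learning part on the implicit Q, from a replayed transition.
    q_sa = implicit_q(log_pi_sa, v_s)
    q_target = r + gamma * implicit_q(log_pi_next_greedy, v_next)
    bellman_loss = 0.5 * (q_target - q_sa) ** 2
    return pg_loss + eta * bellman_loss

# One replayed transition with made-up numbers.
print(pgq_style_loss(log_pi_sa=math.log(0.4), v_s=1.0, r=0.0,
                     v_next=1.1, log_pi_next_greedy=math.log(0.7)))
```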