Top 28 papers published by Oriol Vinyals from Google in 2017

Proceedings Article•

Neural Message Passing for Quantum Chemistry

[...]

Justin Gilmer¹, Samuel S. Schoenholz², Patrick Riley¹, Oriol Vinyals¹, George E. Dahl¹ - Show less +1 more•Institutions (2)

Google¹, University of Pennsylvania²

04 Apr 2017

TL;DR: The Message Passing Neural Networks (MPNNs) as mentioned in this paper are a generalization of the message passing algorithm and aggregation procedure to compute a function of their entire input graph, and have been shown to achieve state-of-the-art results on an important molecular property prediction benchmark.

...read moreread less

Abstract: Supervised learning on molecules has incredible potential to be useful in chemistry, drug discovery, and materials science. Luckily, several promising and closely related neural network models invariant to molecular symmetries have already been described in the literature. These models learn a message passing algorithm and aggregation procedure to compute a function of their entire input graph. At this point, the next step is to find a particularly effective variant of this general approach and apply it to chemical prediction benchmarks until we either solve them or reach the limits of the approach. In this paper, we reformulate existing models into a single common framework we call Message Passing Neural Networks (MPNNs) and explore additional novel variations within this framework. Using MPNNs we demonstrate state of the art results on an important molecular property prediction benchmark; these results are strong enough that we believe future work should focus on datasets with larger molecules or more accurate ground truth labels.

...read moreread less

3,219 citations

Posted Content•

Neural Message Passing for Quantum Chemistry

[...]

Justin Gilmer¹, Samuel S. Schoenholz², Patrick Riley¹, Oriol Vinyals¹, George E. Dahl¹ - Show less +1 more•Institutions (2)

Google¹, University of Pennsylvania²

04 Apr 2017-arXiv: Learning

TL;DR: Using MPNNs, state of the art results on an important molecular property prediction benchmark are demonstrated and it is believed future work should focus on datasets with larger molecules or more accurate ground truth labels.

...read moreread less

Abstract: Supervised learning on molecules has incredible potential to be useful in chemistry, drug discovery, and materials science. Luckily, several promising and closely related neural network models invariant to molecular symmetries have already been described in the literature. These models learn a message passing algorithm and aggregation procedure to compute a function of their entire input graph. At this point, the next step is to find a particularly effective variant of this general approach and apply it to chemical prediction benchmarks until we either solve them or reach the limits of the approach. In this paper, we reformulate existing models into a single common framework we call Message Passing Neural Networks (MPNNs) and explore additional novel variations within this framework. Using MPNNs we demonstrate state of the art results on an important molecular property prediction benchmark; these results are strong enough that we believe future work should focus on datasets with larger molecules or more accurate ground truth labels.

...read moreread less

2,184 citations

Proceedings Article•

Neural Discrete Representation Learning

[...]

Aaron van den Oord¹, Oriol Vinyals¹, Koray Kavukcuoglu¹•Institutions (1)

Google¹

02 Nov 2017

TL;DR: The Vector Quantised-Variational AutoEncoder (VQ-VAE) as discussed by the authors is a generative model that learns a discrete latent representation by using vector quantization.

...read moreread less

Abstract: Learning useful representations without supervision remains a key challenge in machine learning. In this paper, we propose a simple yet powerful generative model that learns such discrete representations. Our model, the Vector Quantised-Variational AutoEncoder (VQ-VAE), differs from VAEs in two key ways: the encoder network outputs discrete, rather than continuous, codes; and the prior is learnt rather than static. In order to learn a discrete latent representation, we incorporate ideas from vector quantisation (VQ). Using the VQ method allows the model to circumvent issues of ``posterior collapse'' -— where the latents are ignored when they are paired with a powerful autoregressive decoder -— typically observed in the VAE framework. Pairing these representations with an autoregressive prior, the model can generate high quality images, videos, and speech as well as doing high quality speaker conversion and unsupervised learning of phonemes, providing further evidence of the utility of the learnt representations.

...read moreread less

1,963 citations

Proceedings Article•

Understanding deep learning requires rethinking generalization.

[...]

Chiyuan Zhang¹, Samy Bengio², Moritz Hardt², Benjamin Recht³, Oriol Vinyals² - Show less +1 more•Institutions (3)

Massachusetts Institute of Technology¹, Google², University of California³

01 Jan 2017

TL;DR: This article showed that deep neural networks can fit a random labeling of the training data, and that this phenomenon is qualitatively unaffected by explicit regularization, and occurs even if the true images are replaced by completely unstructured random noise.

...read moreread less

Abstract: Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points as it usually does in practice. We interpret our experimental findings by comparison with traditional models.

...read moreread less

1,854 citations

Journal Article•DOI•

Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge

[...]

Oriol Vinyals¹, Alexander Toshev¹, Samy Bengio¹, Dumitru Erhan¹•Institutions (1)

Google¹

01 Apr 2017-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image is presented.

...read moreread less

Abstract: Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. Finally, given the recent surge of interest in this task, a competition was organized in 2015 using the newly released COCO dataset. We describe and analyze the various improvements we applied to our own baseline and show the resulting performance in the competition, which we won ex-aequo with a team from Microsoft Research.

...read moreread less

848 citations

Posted Content•

StarCraft II: A New Challenge for Reinforcement Learning

[...]

16 Aug 2017-arXiv: Learning

TL;DR: This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game that offers a new and challenging environment for exploring deep reinforcement learning algorithms and architectures and gives initial baseline results for neural networks trained from this data to predict game outcomes and player actions.

...read moreread less

Abstract: This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game. This domain poses a new grand challenge for reinforcement learning, representing a more difficult class of problems than considered in most prior work. It is a multi-agent problem with multiple players interacting; there is imperfect information due to a partially observed map; it has a large action space involving the selection and control of hundreds of units; it has a large state space that must be observed solely from raw input feature planes; and it has delayed credit assignment requiring long-term strategies over thousands of steps. We describe the observation, action, and reward specification for the StarCraft II domain and provide an open source Python-based interface for communicating with the game engine. In addition to the main game maps, we provide a suite of mini-games focusing on different elements of StarCraft II gameplay. For the main game maps, we also provide an accompanying dataset of game replay data from human expert players. We give initial baseline results for neural networks trained from this data to predict game outcomes and player actions. Finally, we present initial baseline results for canonical deep reinforcement learning agents applied to the StarCraft II domain. On the mini-games, these agents learn to achieve a level of play that is comparable to a novice player. However, when trained on the main game, these agents are unable to make significant progress. Thus, SC2LE offers a new and challenging environment for exploring deep reinforcement learning algorithms and architectures.

...read moreread less

683 citations

Proceedings Article•DOI•

Lip Reading Sentences in the Wild

[...]

Joon Son Chung¹, Andrew W. Senior², Oriol Vinyals², Andrew Zisserman¹•Institutions (2)

University of Oxford¹, Google²

21 Jul 2017

TL;DR: The WLAS model trained on the LRS dataset surpasses the performance of all previous work on standard lip reading benchmark datasets, often by a significant margin, and it is demonstrated that if audio is available, then visual information helps to improve speech recognition performance.

...read moreread less

Abstract: The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio. Unlike previous works that have focussed on recognising a limited number of words or phrases, we tackle lip reading as an open-world problem – unconstrained natural language sentences, and in the wild videos. Our key contributions are: (1) a Watch, Listen, Attend and Spell (WLAS) network that learns to transcribe videos of mouth motion to characters, (2) a curriculum learning strategy to accelerate training and to reduce overfitting, (3) a Lip Reading Sentences (LRS) dataset for visual speech recognition, consisting of over 100,000 natural sentences from British television. The WLAS model trained on the LRS dataset surpasses the performance of all previous work on standard lip reading benchmark datasets, often by a significant margin. This lip reading performance beats a professional lip reader on videos from BBC television, and we also demonstrate that if audio is available, then visual information helps to improve speech recognition performance.

...read moreread less

638 citations

Journal Article•DOI•

Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error

[...]

Felix A. Faber¹, Luke Hutchison², Bing Huang¹, Justin Gilmer², Samuel S. Schoenholz², George E. Dahl², Oriol Vinyals², Steven Kearnes², Patrick Riley², O. Anatole von Lilienfeld¹ - Show less +6 more•Institutions (2)

University of Basel¹, Google²

10 Oct 2017-Journal of Chemical Theory and Computation

TL;DR: Numerical evidence that ML model predictions deviate from DFT less than DFT (B3LYP) deviates from experiment for all properties is presented and suggests that ML models could be more accurate than hybrid DFT if explicitly electron correlated quantum (or experimental) data were available.

...read moreread less

Abstract: We investigate the impact of choosing regressors and molecular representations for the construction of fast machine learning (ML) models of 13 electronic ground-state properties of organic molecules. The performance of each regressor/representation/property combination is assessed using learning curves which report out-of-sample errors as a function of training set size with up to ∼118k distinct molecules. Molecular structures and properties at the hybrid density functional theory (DFT) level of theory come from the QM9 database [Ramakrishnan et al. Sci. Data 2014, 1, 140022] and include enthalpies and free energies of atomization, HOMO/LUMO energies and gap, dipole moment, polarizability, zero point vibrational energy, heat capacity, and the highest fundamental vibrational frequency. Various molecular representations have been studied (Coulomb matrix, bag of bonds, BAML and ECFP4, molecular graphs (MG)), as well as newly developed distribution based variants including histograms of distances (HD), angles...

...read moreread less

542 citations

Proceedings Article•

Imagination-Augmented Agents for Deep Reinforcement Learning

[...]

Sébastien Racanière¹, Theophane Weber¹, David P. Reichert¹, Lars Buesing², Arthur Guez¹, Danilo Jimenez Rezende¹, Adrià Puigdomènech Badia¹, Oriol Vinyals¹, Nicolas Heess¹, Yujia Li³, Razvan Pascanu¹, Peter W. Battaglia⁴, Demis Hassabis¹, David Silver¹, Daan Wierstra¹ - Show less +11 more•Institutions (4)

Google¹, Columbia University², University of Toronto³, Massachusetts Institute of Technology⁴

19 Jul 2017

TL;DR: In this article, Imagination-Augmented Agents (I2As) learn to interpret predictions from a trained environment model to construct implicit plans in arbitrary ways, by using the predictions as additional context in deep policy networks.

...read moreread less

Abstract: We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects. In contrast to most existing model-based reinforcement learning and planning methods, which prescribe how a model should be used to arrive at a policy, I2As learn to interpret predictions from a trained environment model to construct implicit plans in arbitrary ways, by using the predictions as additional context in deep policy networks. I2As show improved data efficiency, performance, and robustness to model misspecification compared to several strong baselines.

...read moreread less

361 citations

Proceedings Article•

Parallel WaveNet: Fast High-Fidelity Speech Synthesis

[...]

Aaron van den Oord¹, Yazhe Li¹, Igor Babuschkin², Karen Simonyan¹, Oriol Vinyals¹, Koray Kavukcuoglu¹, George van den Driessche¹, Edward Lockhart¹, Luis C. Cobo³, Florian Stimberg⁴, Norman Casagrande, Dominik Grewe¹, Seb Noury⁵, Sander Dieleman¹, Erich Elsen¹, Nal Kalchbrenner¹, Heiga Zen¹, Alex Graves¹, Helen King¹, Thomas C. Walters¹, Dan Belov¹, Demis Hassabis¹ - Show less +18 more•Institutions (5)

Google¹, University of Manchester², Georgia Institute of Technology³, Technical University of Berlin⁴, Palantir Technologies⁵

28 Nov 2017

TL;DR: The authors introduced Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality, which is capable of generating high-fidelity speech samples at more than 20 times faster than real-time, and is deployed online by Google Assistant.

...read moreread less

Abstract: The recently-developed WaveNet architecture is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today's massively parallel computers, and therefore hard to deploy in a real-time production setting. This paper introduces Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality. The resulting system is capable of generating high-fidelity speech samples at more than 20 times faster than real-time, and is deployed online by Google Assistant, including serving multiple English and Japanese voices.

...read moreread less

323 citations

Posted Content•

Hierarchical Representations for Efficient Architecture Search

[...]

Hanxiao Liu¹, Karen Simonyan², Oriol Vinyals², Chrisantha Fernando², Koray Kavukcuoglu² - Show less +1 more•Institutions (2)

Carnegie Mellon University¹, Google²

01 Nov 2017-arXiv: Learning

TL;DR: This work efficiently discovers architectures that outperform a large number of manually designed models for image classification, obtaining top-1 error of 3.6% on CIFAR-10 and 20.3% when transferred to ImageNet, which is competitive with the best existing neural architecture search approaches.

...read moreread less

Abstract: We explore efficient neural architecture search methods and show that a simple yet powerful evolutionary algorithm can discover new architectures with excellent performance. Our approach combines a novel hierarchical genetic representation scheme that imitates the modularized design pattern commonly adopted by human experts, and an expressive search space that supports complex topologies. Our algorithm efficiently discovers architectures that outperform a large number of manually designed models for image classification, obtaining top-1 error of 3.6% on CIFAR-10 and 20.3% when transferred to ImageNet, which is competitive with the best existing neural architecture search approaches. We also present results using random search, achieving 0.3% less top-1 accuracy on CIFAR-10 and 0.1% less on ImageNet whilst reducing the search time from 36 hours down to 1 hour.

...read moreread less

Proceedings Article•

Neural Episodic Control

[...]

Alexander Pritzel, Benigno Uria, Sriram Srinivasan, Adrià Puigdomènech Badia, Oriol Vinyals, Demis Hassabis, Daan Wierstra, Charles Blundell - Show less +4 more

06 Aug 2017

TL;DR: This work proposes Neural Episodic Control: a deep reinforcement learning agent that is able to rapidly assimilate new experiences and act upon them, and shows across a wide range of environments that the agent learns significantly faster than other state-of-the-art, general purpose deep reinforcementlearning agents.

...read moreread less

Abstract: Deep reinforcement learning methods attain super-human performance in a wide range of environments. Such methods are grossly inefficient, often taking orders of magnitudes more data than humans to achieve reasonable performance. We propose Neural Episodic Control: a deep reinforcement learning agent that is able to rapidly assimilate new experiences and act upon them. Our agent uses a semi-tabular representation of the value function: a buffer of past experience containing slowly changing state representations and rapidly updated estimates of the value function. We show across a wide range of environments that our agent learns significantly faster than other state-of-the-art, general purpose deep reinforcement learning agents.

...read moreread less

Proceedings Article•

Video Pixel Networks

[...]

Nal Kalchbrenner¹, Aaron van den Oord¹, Karen Simonyan¹, Ivo Danihelka¹, Oriol Vinyals¹, Alex Graves¹, Koray Kavukcuoglu¹ - Show less +3 more•Institutions (1)

Google¹

17 Jul 2017

TL;DR: The Video Pixel Network (VPN) as discussed by the authors estimates the discrete joint distribution of the raw pixel values in a video by encoding the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain.

...read moreread less

Abstract: We propose a probabilistic video model, the Video Pixel Network (VPN), that estimates the discrete joint distribution of the raw pixel values in a video. The model and the neural architecture reflect the time, space and color structure of video tensors and encode it as a four-dimensional dependency chain. The VPN approaches the best possible performance on the Moving MNIST benchmark, a leap over the previous state of the art, and the generated videos show only minor deviations from the ground truth. The VPN also produces detailed samples on the action-conditional Robotic Pushing benchmark and generalizes to the motion of novel objects.

...read moreread less

Proceedings Article•

Decoupled neural interfaces using synthetic gradients

[...]

Max Jaderberg, Wojciech Marian Czarnecki, Simon Osindero, Oriol Vinyals, Alex Graves, David Silver, Koray Kavukcuoglu - Show less +3 more

06 Aug 2017

TL;DR: It is demonstrated that in addition to predicting gradients, the same framework can be used to predict inputs, resulting in models which are decoupled in both the forward and backwards pass -- amounting to independent networks which co-learn such that they can be composed into a single functioning corporation.

...read moreread less

Abstract: Training directed neural networks typically requires forward-propagating data through a computation graph, followed by backpropagating error signal, to produce weight updates. All layers, or more generally, modules, of the network are therefore locked, in the sense that they must wait for the remainder of the network to execute forwards and propagate error backwards before they can be updated. In this work we break this constraint by decoupling modules by introducing a model of the future computation of the network graph. These models predict what the result of the modelled subgraph will produce using only local information. In particular we focus on modelling error gradients: by using the modelled synthetic gradient in place of true backpropagated error gradients we decouple subgraphs, and can update them independently and asynchronously i.e. we realise decoupled neural interfaces. We show results for feed-forward models, where every layer is trained asynchronously, recurrent neural networks (RNNs) where predicting one's future gradient extends the time over which the RNN can effectively model, and also a hierarchical RNN system with ticking at different timescales. Finally, we demonstrate that in addition to predicting gradients, the same framework can be used to predict inputs, resulting in models which are decoupled in both the forward and backwards pass - amounting to independent networks which co-learn such that they can be composed into a single functioning corporation.

...read moreread less

Posted Content•

Parallel WaveNet: Fast High-Fidelity Speech Synthesis

[...]

Aaron van den Oord¹, Yazhe Li¹, Igor Babuschkin², Karen Simonyan¹, Oriol Vinyals¹, Koray Kavukcuoglu¹, George van den Driessche¹, Edward Lockhart¹, Luis C. Cobo³, Florian Stimberg⁴, Norman Casagrande, Dominik Grewe¹, Seb Noury⁵, Sander Dieleman¹, Erich Elsen¹, Nal Kalchbrenner¹, Heiga Zen¹, Alex Graves¹, Helen King¹, Thomas C. Walters¹, Dan Belov¹, Demis Hassabis¹ - Show less +18 more•Institutions (5)

Google¹, University of Manchester², Georgia Institute of Technology³, Technical University of Berlin⁴, Palantir Technologies⁵

28 Nov 2017-arXiv: Learning

TL;DR: This paper introduced Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality, which is capable of generating high-fidelity speech samples at more than 20 times faster than real-time, and is deployed online by Google Assistant.

...read moreread less

Abstract: The recently-developed WaveNet architecture is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today's massively parallel computers, and therefore hard to deploy in a real-time production setting. This paper introduces Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality. The resulting system is capable of generating high-fidelity speech samples at more than 20 times faster than real-time, and is deployed online by Google Assistant, including serving multiple English and Japanese voices.

...read moreread less

Posted Content•

Imagination-Augmented Agents for Deep Reinforcement Learning

[...]

Theophane Weber, Sébastien Racanière, David P. Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adrià Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter W. Battaglia, David Silver, Daan Wierstra - Show less +10 more

19 Jul 2017-arXiv: Learning

TL;DR: Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects, shows improved data efficiency, performance, and robustness to model misspecification compared to several baselines.

...read moreread less

Abstract: We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects. In contrast to most existing model-based reinforcement learning and planning methods, which prescribe how a model should be used to arrive at a policy, I2As learn to interpret predictions from a learned environment model to construct implicit plans in arbitrary ways, by using the predictions as additional context in deep policy networks. I2As show improved data efficiency, performance, and robustness to model misspecification compared to several baselines.

...read moreread less

Posted Content•

Learning model-based planning from scratch

[...]

Razvan Pascanu, Yujia Li, Oriol Vinyals, Nicolas Heess, Lars Buesing, Sébastien Racanière, David P. Reichert, Theophane Weber, Daan Wierstra, Peter W. Battaglia - Show less +6 more

19 Jul 2017-arXiv: Artificial Intelligence

TL;DR: The "Imagination-based Planner" is introduced, the first model-based, sequential decision-making agent that can learn to construct, evaluate, and execute plans, and also learn elaborate planning strategies in a discrete maze-solving task.

...read moreread less

Abstract: Conventional wisdom holds that model-based planning is a powerful approach to sequential decision-making. It is often very challenging in practice, however, because while a model can be used to evaluate a plan, it does not prescribe how to construct a plan. Here we introduce the "Imagination-based Planner", the first model-based, sequential decision-making agent that can learn to construct, evaluate, and execute plans. Before any action, it can perform a variable number of imagination steps, which involve proposing an imagined action and evaluating it with its model-based imagination. All imagined actions and outcomes are aggregated, iteratively, into a "plan context" which conditions future real and imagined actions. The agent can even decide how to imagine: testing out alternative imagined actions, chaining sequences of actions together, or building a more complex "imagination tree" by navigating flexibly among the previously imagined states using a learned policy. And our agent can learn to plan economically, jointly optimizing for external rewards and computational costs associated with using its imagination. We show that our architecture can learn to solve a challenging continuous control problem, and also learn elaborate planning strategies in a discrete maze-solving task. Our work opens a new direction toward learning the components of a model-based planning system and how to use them.

...read moreread less

Posted Content•

Neural Episodic Control

[...]

Alexander Pritzel, Benigno Uria, Sriram Srinivasan, Adrià Puigdomènech, Oriol Vinyals, Demis Hassabis, Daan Wierstra, Charles Blundell - Show less +4 more

06 Mar 2017-arXiv: Learning

TL;DR: Neural Episodic Control as discussed by the authors uses a semi-tabular representation of the value function: a buffer of past experience containing slowly changing state representations and rapidly updated estimates of value function.

...read moreread less

Abstract: Deep reinforcement learning methods attain super-human performance in a wide range of environments. Such methods are grossly inefficient, often taking orders of magnitudes more data than humans to achieve reasonable performance. We propose Neural Episodic Control: a deep reinforcement learning agent that is able to rapidly assimilate new experiences and act upon them. Our agent uses a semi-tabular representation of the value function: a buffer of past experience containing slowly changing state representations and rapidly updated estimates of the value function. We show across a wide range of environments that our agent learns significantly faster than other state-of-the-art, general purpose deep reinforcement learning agents.

...read moreread less

Posted Content•

Few-shot Autoregressive Density Estimation: Towards Learning to Learn Distributions

[...]

Scott Reed¹, Yutian Chen¹, Tom Le Paine², Aaron van den Oord³, S. M. Ali Eslami¹, Danilo Jimenez Rezende¹, Oriol Vinyals¹, Nando de Freitas⁴ - Show less +4 more•Institutions (4)

Google¹, University of Illinois at Urbana–Champaign², Ghent University³, University of Oxford⁴

27 Oct 2017-arXiv: Neural and Evolutionary Computing

TL;DR: In this article, a few-shot image density estimation model was proposed to learn visual concepts from only a handful of examples. But the model requires many thousands of gradient-based weight updates and unique image examples for training.

...read moreread less

Abstract: Deep autoregressive models have shown state-of-the-art performance in density estimation for natural images on large-scale datasets such as ImageNet. However, such models require many thousands of gradient-based weight updates and unique image examples for training. Ideally, the models would rapidly learn visual concepts from only a handful of examples, similar to the manner in which humans learns across many vision tasks. In this paper, we show how 1) neural attention and 2) meta learning techniques can be used in combination with autoregressive models to enable effective few-shot density estimation. Our proposed modifications to PixelCNN result in state-of-the art few-shot density estimation on the Omniglot dataset. Furthermore, we visualize the learned attention policy and find that it learns intuitive algorithms for simple tasks such as image mirroring on ImageNet and handwriting on Omniglot without supervision. Finally, we extend the model to natural images and demonstrate few-shot image generation on the Stanford Online Products dataset.

...read moreread less

Patent•

Processing sequences using convolutional neural networks

[...]

Aaron van den Oord, Sander Dieleman, Nal Kalchbrenner, Karen Simonyan, Oriol Vinyals, Lasse Espeholt - Show less +2 more

06 Sep 2017

TL;DR: In this paper, a method for generating neural network output from an input sequence is proposed, where the output subnetwork is configured to receive the alternative representations and to process the alternative representation to generate the neural network outputs.

...read moreread less

Abstract: To provide a method, a system, and an apparatus including computer programs encoded on computer storage media, for generating neural network output from an input sequence.SOLUTION: One of methods includes: for each input, providing a current input sequence that comprises an input and inputs preceding the input in the input sequence to a convolutional subnetwork comprising a plurality of convolutional neural network layers, where the convolutional subnetwork is configured to, for each of the plurality of inputs, receive the current input sequence relating to the input, and process the current input sequence to generate an alternative representation relating to the input; and providing the alternative representations to an output subnetwork, where the output subnetwork is configured to receive the alternative representations and to process the alternative representations to generate the neural network output.SELECTED DRAWING: Figure 1

...read moreread less

Posted Content•

Metacontrol for Adaptive Imagination-Based Optimization

[...]

Jessica B. Hamrick¹, Andrew J. Ballard², Razvan Pascanu³, Oriol Vinyals³, Nicolas Heess³, Peter W. Battaglia³ - Show less +2 more•Institutions (3)

University of California, Berkeley¹, University of Cambridge², Google³

07 May 2017-arXiv: Learning

TL;DR: In this paper, the authors introduce a metacontroller which learns to optimize a sequence of internal simulations over predictive models of the world in order to construct a more informed, and more economical, solution.

...read moreread less

Abstract: Many machine learning systems are built to solve the hardest examples of a particular task, which often makes them large and expensive to run---especially with respect to the easier examples, which might require much less computation. For an agent with a limited computational budget, this "one-size-fits-all" approach may result in the agent wasting valuable computation on easy examples, while not spending enough on hard examples. Rather than learning a single, fixed policy for solving all instances of a task, we introduce a metacontroller which learns to optimize a sequence of "imagined" internal simulations over predictive models of the world in order to construct a more informed, and more economical, solution. The metacontroller component is a model-free reinforcement learning agent, which decides both how many iterations of the optimization procedure to run, as well as which model to consult on each iteration. The models (which we call "experts") can be state transition models, action-value functions, or any other mechanism that provides information useful for solving the task, and can be learned on-policy or off-policy in parallel with the metacontroller. When the metacontroller, controller, and experts were trained with "interaction networks" (Battaglia et al., 2016) as expert models, our approach was able to solve a challenging decision-making problem under complex non-linear dynamics. The metacontroller learned to adapt the amount of computation it performed to the difficulty of the task, and learned how to choose which experts to consult by factoring in both their reliability and individual computational resource costs. This allowed the metacontroller to achieve a lower overall cost (task loss plus computational cost) than more traditional fixed policy approaches. These results demonstrate that our approach is a powerful framework for using...

...read moreread less

Proceedings Article•

Metacontrol for Adaptive Imagination-Based Optimization

[...]

Jessica B. Hamrick¹, Andrew J. Ballard², Razvan Pascanu³, Oriol Vinyals³, Nicolas Heess³, Peter W. Battaglia³ - Show less +2 more•Institutions (3)

University of California, Berkeley¹, University of Cambridge², Google³

01 May 2017

TL;DR: This work introduces a metacontroller which learns to optimize a sequence of "imagined" internal simulations over predictive models of the world in order to construct a more informed, and more economical, solution.

...read moreread less

Abstract: Many machine learning systems are built to solve the hardest examples of a particular task, which often makes them large and expensive to run---especially with respect to the easier examples, which might require much less computation. For an agent with a limited computational budget, this "one-size-fits-all" approach may result in the agent wasting valuable computation on easy examples, while not spending enough on hard examples. Rather than learning a single, fixed policy for solving all instances of a task, we introduce a metacontroller which learns to optimize a sequence of "imagined" internal simulations over predictive models of the world in order to construct a more informed, and more economical, solution. The metacontroller component is a model-free reinforcement learning agent, which decides both how many iterations of the optimization procedure to run, as well as which model to consult on each iteration. The models (which we call "experts") can be state transition models, action-value functions, or any other mechanism that provides information useful for solving the task, and can be learned on-policy or off-policy in parallel with the metacontroller. When the metacontroller, controller, and experts were trained with "interaction networks" (Battaglia et al., 2016) as expert models, our approach was able to solve a challenging decision-making problem under complex non-linear dynamics. The metacontroller learned to adapt the amount of computation it performed to the difficulty of the task, and learned how to choose which experts to consult by factoring in both their reliability and individual computational resource costs. This allowed the metacontroller to achieve a lower overall cost (task loss plus computational cost) than more traditional fixed policy approaches. These results demonstrate that our approach is a powerful framework for using...

...read moreread less

Posted Content•

Understanding Synthetic Gradients and Decoupled Neural Interfaces

[...]

Wojciech Marian Czarnecki¹, Grzegorz Świrszcz, Max Jaderberg¹, Simon Osindero², Oriol Vinyals¹, Koray Kavukcuoglu¹ - Show less +2 more•Institutions (2)

Google¹, Yahoo!²

01 Mar 2017-arXiv: Learning

TL;DR: In this article, the authors investigate the mechanism by which synthetic gradient estimators approximate the true loss, and how that leads to drastically different layer-wise representations, and expose the relationship of using synthetic gradients to other error approximation techniques and find a unifying language for discussion and comparison.

...read moreread less

Abstract: When training neural networks, the use of Synthetic Gradients (SG) allows layers or modules to be trained without update locking - without waiting for a true error gradient to be backpropagated - resulting in Decoupled Neural Interfaces (DNIs). This unlocked ability of being able to update parts of a neural network asynchronously and with only local information was demonstrated to work empirically in Jaderberg et al (2016). However, there has been very little demonstration of what changes DNIs and SGs impose from a functional, representational, and learning dynamics point of view. In this paper, we study DNIs through the use of synthetic gradients on feed-forward networks to better understand their behaviour and elucidate their effect on optimisation. We show that the incorporation of SGs does not affect the representational strength of the learning system for a neural network, and prove the convergence of the learning system for linear and deep linear models. On practical problems we investigate the mechanism by which synthetic gradient estimators approximate the true loss, and, surprisingly, how that leads to drastically different layer-wise representations. Finally, we also expose the relationship of using synthetic gradients to other error approximation techniques and find a unifying language for discussion and comparison.

...read moreread less

Proceedings Article•

Understanding Synthetic Gradients and Decoupled Neural Interfaces

[...]

Wojciech Marian Czarnecki¹, Grzegorz Świrszcz, Max Jaderberg¹, Simon Osindero², Oriol Vinyals¹, Koray Kavukcuoglu¹ - Show less +2 more•Institutions (2)

Google¹, Yahoo!²

17 Jul 2017

TL;DR: In this paper, the authors investigate the mechanism by which synthetic gradient estimators approximate the true loss, and how that leads to drastically different layer-wise representations, and expose the relationship of using synthetic gradients to other error approximation techniques and find a unifying language for discussion and comparison.

...read moreread less

Abstract: When training neural networks, the use of Synthetic Gradients (SG) allows layers or modules to be trained without update locking - without waiting for a true error gradient to be backpropagated - resulting in Decoupled Neural Interfaces (DNIs). This unlocked ability of being able to update parts of a neural network asynchronously and with only local information was demonstrated to work empirically in Jaderberg et al (2016). However, there has been very little demonstration of what changes DNIs and SGs impose from a functional, representational, and learning dynamics point of view. In this paper, we study DNIs through the use of synthetic gradients on feed-forward networks to better understand their behaviour and elucidate their effect on optimisation. We show that the incorporation of SGs does not affect the representational strength of the learning system for a neural network, and prove the convergence of the learning system for linear and deep linear models. On practical problems we investigate the mechanism by which synthetic gradient estimators approximate the true loss, and, surprisingly, how that leads to drastically different layer-wise representations. Finally, we also expose the relationship of using synthetic gradients to other error approximation techniques and find a unifying language for discussion and comparison.

...read moreread less

Posted Content•

Fast machine learning models of electronic and energetic properties consistently reach approximation errors better than DFT accuracy

[...]

Felix A. Faber, Luke Hutchinson, Huang Bing, Justin Gilmer, Samuel S. Schoenholz, George E. Dahl, Oriol Vinyals, Steven Kearnes, Patrick Riley, Anatole von Lilienfeld - Show less +6 more

17 Feb 2017

TL;DR: Numerical evidence is presented that ML model predictions for all properties can reach an approximation error to DFT which is on par with chemical accuracy.

...read moreread less

Abstract: We investigate the impact of choosing regressors and molecular representations for the construction of fast machine learning (ML) models of thirteen electronic ground-state properties of organic molecules. The performance of each regressor/representation/property combination is assessed using learning curves which report out-of-sample errors as a function of training set size with up to $\sim$117k distinct molecules. Molecular structures and properties at hybrid density functional theory (DFT) level of theory used for training and testing come from the QM9 database [Ramakrishnan et al, {\em Scientific Data} {\bf 1} 140022 (2014)] and include dipole moment, polarizability, HOMO/LUMO energies and gap, electronic spatial extent, zero point vibrational energy, enthalpies and free energies of atomization, heat capacity and the highest fundamental vibrational frequency. Various representations from the literature have been studied (Coulomb matrix, bag of bonds, BAML and ECFP4, molecular graphs (MG)), as well as newly developed distribution based variants including histograms of distances (HD), and angles (HDA/MARAD), and dihedrals (HDAD). Regressors include linear models (Bayesian ridge regression (BR) and linear regression with elastic net regularization (EN)), random forest (RF), kernel ridge regression (KRR) and two types of neural net works, graph convolutions (GC) and gated graph networks (GG). We present numerical evidence that ML model predictions deviate from DFT less than DFT deviates from experiment for all properties. Furthermore, our out-of-sample prediction errors with respect to hybrid DFT reference are on par with, or close to, chemical accuracy. Our findings suggest that ML models could be more accurate than hybrid DFT if explicitly electron correlated quantum (or experimental) data was available.

...read moreread less

Patent•

Generating audio using neural networks

[...]

Aaron van den Oord, Sander Dieleman, Nal Kalchbrenner, Karen Simonyan, Oriol Vinyals - Show less +1 more

06 Sep 2017

TL;DR: In this article, a method of training a neural network system to generate an output sequence of audio data that comprises a respective audio sample at each of a plurality of time steps is presented.

...read moreread less

Abstract: A method of training a neural network system to generate an output sequence of audio data that comprises a respective audio sample at each of a plurality of time steps. One of the methods includes processing a training sequence of audio data using a convolutional subnetwork. The convolutional subnetwork is configured to, for each of the time steps, receive a current sequence of audio data that comprises an audio sample at each time step that precedes the time step in the training sequence, and process the current sequence of audio data to generate an alternative representation for the time step. The alternative representation for the time step is used to generate an output that defines a score distribution over a plurality of possible audio samples for the time step. The neural network system is trained using supervised learning based on the input audio samples and the output score distribution.

...read moreread less

Patent•

Sentence compression using recurrent neural networks

[...]

Ekaterina Filippova¹, Enrique Alfonseca¹, Carlos Alberto Colmenares Rojas¹, Lukasz Kaiser¹, Oriol Vinyals¹ - Show less +1 more•Institutions (1)

Google¹

03 Feb 2017

TL;DR: In this paper, the authors propose a method for generating a sentence summary by tokenizing the sentence into a plurality of tokens, processing data representative of each token in a first order using an LSTM neural network to initialize an internal state of a second LSTMs neural network, and generating the summarized version of the sentence using the outputs of the second lstMs for the tokens.

...read moreread less

Abstract: Methods, systems, apparatus, including computer programs encoded on computer storage medium, for generating a sentence summary. In one aspect, the method includes actions of tokenizing the sentence into a plurality of tokens, processing data representative of each token in a first order using an LSTM neural network to initialize an internal state of a second LSTM neural network, processing data representative of each token in a second order using the second LSTM neural network, comprising, for each token in the sentence: processing the data representative of the token using the second LSTM neural network in accordance with a current internal state of the second LSTM neural network to (i) generate an LSTM output for the token, and (ii) to update the current internal state of the second LSTM neural network, and generating the summarized version of the sentence using the outputs of the second LSTM neural network for the tokens.

...read moreread less

Posted Content•

Neural Discrete Representation Learning.

[...]

Aaron van den Oord¹, Oriol Vinyals¹, Koray Kavukcuoglu¹•Institutions (1)

Google¹

02 Nov 2017-arXiv: Learning

TL;DR: The Vector Quantised-Variational AutoEncoder (VQ-VAE) as discussed by the authors is a generative model that learns a discrete latent representation by using vector quantization.

...read moreread less

Abstract: Learning useful representations without supervision remains a key challenge in machine learning. In this paper, we propose a simple yet powerful generative model that learns such discrete representations. Our model, the Vector Quantised-Variational AutoEncoder (VQ-VAE), differs from VAEs in two key ways: the encoder network outputs discrete, rather than continuous, codes; and the prior is learnt rather than static. In order to learn a discrete latent representation, we incorporate ideas from vector quantisation (VQ). Using the VQ method allows the model to circumvent issues of "posterior collapse" -- where the latents are ignored when they are paired with a powerful autoregressive decoder -- typically observed in the VAE framework. Pairing these representations with an autoregressive prior, the model can generate high quality images, videos, and speech as well as doing high quality speaker conversion and unsupervised learning of phonemes, providing further evidence of the utility of the learnt representations.

...read moreread less

Showing papers by "Oriol Vinyals published in 2017"