
Showing papers by "Scott Reed" published in 2018


Proceedings ArticleDOI
18 Jun 2018
TL;DR: This work introduces ScanComplete, a novel data-driven approach for taking an incomplete 3D scan of a scene as input and predicting a complete 3D model along with per-voxel semantic labels, and devises a fully-convolutional generative 3D CNN model whose filter kernels are invariant to the overall scene size.
Abstract: We introduce ScanComplete, a novel data-driven approach for taking an incomplete 3D scan of a scene as input and predicting a complete 3D model along with per-voxel semantic labels. The key contribution of our method is its ability to handle large scenes with varying spatial extent, managing the cubic growth in data size as scene size increases. To this end, we devise a fully-convolutional generative 3D CNN model whose filter kernels are invariant to the overall scene size. The model can be trained on scene subvolumes but deployed on arbitrarily large scenes at test time. In addition, we propose a coarse-to-fine inference strategy in order to produce high-resolution output while also leveraging large input context sizes. In an extensive series of experiments, we carefully evaluate different model design choices, considering both deterministic and probabilistic models for completion and semantic inference. Our results show that we outperform other methods not only in the size of the environments handled and processing efficiency, but also with regard to completion quality and semantic segmentation performance by a significant margin.
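
A minimal sketch of the size-invariance property the abstract relies on: because every layer in a fully-convolutional 3D CNN is a convolution, weights trained on fixed subvolumes apply unchanged to scenes of any spatial extent. The channel counts, depth, and class count below are illustrative placeholders, not the paper's architecture (PyTorch):

```python
import torch
import torch.nn as nn

# Toy fully-convolutional 3D CNN: no fully-connected layers, so the same
# weights run on any spatial extent. All sizes here are illustrative only.
net = nn.Sequential(
    nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 8, kernel_size=1),  # 8 per-voxel semantic classes (made up)
)

subvolume = torch.randn(1, 1, 32, 32, 32)    # training-time crop
full_scene = torch.randn(1, 1, 96, 64, 160)  # larger scene at test time
print(net(subvolume).shape)   # torch.Size([1, 8, 32, 32, 32])
print(net(full_scene).shape)  # torch.Size([1, 8, 96, 64, 160])
```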

265 citations


Proceedings Article
03 Dec 2018
TL;DR: In this article, a neural arithmetic logic unit (NALU) is proposed to learn to track time, perform arithmetic over images of numbers, translate numerical language into real-valued scalars, execute computer code, and count objects in images.
Abstract: Neural networks can learn to represent and manipulate numerical information, but they seldom generalize well outside of the range of numerical values encountered during training. To encourage more systematic numerical extrapolation, we propose an architecture that represents numerical quantities as linear activations which are manipulated using primitive arithmetic operators, controlled by learned gates. We call this module a neural arithmetic logic unit (NALU), by analogy to the arithmetic logic unit in traditional processors. Experiments show that NALU-enhanced neural networks can learn to track time, perform arithmetic over images of numbers, translate numerical language into real-valued scalars, execute computer code, and count objects in images. In contrast to conventional architectures, we obtain substantially better generalization both inside and outside of the range of numerical values encountered during training, often extrapolating orders of magnitude beyond trained numerical ranges.
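
The NALU cell is compact enough to sketch directly. The following is a minimal PyTorch rendering of the abstract's description: an additive path whose effective weights W = tanh(W_hat) * sigmoid(M_hat) are biased toward {-1, 0, 1}, a multiplicative path that applies the same weights in log space, and a learned sigmoid gate that mixes the two. The initialization scale and epsilon constant are implementation choices, not prescribed by the abstract:

```python
import torch
import torch.nn as nn

class NALU(nn.Module):
    """Minimal neural arithmetic logic unit following the paper's description:
    an add/subtract path a = W x and a multiply/divide path computed in log
    space, mixed by a learned gate g."""

    def __init__(self, in_dim: int, out_dim: int, eps: float = 1e-7):
        super().__init__()
        self.eps = eps
        self.W_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.M_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.G = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weights biased toward {-1, 0, 1}.
        W = torch.tanh(self.W_hat) * torch.sigmoid(self.M_hat)
        a = x @ W.t()                                            # add / subtract
        m = torch.exp(torch.log(x.abs() + self.eps) @ W.t())     # mul / div
        g = torch.sigmoid(x @ self.G.t())                        # learned gate
        return g * a + (1 - g) * m
```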

140 citations


Proceedings Article
27 Sep 2018
TL;DR: In this article, a meta-learning approach for adaptive text-to-speech (TTS) with few data is presented, where the aim is to produce a network that requires few data at deployment time to rapidly adapt to new speakers.
Abstract: We present a meta-learning approach for adaptive text-to-speech (TTS) with few data. During training, we learn a multi-speaker model using a shared conditional WaveNet core and independent learned embeddings for each speaker. The aim of training is not to produce a neural network with fixed weights, which is then deployed as a TTS system. Instead, the aim is to produce a network that requires few data at deployment time to rapidly adapt to new speakers. We introduce and benchmark three strategies: (i) learning the speaker embedding while keeping the WaveNet core fixed, (ii) fine-tuning the entire architecture with stochastic gradient descent, and (iii) predicting the speaker embedding with a trained neural network encoder. The experiments show that these approaches are successful at adapting the multi-speaker neural network to new speakers, obtaining state-of-the-art results in both sample naturalness and voice similarity with merely a few minutes of audio data from new speakers.
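
Of the three strategies, (i) is the simplest to illustrate: the WaveNet core stays frozen and only a fresh speaker embedding is fit to the adaptation audio. The sketch below assumes a trained core exposed as a callable `core(audio, embedding)` returning a training loss; that interface, the optimizer, and all hyperparameters are placeholders rather than the paper's setup (PyTorch):

```python
import torch

def adapt_speaker_embedding(core, adaptation_batches, embed_dim=128,
                            steps=100, lr=1e-2):
    """Strategy (i): keep the WaveNet core fixed and fit only a new speaker
    embedding on a few minutes of the new speaker's audio."""
    for p in core.parameters():
        p.requires_grad_(False)           # core weights stay frozen
    embedding = torch.zeros(embed_dim, requires_grad=True)
    opt = torch.optim.SGD([embedding], lr=lr)
    for _, audio in zip(range(steps), adaptation_batches):
        loss = core(audio, embedding)     # e.g. negative log-likelihood
        opt.zero_grad()
        loss.backward()                   # gradients flow only to the embedding
        opt.step()
    return embedding
```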

87 citations


Posted Content
TL;DR: Experiments show that NALU-enhanced neural networks can learn to track time, perform arithmetic over images of numbers, translate numerical language into real-valued scalars, execute computer code, and count objects in images.
Abstract: Neural networks can learn to represent and manipulate numerical information, but they seldom generalize well outside of the range of numerical values encountered during training. To encourage more systematic numerical extrapolation, we propose an architecture that represents numerical quantities as linear activations which are manipulated using primitive arithmetic operators, controlled by learned gates. We call this module a neural arithmetic logic unit (NALU), by analogy to the arithmetic logic unit in traditional processors. Experiments show that NALU-enhanced neural networks can learn to track time, perform arithmetic over images of numbers, translate numerical language into real-valued scalars, execute computer code, and count objects in images. In contrast to conventional architectures, we obtain substantially better generalization both inside and outside of the range of numerical values encountered during training, often extrapolating orders of magnitude beyond trained numerical ranges.

74 citations


Posted Content
TL;DR: Three strategies are introduced and benchmarked for adapting the multi-speaker neural network to new speakers, obtaining state-of-the-art results in both sample naturalness and voice similarity with merely a few minutes of audio data from new speakers.
Abstract: We present a meta-learning approach for adaptive text-to-speech (TTS) with few data. During training, we learn a multi-speaker model using a shared conditional WaveNet core and independent learned embeddings for each speaker. The aim of training is not to produce a neural network with fixed weights, which is then deployed as a TTS system. Instead, the aim is to produce a network that requires few data at deployment time to rapidly adapt to new speakers. We introduce and benchmark three strategies: (i) learning the speaker embedding while keeping the WaveNet core fixed, (ii) fine-tuning the entire architecture with stochastic gradient descent, and (iii) predicting the speaker embedding with a trained neural network encoder. The experiments show that these approaches are successful at adapting the multi-speaker neural network to new speakers, obtaining state-of-the-art results in both sample naturalness and voice similarity with merely a few minutes of audio data from new speakers.

66 citations


Proceedings Article
15 Feb 2018
TL;DR: This paper shows how 1) neural attention and 2) meta learning techniques can be used in combination with autoregressive models to enable effective few-shot density estimation on the Omniglot dataset.
Abstract: Deep autoregressive models have shown state-of-the-art performance in density estimation for natural images on large-scale datasets such as ImageNet. However, such models require many thousands of gradient-based weight updates and unique image examples for training. Ideally, the models would rapidly learn visual concepts from only a handful of examples, similar to the manner in which humans learn across many vision tasks. In this paper, we show how 1) neural attention and 2) meta learning techniques can be used in combination with autoregressive models to enable effective few-shot density estimation. Our proposed modifications to PixelCNN result in state-of-the-art few-shot density estimation on the Omniglot dataset. Furthermore, we visualize the learned attention policy and find that it learns intuitive algorithms for simple tasks such as image mirroring on ImageNet and handwriting on Omniglot without supervision. Finally, we extend the model to natural images and demonstrate few-shot image generation on the Stanford Online Products dataset.
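
The attention component can be sketched generically: each step of the autoregressive decoder reads from encodings of the few support images with soft attention, letting the model draw content from the examples as it generates. The scaled dot-product form below is a standard choice and only a stand-in for the paper's exact read operator (PyTorch):

```python
import torch
import torch.nn.functional as F

def attention_read(query, support_keys, support_values):
    """Soft attention over encoded support examples, the kind of read that
    could condition each step of an attention-augmented PixelCNN.

    query:          (batch, d)     per-position decoder state
    support_keys:   (batch, n, d)  encodings of the n support images
    support_values: (batch, n, d)  features pulled into the context
    """
    scores = torch.einsum('bd,bnd->bn', query, support_keys)
    weights = F.softmax(scores / query.shape[-1] ** 0.5, dim=-1)
    return torch.einsum('bn,bnd->bd', weights, support_values)
```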

42 citations


Posted Content
TL;DR: The MetaMimic algorithm is introduced, which, to the best of the authors' knowledge, uses the largest existing neural networks for deep RL, and shows that larger networks with normalization are needed to achieve one-shot high-fidelity imitation on a challenging manipulation task.
Abstract: Humans are experts at high-fidelity imitation -- closely mimicking a demonstration, often in one attempt. Humans use this ability to quickly solve a task instance, and to bootstrap learning of new tasks. Achieving these abilities in autonomous agents is an open problem. In this paper, we introduce an off-policy RL algorithm (MetaMimic) to narrow this gap. MetaMimic can learn both (i) policies for high-fidelity one-shot imitation of diverse novel skills, and (ii) policies that enable the agent to solve tasks more efficiently than the demonstrators. MetaMimic relies on the principle of storing all experiences in a memory and replaying these to learn massive deep neural network policies by off-policy RL. This paper introduces, to the best of our knowledge, the largest existing neural networks for deep RL and shows that larger networks with normalization are needed to achieve one-shot high-fidelity imitation on a challenging manipulation task. The results also show that both types of policy can be learned from vision, in spite of the task rewards being sparse, and without access to demonstrator actions.
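
One way to make the imitation objective concrete: reward the agent at step t for how closely its observation tracks the demonstration's observation at step t, so no demonstrator actions are needed. The Gaussian-kernel similarity below is an illustrative guess at such a reward, not the paper's exact measure or feature space:

```python
import numpy as np

def imitation_reward(agent_obs, demo_obs, sigma=1.0):
    """Dense high-fidelity imitation reward: closeness between the agent's
    observation and the demonstration's observation at the same timestep.
    The Gaussian kernel and sigma are illustrative assumptions."""
    dist_sq = float(np.sum((np.asarray(agent_obs) - np.asarray(demo_obs)) ** 2))
    return float(np.exp(-dist_sq / (2.0 * sigma ** 2)))
```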

27 citations


Patent
30 Aug 2018
TL;DR: In this patent, a method was proposed for generating an output image having an output resolution of N pixels x N pixels, where each pixel has a respective color value for each of a plurality of color channels.
Abstract: A method of generating an output image having an output resolution of N pixels x N pixels, each pixel in the output image having a respective color value for each of a plurality of color channels, the method comprising: obtaining a low-resolution version of the output image; and upscaling the low-resolution version of the output image to generate the output image having the output resolution by repeatedly performing the following operations: obtaining a current version of the output image having a current K x K resolution; and processing the current version of the output image using a set of convolutional neural networks that are specific to the current resolution to generate an updated version of the output image having a 2K x 2K resolution.
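
The claimed upscaling loop is easy to sketch: starting from the low-resolution version, a set of convolutional networks specific to the current K x K resolution maps the image to 2K x 2K, repeated until the N x N output resolution is reached. The sub-pixel-convolution network below is one illustrative realization; the claim does not fix the networks' internals (PyTorch):

```python
import torch
import torch.nn as nn

def make_doubling_net(channels=3, hidden=32):
    """One illustrative resolution-specific network: convolutions followed
    by a learned 2x upsampling (sub-pixel convolution)."""
    return nn.Sequential(
        nn.Conv2d(channels, hidden, 3, padding=1), nn.ReLU(),
        nn.Conv2d(hidden, channels * 4, 3, padding=1),
        nn.PixelShuffle(2),  # (B, 4C, K, K) -> (B, C, 2K, 2K)
    )

def upscale_to(output_res, low_res_image, nets_by_res):
    """Repeatedly apply the network specific to the current K x K resolution
    to produce the 2K x 2K version, as in the claim."""
    image = low_res_image                # (batch, channels, K, K)
    while image.shape[-1] < output_res:
        k = image.shape[-1]
        image = nets_by_res[k](image)    # K x K -> 2K x 2K
    return image

# Hypothetical usage: 8x8 -> 64x64 through three resolution-specific nets.
nets = {8: make_doubling_net(), 16: make_doubling_net(), 32: make_doubling_net()}
out = upscale_to(64, torch.randn(1, 3, 8, 8), nets)   # shape (1, 3, 64, 64)
```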

2 citations