
Showing papers by "Jason Yosinski" published in 2018


Proceedings Article
01 Jan 2018
TL;DR: CoordConv, as proposed by the authors, gives convolution access to its own input coordinates through extra coordinate channels, allowing networks to learn either complete translation invariance or varying degrees of translation dependence, as required by the end task.
Abstract: Few ideas have enjoyed as large an impact on deep learning as convolution. For any problem involving pixels or spatial representations, common intuition holds that convolutional neural networks may be appropriate. In this paper we show a striking counterexample to this intuition via the seemingly trivial coordinate transform problem, which simply requires learning a mapping between coordinates in (x,y) Cartesian space and coordinates in one-hot pixel space. Although convolutional networks would seem appropriate for this task, we show that they fail spectacularly. We demonstrate and carefully analyze the failure first on a toy problem, at which point a simple fix becomes obvious. We call this solution CoordConv, which works by giving convolution access to its own input coordinates through the use of extra coordinate channels. Without sacrificing the computational and parametric efficiency of ordinary convolution, CoordConv allows networks to learn either complete translation invariance or varying degrees of translation dependence, as required by the end task. CoordConv solves the coordinate transform problem with perfect generalization and 150 times faster with 10--100 times fewer parameters than convolution. This stark contrast raises the question: to what extent has this inability of convolution persisted insidiously inside other tasks, subtly hampering performance from within? A complete answer to this question will require further investigation, but we show preliminary evidence that swapping convolution for CoordConv can improve models on a diverse set of tasks. Using CoordConv in a GAN produced less mode collapse as the transform between high-level spatial latents and pixels becomes easier to learn. A Faster R-CNN detection model trained on MNIST detection showed 24% better IOU when using CoordConv, and in the Reinforcement Learning (RL) domain agents playing Atari games benefit significantly from the use of CoordConv layers.
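The core mechanism is simple enough to sketch. Below is a minimal, hypothetical PyTorch rendering of the coordinate-channel idea: two extra channels holding normalized x and y coordinates are concatenated to the input before an ordinary convolution. The class name and normalization choices are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """Minimal sketch of a CoordConv-style layer: concatenate normalized (x, y)
    coordinate channels to the input before an ordinary convolution."""

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        # Two extra input channels carry the x and y coordinates.
        self.conv = nn.Conv2d(in_channels + 2, out_channels, kernel_size, **kwargs)

    def forward(self, x):
        n, _, h, w = x.shape
        # Coordinate grids normalized to [-1, 1], one channel per axis.
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(n, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(n, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))

# Example: drop-in replacement for an ordinary 3x3 convolution.
layer = CoordConv2d(8, 16, kernel_size=3, padding=1)
out = layer(torch.randn(2, 8, 32, 32))  # -> shape (2, 16, 32, 32)
```

Because the coordinate channels can simply be ignored by the learned filters, the layer retains ordinary convolution as a special case, which is how it offers a choice between translation invariance and translation dependence.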

444 citations


Posted Content
TL;DR: Preliminary evidence that swapping convolution for CoordConv can improve models on a diverse set of tasks is shown, which works by giving convolution access to its own input coordinates through the use of extra coordinate channels without sacrificing the computational and parametric efficiency of ordinary convolution.
Abstract: Few ideas have enjoyed as large an impact on deep learning as convolution. For any problem involving pixels or spatial representations, common intuition holds that convolutional neural networks may be appropriate. In this paper we show a striking counterexample to this intuition via the seemingly trivial coordinate transform problem, which simply requires learning a mapping between coordinates in (x,y) Cartesian space and one-hot pixel space. Although convolutional networks would seem appropriate for this task, we show that they fail spectacularly. We demonstrate and carefully analyze the failure first on a toy problem, at which point a simple fix becomes obvious. We call this solution CoordConv, which works by giving convolution access to its own input coordinates through the use of extra coordinate channels. Without sacrificing the computational and parametric efficiency of ordinary convolution, CoordConv allows networks to learn either complete translation invariance or varying degrees of translation dependence, as required by the end task. CoordConv solves the coordinate transform problem with perfect generalization and 150 times faster with 10--100 times fewer parameters than convolution. This stark contrast raises the question: to what extent has this inability of convolution persisted insidiously inside other tasks, subtly hampering performance from within? A complete answer to this question will require further investigation, but we show preliminary evidence that swapping convolution for CoordConv can improve models on a diverse set of tasks. Using CoordConv in a GAN produced less mode collapse as the transform between high-level spatial latents and pixels becomes easier to learn. A Faster R-CNN detection model trained on MNIST showed 24% better IOU when using CoordConv, and in the RL domain agents playing Atari games benefit significantly from the use of CoordConv layers.

190 citations


Posted Content
TL;DR: In this article, the authors present a crowd-sourced collection of first-hand accounts from researchers in artificial life and evolutionary computation of surprising and creative outcomes produced by digital evolution, and argue that these stories provide substantial evidence that the existence and importance of evolutionary surprises extends beyond the natural world and may be a universal property of all complex evolving systems.
Abstract: Biological evolution provides a creative fount of complex and subtle adaptations, often surprising the scientists who discover them. However, because evolution is an algorithmic process that transcends the substrate in which it occurs, evolution's creativity is not limited to nature. Indeed, many researchers in the field of digital evolution have observed their evolving algorithms and organisms subverting their intentions, exposing unrecognized bugs in their code, producing unexpected adaptations, or exhibiting outcomes uncannily convergent with ones in nature. Such stories routinely reveal creativity by evolution in these digital worlds, but they rarely fit into the standard scientific narrative. Instead they are often treated as mere obstacles to be overcome, rather than results that warrant study in their own right. The stories themselves are traded among researchers through oral tradition, but that mode of information transmission is inefficient and prone to error and outright loss. Moreover, the fact that these stories tend to be shared only among practitioners means that many natural scientists do not realize how interesting and lifelike digital organisms are and how natural their evolution can be. To our knowledge, no collection of such anecdotes has been published before. This paper is the crowd-sourced product of researchers in the fields of artificial life and evolutionary computation who have provided first-hand accounts of such cases. It thus serves as a written, fact-checked collection of scientifically important and even entertaining stories. In doing so we also present here substantial evidence that the existence and importance of evolutionary surprises extends beyond the natural world, and may indeed be a universal property of all complex evolving systems.

183 citations


Proceedings Article
12 Feb 2018
TL;DR: A simple idea is proposed and explored: train CNNs directly on the blockwise discrete cosine transform (DCT) coefficients computed and available in the middle of the JPEG codec. libjpeg is modified to produce DCT coefficients directly, a ResNet-50 is adapted to the differently sized and strided input, and performance is evaluated on ImageNet.
Abstract: The simple, elegant approach of training convolutional neural networks (CNNs) directly from RGB pixels has enjoyed overwhelming empirical success. But can more performance be squeezed out of networks by using different input representations? In this paper we propose and explore a simple idea: train CNNs directly on the blockwise discrete cosine transform (DCT) coefficients computed and available in the middle of the JPEG codec. Intuitively, when processing JPEG images using CNNs, it seems unnecessary to decompress a blockwise frequency representation to an expanded pixel representation, shuffle it from CPU to GPU, and then process it with a CNN that will learn something similar to a transform back to frequency representation in its first layers. Why not skip both steps and feed the frequency domain into the network directly? In this paper we modify libjpeg to produce DCT coefficients directly, modify a ResNet-50 network to accommodate the differently sized and strided input, and evaluate performance on ImageNet. We find networks that are both faster and more accurate, as well as networks with about the same accuracy but 1.77x faster than ResNet-50.
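As a rough illustration of the representation involved (not the paper's modified libjpeg pipeline), the sketch below recomputes JPEG-style blockwise 8x8 DCT coefficients with SciPy and packs them into a feature map a CNN could consume. The block size and level shift follow standard JPEG conventions, but the function and its output layout are assumptions for illustration.

```python
import numpy as np
from scipy.fftpack import dct

def blockwise_dct(image, block=8):
    """Rough illustration of the blockwise DCT representation used inside JPEG.
    The paper obtains these coefficients directly from a modified libjpeg; here
    we simply recompute an 8x8 type-II DCT per block on a grayscale image."""
    h, w = image.shape
    h, w = h - h % block, w - w % block            # crop to a multiple of the block size
    img = image[:h, :w].astype(np.float32) - 128    # level shift, as in JPEG
    # Split into (h/8, w/8, 8, 8) blocks and apply a 2D DCT to each block.
    blocks = img.reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    coeffs = dct(dct(blocks, axis=-1, norm='ortho'), axis=-2, norm='ortho')
    # Rearranged as an (h/8, w/8, 64) feature map, this can be fed to a CNN whose
    # early layers are adapted to the smaller spatial resolution, as in the paper.
    return coeffs.reshape(h // block, w // block, block * block)

features = blockwise_dct(np.random.randint(0, 256, (224, 224)))
print(features.shape)  # (28, 28, 64)
```

The 8x downsampling of spatial resolution relative to pixels is exactly why the network's early layers must be resized, and why skipping the decode step can save time.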

134 citations


Proceedings Article
15 Feb 2018
TL;DR: In this paper, the authors propose to train networks not in their native parameter space, but instead in a smaller, randomly oriented subspace, and define this to be the intrinsic dimension of the objective landscape.
Abstract: Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimension solutions first appear, and define this to be the intrinsic dimension of the objective landscape. The approach is simple to implement, computationally tractable, and produces several suggestive conclusions. Many problems have smaller intrinsic dimensions than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. This latter result has the profound implication that once a parameter space is large enough to solve a problem, extra parameters serve directly to increase the dimensionality of the solution manifold. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning where we conclude, for example, that solving the inverted pendulum problem is 100 times easier than classifying digits from MNIST, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the method is a simple technique for constructively obtaining an upper bound on the minimum description length of a solution. A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times.
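The subspace training idea can be sketched in a few lines: the full parameter vector is expressed as theta = theta0 + P @ theta_d, where theta0 is the random initialization, P is a fixed random D x d projection, and only the d-dimensional vector theta_d is trained. The PyTorch sketch below is a simplified illustration under those assumptions; the paper scales this up with more structured projections for very large D.

```python
import torch

def make_subspace_params(theta0, d, seed=0):
    """Minimal sketch of the random-subspace parameterization
    theta = theta0 + P @ theta_d, with only theta_d trainable."""
    D = theta0.numel()
    g = torch.Generator().manual_seed(seed)
    # Fixed random projection from the d-dimensional subspace to parameter space.
    P = torch.randn(D, d, generator=g) / d ** 0.5
    theta_d = torch.zeros(d, requires_grad=True)  # the only trainable vector
    return P, theta_d

def current_params(theta0, P, theta_d):
    # Map subspace coordinates back into the full parameter space.
    return theta0 + P @ theta_d

# Example: a toy 1000-parameter model restricted to a 20-dimensional subspace.
theta0 = torch.randn(1000)
P, theta_d = make_subspace_params(theta0, d=20)
theta = current_params(theta0, P, theta_d)  # would be reshaped into the model's weights
```

Sweeping d upward and recording the smallest value at which training reaches the target performance gives the intrinsic dimension described in the abstract.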

117 citations


Posted Content
TL;DR: In this paper, the authors propose to train networks not in their native parameter space, but instead in a smaller, randomly oriented subspace, and define this to be the intrinsic dimension of the objective landscape.
Abstract: Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimension solutions first appear, and define this to be the intrinsic dimension of the objective landscape. The approach is simple to implement, computationally tractable, and produces several suggestive conclusions. Many problems have smaller intrinsic dimensions than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. This latter result has the profound implication that once a parameter space is large enough to solve a problem, extra parameters serve directly to increase the dimensionality of the solution manifold. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning where we conclude, for example, that solving the inverted pendulum problem is 100 times easier than classifying digits from MNIST, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the method is a simple technique for constructively obtaining an upper bound on the minimum description length of a solution. A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times.

65 citations


02 Jul 2018
TL;DR: Aracna as discussed by the authors is a quadruped robot that requires non-intuitive motor commands in order to locomote and thus provides an interesting challenge for gait learning algorithms, such as those frequently developed in the Evolutionary Computation and Artificial Life communities.
Abstract: We describe a new, quadruped robot platform, Aracna, which requires non-intuitive motor commands in order to locomote and thus provides an interesting challenge for gait learning algorithms, such as those frequently developed in the Evolutionary Computation and Artificial Life communities. Aracna is an open-source hardware project composed of off-the-shelf and 3D-printed parts, enabling other research teams to modify its design according to their scientific needs. Aracna was designed to overcome the shortcomings of a previous quadruped robot platform, whose legs were so heavy that the motors could not reliably execute the commands sent to them. We avoid this problem by locating all motors in the body core instead of on the legs and through a design which enables the servos to have a greater mechanical advantage. Specifically, each of the four legs has two joints controlled by separate four-bar linkage mechanisms that drive the pitch of the hip joint and knee joint. This novel design causes unconventional kinematics, creating an opportunity for gait-learning algorithms, which excel in counter-intuitive design spaces where human engineers tend to underperform. Because it is low-cost, flexible, kinematically interesting, and an improvement over a previous design, Aracna provides a useful new hardware platform for testing algorithms that automatically generate robotic behaviors.

18 citations


Posted Content
TL;DR: The Metropolis-Hastings Generative Adversarial Network (MH-GAN) as mentioned in this paper uses the discriminator from GAN training to build a wrapper around the generator for improved sampling.
Abstract: We introduce the Metropolis-Hastings generative adversarial network (MH-GAN), which combines aspects of Markov chain Monte Carlo and GANs. The MH-GAN draws samples from the distribution implicitly defined by a GAN's discriminator-generator pair, as opposed to standard GANs which draw samples from the distribution defined only by the generator. It uses the discriminator from GAN training to build a wrapper around the generator for improved sampling. With a perfect discriminator, this wrapped generator samples from the true distribution on the data exactly even when the generator is imperfect. We demonstrate the benefits of the improved generator on multiple benchmark datasets, including CIFAR-10 and CelebA, using the DCGAN, WGAN, and progressive GAN.
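A minimal sketch of the sampling wrapper, under the assumption of a calibrated discriminator: proposals are independent draws from the generator, and a proposal x' is accepted over the current sample x with probability min(1, (1/D(x) - 1) / (1/D(x') - 1)), i.e. the ratio of estimated data-to-generator density ratios. The generator and discriminator below are toy stand-ins, not part of the paper.

```python
import numpy as np

def mh_gan_sample(generator, discriminator, n_steps=100, rng=None):
    """Minimal sketch of Metropolis-Hastings sampling from a trained GAN.
    `generator()` draws one sample; `discriminator(x)` returns a (ideally calibrated)
    probability that x is real. Proposals are independent generator draws."""
    rng = rng or np.random.default_rng()
    x = generator()
    d_x = discriminator(x)
    for _ in range(n_steps):
        x_prop = generator()
        d_prop = discriminator(x_prop)
        # Acceptance ratio from the MH-GAN paper: min(1, (1/D(x) - 1) / (1/D(x') - 1)).
        alpha = min(1.0, (1.0 / d_x - 1.0) / (1.0 / d_prop - 1.0))
        if rng.random() < alpha:
            x, d_x = x_prop, d_prop
    return x  # one sample from the (approximately) corrected distribution

# Toy 1-D example with hypothetical stand-ins for a GAN's generator and discriminator.
gen = lambda: np.random.normal(0.5, 1.0)            # imperfect generator
disc = lambda x: 1.0 / (1.0 + np.exp(0.5 * x))      # stand-in discriminator score
sample = mh_gan_sample(gen, disc)
```

With a perfect discriminator the acceptance ratio exactly corrects for the mismatch between the generator's distribution and the data distribution, which is the sense in which the wrapped generator samples from the true distribution.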

11 citations


Proceedings Article
01 Jan 2018

6 citations


Patent
30 Jul 2018
TL;DR: In this article, a neural network is trained to accept transformed blocks of discrete cosine transform (DCT) coefficients, which may be less computationally intensive than accepting raw image data as input.
Abstract: A system classifies a compressed image or predicts likelihood values associated with a compressed image. The system partially decompresses compressed JPEG image data to obtain blocks of discrete cosine transform (DCT) coefficients that represent the image. The system may apply various transform functions to the individual blocks of DCT coefficients to resize the blocks so that they may be input together into a neural network for analysis. Weights of the neural network may be trained to accept transformed blocks of DCT coefficients which may be less computationally intensive than accepting raw image data as input.

3 citations


Patent
26 Oct 2018
TL;DR: In this paper, a machine learning based model is trained using fewer parameters than specified, and the model can be uncompressed using the seed values and the trained parameter vector in the subspace.
Abstract: Machine learning based models, for example, neural network models employ large numbers of parameters, from a few million to hundreds of millions or more. A machine learning based model is trained using fewer parameters than specified. An initial parameter vector is initialized, for example, using random number generation based on a seed. During training phase, the parameter vectors are modified in a subspace around the initial vector. The trained model can be stored or transmitted using seed values and the trained parameter vector in the subspace. The neural network model can be uncompressed using the seed values and the trained parameter vector in the subspace. The compressed representation of neural networks may be used for various applications such as generating maps, object recognition in images, processing of sensor data, natural language processing, and others.
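A rough illustration of the storage scheme described here (not the patent's exact procedure): only the random seed and the trained low-dimensional vector are stored, and the full parameter vector is rebuilt at load time by regenerating the initialization and the random projection from that seed. The initialization and projection choices below are assumptions for illustration.

```python
import numpy as np

def compress(seed, theta_d):
    """Store only the RNG seed and the low-dimensional trained vector."""
    return {"seed": seed, "theta_d": np.asarray(theta_d)}

def decompress(payload, D):
    """Rebuild the full parameter vector: regenerate theta0 and the random
    projection P from the seed, then map the subspace vector back up."""
    rng = np.random.default_rng(payload["seed"])
    theta0 = rng.standard_normal(D)                    # regenerated initial parameters
    d = payload["theta_d"].shape[0]
    P = rng.standard_normal((D, d)) / np.sqrt(d)       # regenerated fixed projection
    return theta0 + P @ payload["theta_d"]

# Example: a 100,000-parameter model stored as one seed plus a 100-dim vector.
payload = compress(seed=42, theta_d=np.random.randn(100))
theta = decompress(payload, D=100_000)
print(theta.shape)  # (100000,)
```

The compression comes from the fact that the stored payload grows with the subspace dimension d rather than with the full parameter count D, provided both sides regenerate theta0 and P identically from the seed.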