
Showing papers by "Greg S. Corrado" published in 2012


Proceedings Article
03 Dec 2012
TL;DR: This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.
Abstract: Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 30x larger than previously reported in the literature, and to achieve state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly sized deep network for a commercial speech recognition service. Although we focus on and report performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm.
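As a rough illustration of the Downpour SGD pattern described above (model replicas that asynchronously fetch parameters from a shared parameter server, compute gradients on their own data shard, and push updates back without waiting for one another), here is a minimal single-process sketch in Python. The threads, the ParameterServer class, and the least-squares objective are illustrative stand-ins chosen for brevity, not the DistBelief implementation.

```python
# Minimal single-process sketch of Downpour-style asynchronous SGD.
# A shared "parameter server" holds the weights; several worker threads
# (standing in for model replicas) repeatedly fetch possibly stale
# parameters, compute a gradient on their own data shard, and push the
# update back asynchronously. All names here are illustrative.
import threading
import numpy as np

class ParameterServer:
    def __init__(self, dim, lr=0.05):
        self.w = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()

    def fetch(self):
        with self.lock:
            return self.w.copy()

    def push(self, grad):
        with self.lock:
            self.w -= self.lr * grad

def replica(server, X, y, steps=100):
    """One model replica: least-squares regression on its own data shard."""
    for _ in range(steps):
        w = server.fetch()                       # possibly stale parameters
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        server.push(grad)                        # asynchronous update

rng = np.random.default_rng(0)
true_w = np.array([3.0, -2.0])
server = ParameterServer(dim=2)
threads = []
for shard in range(4):                           # four asynchronous replicas
    X = rng.normal(size=(64, 2))
    y = X @ true_w + 0.01 * rng.normal(size=64)
    t = threading.Thread(target=replica, args=(server, X, y))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
print("recovered weights:", server.fetch())
```

The point of the sketch is the communication pattern only: replicas never synchronize with each other, so updates are computed from slightly stale parameters, which is exactly the trade-off Downpour SGD accepts in exchange for throughput.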

3,475 citations


Proceedings Article
26 Jun 2012
TL;DR: In this paper, a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization was used to learn high-level, class-specific feature detectors from only unlabeled data.
Abstract: We consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization on a large dataset of images (the model has 1 billion connections, the dataset has 10 million 200×200 pixel images downloaded from the Internet). We train this network using model parallelism and asynchronous SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. Control experiments show that this feature detector is robust not only to translation but also to scaling and out-of-plane rotation. We also find that the same network is sensitive to other high-level concepts such as cat faces and human bodies. Starting with these learned features, we trained our network to obtain 15.8% accuracy in recognizing 20,000 object categories from ImageNet, a leap of 70% relative improvement over the previous state-of-the-art.
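The architecture described above stacks sublayers of locally connected (untied-weight) filtering, L2 pooling, and local contrast normalization. The sketch below shows the forward pass of one such stage on a 1-D signal in plain NumPy; the shapes, hyperparameters, single filter per location, and map-wide normalization are simplifications for illustration, not the paper's actual configuration.

```python
# Sketch of one stage: locally connected filtering (untied weights per
# receptive field), L2 pooling, and a simplified contrast normalization.
import numpy as np

def local_filtering(x, W, field=8, stride=4):
    """Each output unit sees its own receptive field with its own weights."""
    outputs = []
    for i, start in enumerate(range(0, len(x) - field + 1, stride)):
        patch = x[start:start + field]
        outputs.append(W[i] @ patch)            # untied weights per location
    return np.array(outputs)

def l2_pooling(h, pool=2):
    """Pool neighbouring features by their L2 norm (limited invariance)."""
    h = h[: len(h) - len(h) % pool].reshape(-1, pool)
    return np.sqrt((h ** 2).sum(axis=1) + 1e-8)

def contrast_normalization(p, eps=1e-8):
    """Subtract the mean and divide by the standard deviation over the map.
    (The paper normalizes over local neighbourhoods; this global version
    keeps the sketch short.)"""
    return (p - p.mean()) / (p.std() + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=64)                         # stand-in for an image patch
n_locations = (64 - 8) // 4 + 1
W = rng.normal(scale=0.1, size=(n_locations, 8))
h = local_filtering(x, W)
p = l2_pooling(h)
y = contrast_normalization(p)
print(y.shape, y[:4])
```

In the paper, each receptive field has many filters, the three sublayers are repeated to form the deep network, and the filtering weights are learned with a sparse autoencoder (reconstruction) objective; the sketch only shows how data flows through one stage.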

786 citations


01 Jan 2012
TL;DR: In this appendix, the authors provide further details on the algorithm, its implementation, the test set of 3D-transformed faces, and experimental results on parameter sensitivity, together with additional visualizations of the learned neurons.
Abstract: In this appendix, we provide further details on the algorithm, its implementation, the test set of 3D-transformed faces, and experimental results on parameter sensitivity. We also present additional visualizations of the learned neurons.

36 citations


Proceedings Article
22 Jul 2012
TL;DR: It is argued that while higher cognitive functions may interact in a complicated fashion, many of the component functions operate through well-defined interfaces and are built on a neural substrate that scales easily under the control of a modular genetic architecture.
Abstract: We consider three hypotheses concerning the primate neocortex which have influenced computational neuroscience in recent years. Is the mind modular in terms of its being profitably described as a collection of relatively independent functional units? Does the regular structure of the cortex imply a single algorithm at work, operating on many different inputs in parallel? Can the cognitive differences between humans and our closest primate relatives be explained in terms of a scalable cortical architecture? We bring to bear diverse sources of evidence to argue that the answers to each of these questions -- with some judicious qualifications -- are in the affirmative. In particular, we argue that while our higher cognitive functions may interact in a complicated fashion, many of the component functions operate through well-defined interfaces and, perhaps more importantly, are built on a neural substrate that scales easily under the control of a modular genetic architecture. Processing in the primary sensory cortices seems amenable to similar algorithmic principles, and, even for those cases where alternative principles are at play, the regular structure of cortex allows the same or greater advantages as the architecture scales. Similar genetic machinery to that used by nature to scale body plans has apparently been applied to scale cortical computations. The resulting replicated computing units can be used to build larger working memory and support deeper recursions needed to qualitatively improve our abilities to handle language, abstraction and social interaction.

6 citations