
Showing papers by "Rob Fergus" published in 2013


Posted Content
TL;DR: In this article, the authors introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier, and perform an ablation study to discover the performance contribution from different model layers.
Abstract: Large Convolutional Network models have recently demonstrated impressive classification performance on the ImageNet benchmark. However there is no clear understanding of why they perform so well, or how they might be improved. In this paper we address both issues. We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. We also perform an ablation study to discover the performance contribution from different model layers. This enables us to find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark. We show our ImageNet model generalizes well to other datasets: when the softmax classifier is retrained, it convincingly beats the current state-of-the-art results on Caltech-101 and Caltech-256 datasets.

2,982 citations
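
The transfer experiment mentioned at the end of the abstract above (retraining only the softmax classifier on top of fixed ImageNet features for Caltech-101/256) amounts to multinomial logistic regression on extracted feature vectors. A minimal numpy sketch under that reading; the array names, learning rate and training loop are placeholders, not the authors' pipeline:

```python
import numpy as np

def retrain_softmax(train_feats, train_labels, n_classes, lr=0.1, epochs=200):
    """Fit a softmax (multinomial logistic regression) classifier on fixed
    ConvNet features by plain gradient descent; a sketch of the transfer
    setup, not the paper's training code."""
    n, d = train_feats.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[train_labels]
    for _ in range(epochs):
        logits = train_feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                   # softmax cross-entropy gradient
        W -= lr * (train_feats.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

# Illustrative call on random "features":
feats = np.random.rand(100, 64)
labels = np.random.randint(0, 5, size=100)
W, b = retrain_softmax(feats, labels, n_classes=5)
```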


Proceedings Article
Li Wan, Matthew D. Zeiler, Sixin Zhang, Yann LeCun, Rob Fergus
16 Jun 2013
TL;DR: This work introduces DropConnect, a generalization of Dropout, for regularizing large fully-connected layers within neural networks, and derives a bound on the generalization performance of both Dropout and DropConnect.
Abstract: We introduce DropConnect, a generalization of Dropout (Hinton et al., 2012), for regularizing large fully-connected layers within neural networks. When training with Dropout, a randomly selected subset of activations are set to zero within each layer. DropConnect instead sets a randomly selected subset of weights within the network to zero. Each unit thus receives input from a random subset of units in the previous layer. We derive a bound on the generalization performance of both Dropout and DropConnect. We then evaluate DropConnect on a range of datasets, comparing to Dropout, and show state-of-the-art results on several image recognition benchmarks by aggregating multiple DropConnect-trained models.

2,413 citations
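
The difference between Dropout and DropConnect described above comes down to where the Bernoulli mask is applied in a fully connected layer: on the output activations versus on the individual weights. A toy numpy sketch of the two training-time forward passes (the keep probability `p` is illustrative, and inference-time rescaling is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_layer(x, W, p=0.5):
    """Dropout: mask a random subset of the layer's output activations."""
    a = x @ W
    mask = rng.random(a.shape) < p          # keep each activation with prob p
    return a * mask

def dropconnect_layer(x, W, p=0.5):
    """DropConnect: mask a random subset of the weights themselves, so each
    output unit receives input from a random subset of the previous layer."""
    mask = rng.random(W.shape) < p          # keep each weight with prob p
    return x @ (W * mask)

x = rng.normal(size=(4, 10))
W = rng.normal(size=(10, 3))
print(dropout_layer(x, W).shape, dropconnect_layer(x, W).shape)
```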


Posted Content
TL;DR: This article showed that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent, which suggests that it is the space, rather than individual units, that contains the semantic information in the high layers of neural networks.
Abstract: Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties. First, we find that there is no distinction between individual high level units and random linear combinations of high level units, according to various methods of unit analysis. It suggests that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks. Second, we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent. We can cause the network to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.

1,313 citations
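
The "imperceptible perturbation found by maximizing the network's prediction error" can be illustrated as a small step on the input in the direction that increases the loss. The paper itself finds minimal perturbations with a box-constrained L-BFGS formulation; the single sign-of-gradient step below is only a schematic of the idea, and `loss_grad_fn` is an assumed hook into whatever framework holds the trained network:

```python
import numpy as np

def adversarial_perturbation(x, y_true, loss_grad_fn, eps=0.01):
    """Perturb input x slightly in a direction that increases the network's
    loss. `loss_grad_fn(x, y_true)` is assumed to return dLoss/dx for the
    trained model (any autodiff framework provides this). Clipping keeps
    the image in a valid [0, 1] range."""
    g = loss_grad_fn(x, y_true)
    x_adv = x + eps * np.sign(g)     # small step that maximizes prediction error
    return np.clip(x_adv, 0.0, 1.0)

# Toy stand-in gradient (quadratic loss around a target) just to exercise the function:
x0 = np.random.rand(8, 8)
print(adversarial_perturbation(x0, x0 * 0, lambda x, y: x - y).shape)
```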


Posted Content
TL;DR: This integrated framework for using Convolutional Networks for classification, localization and detection is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 and obtained very competitive results for the detection and classification tasks.
Abstract: We present an integrated framework for using Convolutional Networks for classification, localization and detection. We show how a multiscale and sliding window approach can be efficiently implemented within a ConvNet. We also introduce a novel deep learning approach to localization by learning to predict object boundaries. Bounding boxes are then accumulated rather than suppressed in order to increase detection confidence. We show that different tasks can be learned simultaneously using a single shared network. This integrated framework is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 (ILSVRC2013) and obtained very competitive results for the detection and classification tasks. In post-competition work, we establish a new state of the art for the detection task. Finally, we release a feature extractor from our best model called OverFeat.

902 citations
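
One concrete piece of the pipeline is the "accumulated rather than suppressed" treatment of bounding boxes: overlapping detections are merged and their confidences combined, instead of discarding all but the strongest as non-maximum suppression would. The greedy routine below is a hedged illustration of that idea, not OverFeat's actual merging algorithm:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def accumulate_boxes(boxes, scores, thresh=0.5):
    """Greedy sketch of accumulation: merge overlapping detections by
    confidence-weighted averaging and add their scores, rather than
    suppressing the weaker box."""
    merged = []
    for box, s in sorted(zip(boxes, scores), key=lambda t: -t[1]):
        for m in merged:
            if iou(box, m["box"]) > thresh:
                w = m["score"] + s
                m["box"] = tuple((m["score"] * mb + s * bb) / w
                                 for mb, bb in zip(m["box"], box))
                m["score"] = w
                break
        else:
            merged.append({"box": tuple(box), "score": s})
    return merged

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(accumulate_boxes(boxes, scores))
```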


Posted Content
TL;DR: In this article, the authors proposed a stochastic pooling method for regularizing large convolutional neural networks, which randomly picks the activation within each pooling region according to a multinomial distribution.
Abstract: We introduce a simple and effective method for regularizing large convolutional neural networks. We replace the conventional deterministic pooling operations with a stochastic procedure, randomly picking the activation within each pooling region according to a multinomial distribution, given by the activities within the pooling region. The approach is hyper-parameter free and can be combined with other regularization approaches, such as dropout and data augmentation. We achieve state-of-the-art performance on four image datasets, relative to other approaches that do not utilize data augmentation.

582 citations
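
The pooling rule itself is compact: within each pooling region the (non-negative) activations are normalized into a multinomial distribution and one of them is sampled. A numpy sketch for a single region; the paper applies this per region and per feature map during training, with a probability-weighted average used at test time:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_pool_region(acts):
    """Sample one activation from a pooling region, with probability
    proportional to its (non-negative, e.g. post-ReLU) value."""
    acts = np.asarray(acts, dtype=float).ravel()
    total = acts.sum()
    if total == 0:                       # all-zero region: pooled value is 0
        return 0.0
    probs = acts / total                 # multinomial distribution over the region
    idx = rng.choice(len(acts), p=probs)
    return acts[idx]

# Example: a 2x2 pooling region
print(stochastic_pool_region([[1.0, 3.0], [0.0, 2.0]]))
```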


Posted Content
12 Nov 2013
TL;DR: In this paper, a novel visualization technique was introduced to give insight into the function of intermediate feature layers and the operation of the classifier, which enabled the authors to find model architectures that outperformed Krizhevsky et al. on the ImageNet classification benchmark.
Abstract: Large Convolutional Neural Network models have recently demonstrated impressive classification performance on the ImageNet benchmark (Krizhevsky et al., 2012). However there is no clear understanding of why they perform so well, or how they might be improved. In this paper we address both issues. We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. We also perform an ablation study to discover the performance contribution from different model layers. This enables us to find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark. We show our ImageNet model generalizes well to other datasets: when the softmax classifier is retrained, it convincingly beats the current state-of-the-art results on Caltech-101 and Caltech-256 datasets.

513 citations


Proceedings ArticleDOI
01 Dec 2013
TL;DR: This work presents a post-capture image processing solution that can remove localized rain and dirt artifacts from a single image, and demonstrates effective removal of dirt and rain in outdoor test conditions.
Abstract: Photographs taken through a window are often compromised by dirt or rain present on the window surface. Common cases of this include pictures taken from inside a vehicle, or outdoor security cameras mounted inside a protective enclosure. At capture time, defocus can be used to remove the artifacts, but this relies on achieving a shallow depth-of-field and placement of the camera close to the window. Instead, we present a post-capture image processing solution that can remove localized rain and dirt artifacts from a single image. We collect a dataset of clean/corrupted image pairs which are then used to train a specialized form of convolutional neural network. This learns how to map corrupted image patches to clean ones, implicitly capturing the characteristic appearance of dirt and water droplets in natural images. Our models demonstrate effective removal of dirt and rain in outdoor test conditions.

447 citations
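
The clean/corrupted image pairs described above are used for patch-to-patch regression: the network is trained to minimize reconstruction error between its output on a corrupted patch and the matching clean patch. A hedged sketch of that objective with a generic callable standing in for the paper's specialized convolutional network (patch sizes and the `model` interface below are placeholders):

```python
import numpy as np

def patch_regression_loss(model, corrupted_patches, clean_patches):
    """Mean-squared reconstruction error between the model's output on
    corrupted patches and the corresponding clean patches, the kind of
    per-patch objective described in the abstract. `model` is any callable
    mapping a batch of patches to same-sized patches."""
    pred = model(corrupted_patches)
    return np.mean((pred - clean_patches) ** 2)

# Illustrative use with an identity "model" and random 16x16 patch pairs:
corrupted = np.random.rand(8, 16, 16, 3)
clean = np.random.rand(8, 16, 16, 3)
print(patch_regression_loss(lambda x: x, corrupted, clean))
```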


Journal ArticleDOI
TL;DR: In this paper, the authors obtained spectra in the wavelength range λ = 995-1769 nm of all known planets orbiting the star HR 8799 using the suite of instrumentation known as Project 1640 on the Palomar 5 m Hale Telescope.
Abstract: We obtained spectra in the wavelength range λ = 995-1769 nm of all four known planets orbiting the star HR 8799. Using the suite of instrumentation known as Project 1640 on the Palomar 5 m Hale Telescope, we acquired data at two epochs. This allowed for multiple imaging detections of the companions and multiple extractions of low-resolution (R ~ 35) spectra. Data reduction employed two different methods of speckle suppression and spectrum extraction, both yielding results that agree. The spectra do not directly correspond to those of any known objects, although similarities with L and T dwarfs are present, as well as some characteristics similar to planets such as Saturn. We tentatively identify the presence of CH_4 along with NH_3 and/or C_2H_2, and possibly CO_2 or HCN in varying amounts in each component of the system. Other studies suggested red colors for these faint companions, and our data confirm those observations. Cloudy models, based on previous photometric observations, may provide the best explanation for the new data presented here. Notable in our data is that these presumably co-eval objects of similar luminosity have significantly different spectra; the diversity of planets may be greater than previously thought. The techniques and methods employed in this paper represent a new capability to observe and rapidly characterize exoplanetary systems in a routine manner over a broad range of planet masses and separations. These are the first simultaneous spectroscopic observations of multiple planets in a planetary system other than our own.

182 citations


Journal ArticleDOI
TL;DR: In this article, the authors obtained spectra, in the wavelength range λ = 995 - 1769 nm, of all known planets orbiting the star HR 8799, using the suite of instrumentation known as Project 1640 on the Palomar 5-m Hale Telescope, and acquired data at two epochs.
Abstract: We obtained spectra, in the wavelength range \lambda = 995 - 1769 nm, of all four known planets orbiting the star HR 8799. Using the suite of instrumentation known as Project 1640 on the Palomar 5-m Hale Telescope, we acquired data at two epochs. This allowed for multiple imaging detections of the companions and multiple extractions of low-resolution (R ~ 35) spectra. Data reduction employed two different methods of speckle suppression and spectrum extraction, both yielding results that agree. The spectra do not directly correspond to those of any known objects, although similarities with L and T-dwarfs are present, as well as some characteristics similar to planets such as Saturn. We tentatively identify the presence of CH_4 along with NH_3 and/or C_2H_2, and possibly CO_2 or HCN in varying amounts in each component of the system. Other studies suggested red colors for these faint companions, and our data confirm those observations. Cloudy models, based on previous photometric observations, may provide the best explanation for the new data presented here. Notable in our data is that these presumably co-eval objects of similar luminosity have significantly different spectra; the diversity of planets may be greater than previously thought. The techniques and methods employed in this paper represent a new capability to observe and rapidly characterize exoplanetary systems in a routine manner over a broad range of planet masses and separations. These are the first simultaneous spectroscopic observations of multiple planets in a planetary system other than our own.

147 citations


Proceedings Article
16 Jan 2013
TL;DR: A simple and effective method for regularizing large convolutional neural networks, which replaces the conventional deterministic pooling operations with a stochastic procedure, randomly picking the activation within each pooling region according to a multinomial distribution.
Abstract: We introduce a simple and effective method for regularizing large convolutional neural networks. We replace the conventional deterministic pooling operations with a stochastic procedure, randomly picking the activation within each pooling region according to a multinomial distribution, given by the activities within the pooling region. The approach is hyper-parameter free and can be combined with other regularization approaches, such as dropout and data augmentation. We achieve state-of-the-art performance on four image datasets, relative to other approaches that do not utilize data augmentation.

127 citations


Book ChapterDOI
01 Jan 2013
TL;DR: This work shows that learned binary projections are a powerful way to index large collections according to their content, and it is possible to bound the number of database examples that must be searched in order to achieve a given level of accuracy.
Abstract: Algorithms to rapidly search massive image or video collections are critical for many vision applications, including visual search, content-based retrieval, and non-parametric models for object recognition. Recent work shows that learned binary projections are a powerful way to index large collections according to their content. The basic idea is to formulate the projections so as to approximately preserve a given similarity function of interest. Having done so, one can then search the data efficiently using hash tables, or by exploring the Hamming ball volume around a novel query. Both enable sub-linear time retrieval with respect to the database size. Further, depending on the design of the projections, in some cases it is possible to bound the number of database examples that must be searched in order to achieve a given level of accuracy.
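
The indexing scheme sketched in this chapter (binary codes obtained by thresholding projections, then hash-table lookup within a small Hamming ball of the query code) can be illustrated in a few lines. Random projections below stand in for whatever learned projections are used; only the lookup mechanics are the point:

```python
import numpy as np
from itertools import combinations
from collections import defaultdict

rng = np.random.default_rng(0)

def binary_code(x, P):
    """Project and threshold: one bit per projection direction."""
    return tuple((x @ P > 0).astype(np.uint8))

def build_index(database, P):
    """Hash table keyed by binary code, storing database item indices."""
    index = defaultdict(list)
    for i, x in enumerate(database):
        index[binary_code(x, P)].append(i)
    return index

def hamming_ball_query(q, P, index, radius=1):
    """Retrieve all database items whose code is within `radius` bit flips
    of the query's code; sub-linear in the database size."""
    code = np.array(binary_code(q, P), dtype=np.uint8)
    hits = list(index.get(tuple(code), []))
    for r in range(1, radius + 1):
        for flip in combinations(range(len(code)), r):
            c = code.copy()
            c[list(flip)] ^= 1
            hits.extend(index.get(tuple(c), []))
    return hits

# Example with random data and random projections standing in for learned ones
db = rng.normal(size=(1000, 32))
P = rng.normal(size=(32, 16))          # 16-bit codes
idx = build_index(db, P)
print(hamming_ball_query(db[0], P, idx, radius=1)[:10])
```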

Proceedings Article
18 Dec 2013
TL;DR: In this article, a recursive convolutional network whose weights are tied between layers is employed to assess the independent contributions of three of these linked variables: the number of layers, feature maps, and parameters.
Abstract: A key challenge in designing convolutional network models is sizing them appropriately. Many factors are involved in these decisions, including number of layers, feature maps, kernel sizes, etc. Complicating this further is the fact that each of these influence not only the numbers and dimensions of the activation units, but also the total number of parameters. In this paper we focus on assessing the independent contributions of three of these linked variables: The numbers of layers, feature maps, and parameters. To accomplish this, we employ a recursive convolutional network whose weights are tied between layers; this allows us to vary each of the three factors in a controlled setting. We find that while increasing the numbers of layers and parameters each have clear benefit, the number of feature maps (and hence dimensionality of the representation) appears ancillary, and finds most of its benefit through the introduction of more weights. Our results (i) empirically confirm the notion that adding layers alone increases computational power, within the context of convolutional layers, and (ii) suggest that precise sizing of convolutional feature map dimensions is itself of little concern; more attention should be paid to the number of parameters in these layers instead.
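
The controlled setting described above, a recursive convolutional network with weights tied between layers, simply re-applies the same convolution and nonlinearity a chosen number of times, so depth grows while the parameter count stays fixed. A minimal numpy sketch with one shared 2D filter (purely illustrative; the paper ties full multi-feature-map convolutional layers):

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2D filtering with a single kernel."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def recursive_conv(x, k, depth):
    """Apply the *same* tied filter `depth` times: more layers,
    identical number of parameters."""
    for _ in range(depth):
        x = np.maximum(conv2d_same(x, k), 0.0)    # shared weights + ReLU
    return x

x = np.random.rand(8, 8)
k = np.random.randn(3, 3) * 0.1
print(recursive_conv(x, k, depth=4).shape)
```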

Posted Content
TL;DR: It is shown that all successful Variational or Maximum a-Posteriori algorithms share a common framework, relying on the following key principles: sparsity promotion in the gradient domain, regularization for kernel estimation, and the use of convex cost functions.
Abstract: Blind deconvolution has made significant progress in the past decade. Most successful algorithms are classified either as Variational or Maximum a-Posteriori ($MAP$). In spite of the superior theoretical justification of variational techniques, carefully constructed $MAP$ algorithms have proven equally effective in practice. In this paper, we show that all successful $MAP$ and variational algorithms share a common framework, relying on the following key principles: sparsity promotion in the gradient domain, $l_2$ regularization for kernel estimation, and the use of convex (often quadratic) cost functions. Our observations lead to a unified understanding of the principles required for successful blind deconvolution. We incorporate these principles into a novel algorithm that improves significantly upon the state of the art.
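
The three principles listed in the abstract can be collected into one schematic MAP-style objective of the kind the paper argues successful methods share; the exact penalty ρ and the weights λ, γ differ between algorithms, so this is a hedged template rather than any specific method's cost:

```latex
\min_{x,\;k}\;\; \tfrac{1}{2}\,\lVert k \ast x - y \rVert_2^2
\;+\; \lambda \sum_i \rho\big((\nabla x)_i\big)
\;+\; \gamma \,\lVert k \rVert_2^2 ,
\qquad k \ge 0,\quad \textstyle\sum_j k_j = 1,
```

where the first term is the quadratic (hence convex) data-fidelity cost, the second promotes sparsity of image gradients (e.g. ρ(t) = |t|^α with α ≤ 1), and the third is the l2 regularization on the blur kernel k.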

Posted Content
16 Nov 2013
TL;DR: It is shown that all successful MAP and variational algorithms share a common framework, relying on the following key principles: sparsity promotion in the gradient domain, l2 regularization for kernel estimation, and the use of convex (often quadratic) cost functions.
Abstract: Blind deconvolution has made significant progress in the past decade. Most successful algorithms are classified either as Variational or Maximum a-Posteriori ($MAP$). In spite of the superior theoretical justification of variational techniques, carefully constructed $MAP$ algorithms have proven equally effective in practice. In this paper, we show that all successful $MAP$ and variational algorithms share a common framework, relying on the following key principles: sparsity promotion in the gradient domain, $l_2$ regularization for kernel estimation, and the use of convex (often quadratic) cost functions. Our observations lead to a unified understanding of the principles required for successful blind deconvolution. We incorporate these principles into a novel algorithm that improves significantly upon the state of the art.

Posted Content
TL;DR: In this paper, a recursive convolutional network whose weights are tied between layers is employed to assess the independent contributions of three of these linked variables: the number of layers, feature maps, and parameters.
Abstract: A key challenge in designing convolutional network models is sizing them appropriately. Many factors are involved in these decisions, including number of layers, feature maps, kernel sizes, etc. Complicating this further is the fact that each of these influence not only the numbers and dimensions of the activation units, but also the total number of parameters. In this paper we focus on assessing the independent contributions of three of these linked variables: The numbers of layers, feature maps, and parameters. To accomplish this, we employ a recursive convolutional network whose weights are tied between layers; this allows us to vary each of the three factors in a controlled setting. We find that while increasing the numbers of layers and parameters each have clear benefit, the number of feature maps (and hence dimensionality of the representation) appears ancillary, and finds most of its benefit through the introduction of more weights. Our results (i) empirically confirm the notion that adding layers alone increases computational power, within the context of convolutional layers, and (ii) suggest that precise sizing of convolutional feature map dimensions is itself of little concern; more attention should be paid to the number of parameters in these layers instead.

Posted Content
TL;DR: In this article, the authors argue that image modeling can greatly improve the precision of Kepler in pointing-degraded two-wheel mode, and demonstrate that the expected drift or jitter in positions in the two-wheel era will help with constraining calibration parameters.
Abstract: Kepler's immense photometric precision to date was maintained through satellite stability and precise pointing. In this white paper, we argue that image modeling--fitting the Kepler-downlinked raw pixel data--can vastly improve the precision of Kepler in pointing-degraded two-wheel mode. We argue that a non-trivial modeling effort may permit continuance of photometry at 10-ppm-level precision. We demonstrate some baby steps towards precise models in both data-driven (flexible) and physics-driven (interpretably parameterized) modes. We demonstrate that the expected drift or jitter in positions in the two-wheel era will help with constraining calibration parameters. In particular, we show that we can infer the device flat-field at higher than pixel resolution; that is, we can infer pixel-to-pixel variations in intra-pixel sensitivity. These results are relevant to almost any scientific goal for the repurposed mission; image modeling ought to be a part of any two-wheel repurpose for the satellite. We make other recommendations for Kepler operations, but fundamentally advocate that the project stick with its core mission of finding and characterizing Earth analogs. [abridged]

01 Jan 2013
TL;DR: Novel image priors and efficient algorithms for image denoising and deconvolution applications, and effective preconditioners for Laplacian matrices for discrete Poisson equations are developed.
Abstract: In the first part of this thesis, we develop novel image priors and efficient algorithms for image denoising and deconvolution applications. Our priors and algorithms enable fast, high-quality restoration of images corrupted by noise or blur. In the second part, we develop effective preconditioners for Laplacian matrices. Such matrices arise in a number of computer graphics and computational photography problems such as image colorization, tone mapping and geodesic distance computation on 3D meshes. The first prior we develop is a spectral prior which models correlations between different spectral bands. We introduce a prototype camera and flash system, used in conjunction with the spectral prior, to enable taking photographs at very low light levels. Our second prior is a sparsity-based measure for blind image deconvolution. This prior gives lower costs to sharp images than blurred ones, enabling the use of simple and efficient Maximum a-Posteriori algorithms. We develop a new algorithm for the non-blind deconvolution problem. This enables extremely fast deconvolution of images blurred by a known blur kernel. Our algorithm uses Fast Fourier Transforms and Lookup Tables to achieve real-time deconvolution performance with non-convex gradient-based priors. Finally, for certain image restoration problems with no known formation model, we demonstrate how learning a mapping between original/corrupted patch pairs enables effective restoration. The preconditioners we develop are multi-level schemes for discrete Poisson equations. Existing multi-level preconditioners have two major drawbacks: excessive bandwidth growth at coarse levels; and the inability to adapt to problems with highly varying coefficients. Our approach tackles both these problems by introducing sparsification and compensation steps at each level. We interleave the selection of fine and coarse-level variables with the removal of weak connections between potential fine-level variables (sparsification) and compensate for these changes by strengthening nearby connections. By applying these operations before each elimination step and repeating the procedure recursively on the resulting smaller systems, we obtain highly efficient schemes. The construction is linear in time and memory. Numerical experiments demonstrate that our new schemes outperform state of the art methods, both in terms of operation count and wall-clock time, over a range of 2D and 3D problems.
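
One building block behind the "extremely fast deconvolution" claim is that, with a known kernel and a quadratic penalty on image gradients, the image update has a closed-form per-frequency solution via the FFT. The sketch below shows only that Fourier-domain step (a Wiener/Tikhonov-style filter); the thesis's full algorithm alternates it with per-pixel lookup-table updates under non-convex priors, which is not reproduced here:

```python
import numpy as np

def fft_deconv(y, k, lam=0.01):
    """Closed-form deconvolution of image y by known kernel k with a
    quadratic penalty on image gradients, solved per frequency via the FFT.
    This is only the fast Fourier-domain core, not the thesis's full method."""
    H, W = y.shape
    K = np.fft.fft2(k, s=(H, W))
    # Frequency response of horizontal/vertical finite-difference operators
    Dx = np.fft.fft2(np.array([[1, -1]]), s=(H, W))
    Dy = np.fft.fft2(np.array([[1], [-1]]), s=(H, W))
    Y = np.fft.fft2(y)
    num = np.conj(K) * Y
    den = np.abs(K) ** 2 + lam * (np.abs(Dx) ** 2 + np.abs(Dy) ** 2)
    return np.real(np.fft.ifft2(num / den))

# Toy usage: blur a random image with a small box kernel, then deconvolve
img = np.random.rand(64, 64)
kernel = np.ones((5, 5)) / 25.0
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel, s=img.shape)))
print(fft_deconv(blurred, kernel, lam=1e-3).shape)
```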