
Showing papers by "Rob Fergus" published in 2013


Posted Content
TL;DR: In this article, the authors introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier, and perform an ablation study to discover the performance contribution from different model layers.
Abstract: Large Convolutional Network models have recently demonstrated impressive classification performance on the ImageNet benchmark. However there is no clear understanding of why they perform so well, or how they might be improved. In this paper we address both issues. We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. We also perform an ablation study to discover the performance contribution from different model layers. This enables us to find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark. We show our ImageNet model generalizes well to other datasets: when the softmax classifier is retrained, it convincingly beats the current state-of-the-art results on Caltech-101 and Caltech-256 datasets.

2,982 citations
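
The transfer experiment mentioned at the end of the abstract above (retraining only the softmax classifier on top of fixed ImageNet features for Caltech-101/256) amounts to multinomial logistic regression on extracted feature vectors. A minimal numpy sketch under that reading; the array names, learning rate and training loop are placeholders, not the authors' pipeline:

```python
import numpy as np

def retrain_softmax(train_feats, train_labels, n_classes, lr=0.1, epochs=200):
    """Fit a softmax (multinomial logistic regression) classifier on fixed
    ConvNet features by plain gradient descent; a sketch of the transfer
    setup, not the paper's training code."""
    n, d = train_feats.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[train_labels]
    for _ in range(epochs):
        logits = train_feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                   # softmax cross-entropy gradient
        W -= lr * (train_feats.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

# Illustrative call on random "features":
feats = np.random.rand(100, 64)
labels = np.random.randint(0, 5, size=100)
W, b = retrain_softmax(feats, labels, n_classes=5)
```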


Proceedings Article
Li Wan, Matthew D. Zeiler, Sixin Zhang, Yann LeCun, Rob Fergus
16 Jun 2013
TL;DR: This work introduces DropConnect, a generalization of Dropout, for regularizing large fully-connected layers within neural networks, and derives a bound on the generalization performance of both Dropout and DropConnect.
Abstract: We introduce DropConnect, a generalization of Dropout (Hinton et al., 2012), for regularizing large fully-connected layers within neural networks. When training with Dropout, a randomly selected subset of activations are set to zero within each layer. DropConnect instead sets a randomly selected subset of weights within the network to zero. Each unit thus receives input from a random subset of units in the previous layer. We derive a bound on the generalization performance of both Dropout and DropConnect. We then evaluate DropConnect on a range of datasets, comparing to Dropout, and show state-of-the-art results on several image recognition benchmarks by aggregating multiple DropConnect-trained models.

2,413 citations
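
The difference between Dropout and DropConnect described above comes down to where the Bernoulli mask is applied in a fully connected layer: on the output activations versus on the individual weights. A toy numpy sketch of the two training-time forward passes (the keep probability `p` is illustrative, and inference-time rescaling is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_layer(x, W, p=0.5):
    """Dropout: mask a random subset of the layer's output activations."""
    a = x @ W
    mask = rng.random(a.shape) < p          # keep each activation with prob p
    return a * mask

def dropconnect_layer(x, W, p=0.5):
    """DropConnect: mask a random subset of the weights themselves, so each
    output unit receives input from a random subset of the previous layer."""
    mask = rng.random(W.shape) < p          # keep each weight with prob p
    return x @ (W * mask)

x = rng.normal(size=(4, 10))
W = rng.normal(size=(10, 3))
print(dropout_layer(x, W).shape, dropconnect_layer(x, W).shape)
```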


Posted Content
TL;DR: This article showed that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent, which suggests that it is the space, rather than individual units, that contains the semantic information in the high layers of neural networks.
Abstract: Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties. First, we find that there is no distinction between individual high level units and random linear combinations of high level units, according to various methods of unit analysis. It suggests that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks. Second, we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent. We can cause the network to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.

1,313 citations
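
The "imperceptible perturbation found by maximizing the network's prediction error" can be illustrated as a small step on the input in the direction that increases the loss. The paper itself finds minimal perturbations with a box-constrained L-BFGS formulation; the single sign-of-gradient step below is only a schematic of the idea, and `loss_grad_fn` is an assumed hook into whatever framework holds the trained network:

```python
import numpy as np

def adversarial_perturbation(x, y_true, loss_grad_fn, eps=0.01):
    """Perturb input x slightly in a direction that increases the network's
    loss. `loss_grad_fn(x, y_true)` is assumed to return dLoss/dx for the
    trained model (any autodiff framework provides this). Clipping keeps
    the image in a valid [0, 1] range."""
    g = loss_grad_fn(x, y_true)
    x_adv = x + eps * np.sign(g)     # small step that maximizes prediction error
    return np.clip(x_adv, 0.0, 1.0)

# Toy stand-in gradient (quadratic loss around a target) just to exercise the function:
x0 = np.random.rand(8, 8)
print(adversarial_perturbation(x0, x0 * 0, lambda x, y: x - y).shape)
```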


Posted Content
TL;DR: This integrated framework for using Convolutional Networks for classification, localization and detection is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 and obtained very competitive results for the detection and classification tasks.
Abstract: We present an integrated framework for using Convolutional Networks for classification, localization and detection. We show how a multiscale and sliding window approach can be efficiently implemented within a ConvNet. We also introduce a novel deep learning approach to localization by learning to predict object boundaries. Bounding boxes are then accumulated rather than suppressed in order to increase detection confidence. We show that different tasks can be learned simultaneously using a single shared network. This integrated framework is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 (ILSVRC2013) and obtained very competitive results for the detection and classification tasks. In post-competition work, we establish a new state of the art for the detection task. Finally, we release a feature extractor from our best model called OverFeat.

902 citations
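
One concrete piece of the pipeline is the "accumulated rather than suppressed" treatment of bounding boxes: overlapping detections are merged and their confidences combined, instead of discarding all but the strongest as non-maximum suppression would. The greedy routine below is a hedged illustration of that idea, not OverFeat's actual merging algorithm:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def accumulate_boxes(boxes, scores, thresh=0.5):
    """Greedy sketch of accumulation: merge overlapping detections by
    confidence-weighted averaging and add their scores, rather than
    suppressing the weaker box."""
    merged = []
    for box, s in sorted(zip(boxes, scores), key=lambda t: -t[1]):
        for m in merged:
            if iou(box, m["box"]) > thresh:
                w = m["score"] + s
                m["box"] = tuple((m["score"] * mb + s * bb) / w
                                 for mb, bb in zip(m["box"], box))
                m["score"] = w
                break
        else:
            merged.append({"box": tuple(box), "score": s})
    return merged

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(accumulate_boxes(boxes, scores))
```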


Posted Content
TL;DR: In this article, the authors proposed a stochastic pooling method for regularizing large convolutional neural networks, which randomly picks the activation within each pooling region according to a multinomial distribution.
Abstract: We introduce a simple and effective method for regularizing large convolutional neural networks. We replace the conventional deterministic pooling operations with a stochastic procedure, randomly picking the activation within each pooling region according to a multinomial distribution, given by the activities within the pooling region. The approach is hyper-parameter free and can be combined with other regularization approaches, such as dropout and data augmentation. We achieve state-of-the-art performance on four image datasets, relative to other approaches that do not utilize data augmentation.

582 citations
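
The pooling rule itself is compact: within each pooling region the (non-negative) activations are normalized into a multinomial distribution and one of them is sampled. A numpy sketch for a single region; the paper applies this per region and per feature map during training, with a probability-weighted average used at test time:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_pool_region(acts):
    """Sample one activation from a pooling region, with probability
    proportional to its (non-negative, e.g. post-ReLU) value."""
    acts = np.asarray(acts, dtype=float).ravel()
    total = acts.sum()
    if total == 0:                       # all-zero region: pooled value is 0
        return 0.0
    probs = acts / total                 # multinomial distribution over the region
    idx = rng.choice(len(acts), p=probs)
    return acts[idx]

# Example: a 2x2 pooling region
print(stochastic_pool_region([[1.0, 3.0], [0.0, 2.0]]))
```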


Posted Content
12 Nov 2013
TL;DR: In this paper, a novel visualization technique was introduced to give insight into the function of intermediate feature layers and the operation of the classifier, which enabled the authors to find model architectures that outperformed Krizhevsky et al. on the ImageNet classification benchmark.
Abstract: Large Convolutional Neural Network models have recently demonstrated impressive classification performance on the ImageNet benchmark (Krizhevsky et al., 2012). However there is no clear understanding of why they perform so well, or how they might be improved. In this paper we address both issues. We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. We also perform an ablation study to discover the performance contribution from different model layers. This enables us to find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark. We show our ImageNet model generalizes well to other datasets: when the softmax classifier is retrained, it convincingly beats the current state-of-the-art results on Caltech-101 and Caltech-256 datasets.

513 citations


Proceedings ArticleDOI
01 Dec 2013
TL;DR: This work presents a post-capture image processing solution that can remove localized rain and dirt artifacts from a single image, and demonstrates effective removal of dirt and rain in outdoor test conditions.
Abstract: Photographs taken through a window are often compromised by dirt or rain present on the window surface. Common cases of this include pictures taken from inside a vehicle, or outdoor security cameras mounted inside a protective enclosure. At capture time, defocus can be used to remove the artifacts, but this relies on achieving a shallow depth-of-field and placement of the camera close to the window. Instead, we present a post-capture image processing solution that can remove localized rain and dirt artifacts from a single image. We collect a dataset of clean/corrupted image pairs which are then used to train a specialized form of convolutional neural network. This learns how to map corrupted image patches to clean ones, implicitly capturing the characteristic appearance of dirt and water droplets in natural images. Our models demonstrate effective removal of dirt and rain in outdoor test conditions.

447 citations
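
The clean/corrupted image pairs described above are used for patch-to-patch regression: the network is trained to minimize reconstruction error between its output on a corrupted patch and the matching clean patch. A hedged sketch of that objective with a generic callable standing in for the paper's specialized convolutional network (patch sizes and the `model` interface below are placeholders):

```python
import numpy as np

def patch_regression_loss(model, corrupted_patches, clean_patches):
    """Mean-squared reconstruction error between the model's output on
    corrupted patches and the corresponding clean patches, the kind of
    per-patch objective described in the abstract. `model` is any callable
    mapping a batch of patches to same-sized patches."""
    pred = model(corrupted_patches)
    return np.mean((pred - clean_patches) ** 2)

# Illustrative use with an identity "model" and random 16x16 patch pairs:
corrupted = np.random.rand(8, 16, 16, 3)
clean = np.random.rand(8, 16, 16, 3)
print(patch_regression_loss(lambda x: x, corrupted, clean))
```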


Journal ArticleDOI
TL;DR: In this paper, the authors obtained spectra in the wavelength range λ = 995-1769 nm of all known planets orbiting the star HR 8799 using the suite of instrumentation known as Project 1640 on the Palomar 5 m Hale Telescope.
Abstract: We obtained spectra in the wavelength range λ = 995-1769 nm of all four known planets orbiting the star HR 8799. Using the suite of instrumentation known as Project 1640 on the Palomar 5 m Hale Telescope, we acquired data at two epochs. This allowed for multiple imaging detections of the companions and multiple extractions of low-resolution (R ~ 35) spectra. Data reduction employed two different methods of speckle suppression and spectrum extraction, both yielding results that agree. The spectra do not directly correspond to those of any known objects, although similarities with L and T dwarfs are present, as well as some characteristics similar to planets such as Saturn. We tentatively identify the presence of CH_4 along with NH_3 and/or C_2H_2, and possibly CO_2 or HCN in varying amounts in each component of the system. Other studies suggested red colors for these faint companions, and our data confirm those observations. Cloudy models, based on previous photometric observations, may provide the best explanation for the new data presented here. Notable in our data is that these presumably co-eval objects of similar luminosity have significantly different spectra; the diversity of planets may be greater than previously thought. The techniques and methods employed in this paper represent a new capability to observe and rapidly characterize exoplanetary systems in a routine manner over a broad range of planet masses and separations. These are the first simultaneous spectroscopic observations of multiple planets in a planetary system other than our own.

182 citations


Journal ArticleDOI
TL;DR: In this article, the authors obtained spectra, in the wavelength range λ = 995 - 1769 nm, of all known planets orbiting the star HR 8799, using the suite of instrumentation known as Project 1640 on the Palomar 5-m Hale Telescope, and acquired data at two epochs.
Abstract: We obtained spectra, in the wavelength range \lambda = 995 - 1769 nm, of all four known planets orbiting the star HR 8799. Using the suite of instrumentation known as Project 1640 on the Palomar 5-m Hale Telescope, we acquired data at two epochs. This allowed for multiple imaging detections of the companions and multiple extractions of low-resolution (R ~ 35) spectra. Data reduction employed two different methods of speckle suppression and spectrum extraction, both yielding results that agree. The spectra do not directly correspond to those of any known objects, although similarities with L and T-dwarfs are present, as well as some characteristics similar to planets such as Saturn. We tentatively identify the presence of CH_4 along with NH_3 and/or C_2H_2, and possibly CO_2 or HCN in varying amounts in each component of the system. Other studies suggested red colors for these faint companions, and our data confirm those observations. Cloudy models, based on previous photometric observations, may provide the best explanation for the new data presented here. Notable in our data is that these presumably co-eval objects of similar luminosity have significantly different spectra; the diversity of planets may be greater than previously thought. The techniques and methods employed in this paper represent a new capability to observe and rapidly characterize exoplanetary systems in a routine manner over a broad range of planet masses and separations. These are the first simultaneous spectroscopic observations of multiple planets in a planetary system other than our own.

147 citations


Proceedings Article
16 Jan 2013
TL;DR: A simple and effective method for regularizing large convolutional neural networks, which replaces the conventional deterministic pooling operations with a stochastic procedure, randomly picking the activation within each pooling region according to a multinomial distribution.
Abstract: We introduce a simple and effective method for regularizing large convolutional neural networks. We replace the conventional deterministic pooling operations with a stochastic procedure, randomly picking the activation within each pooling region according to a multinomial distribution, given by the activities within the pooling region. The approach is hyper-parameter free and can be combined with other regularization approaches, such as dropout and data augmentation. We achieve state-of-the-art performance on four image datasets, relative to other approaches that do not utilize data augmentation.

127 citations


Book ChapterDOI
01 Jan 2013
TL;DR: This work shows that learned binary projections are a powerful way to index large collections according to their content, and it is possible to bound the number of database examples that must be searched in order to achieve a given level of accuracy.
Abstract: Algorithms to rapidly search massive image or video collections are critical for many vision applications, including visual search, content-based retrieval, and non-parametric models for object recognition. Recent work shows that learned binary projections are a powerful way to index large collections according to their content. The basic idea is to formulate the projections so as to approximately preserve a given similarity function of interest. Having done so, one can then search the data efficiently using hash tables, or by exploring the Hamming ball volume around a novel query. Both enable sub-linear time retrieval with respect to the database size. Further, depending on the design of the projections, in some cases it is possible to bound the number of database examples that must be searched in order to achieve a given level of accuracy.
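
The indexing scheme sketched in this chapter (binary codes obtained by thresholding projections, then hash-table lookup within a small Hamming ball of the query code) can be illustrated in a few lines. Random projections below stand in for whatever learned projections are used; only the lookup mechanics are the point:

```python
import numpy as np
from itertools import combinations
from collections import defaultdict

rng = np.random.default_rng(0)

def binary_code(x, P):
    """Project and threshold: one bit per projection direction."""
    return tuple((x @ P > 0).astype(np.uint8))

def build_index(database, P):
    """Hash table keyed by binary code, storing database item indices."""
    index = defaultdict(list)
    for i, x in enumerate(database):
        index[binary_code(x, P)].append(i)
    return index

def hamming_ball_query(q, P, index, radius=1):
    """Retrieve all database items whose code is within `radius` bit flips
    of the query's code; sub-linear in the database size."""
    code = np.array(binary_code(q, P), dtype=np.uint8)
    hits = list(index.get(tuple(code), []))
    for r in range(1, radius + 1):
        for flip in combinations(range(len(code)), r):
            c = code.copy()
            c[list(flip)] ^= 1
            hits.extend(index.get(tuple(c), []))
    return hits

# Example with random data and random projections standing in for learned ones
db = rng.normal(size=(1000, 32))
P = rng.normal(size=(32, 16))          # 16-bit codes
idx = build_index(db, P)
print(hamming_ball_query(db[0], P, idx, radius=1)[:10])
```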

Proceedings Article
18 Dec 2013
TL;DR: In this article, a recursive convolutional network whose weights are tied between layers is employed to assess the independent contributions of three of these linked variables: the number of layers, feature maps, and parameters.
Abstract: A key challenge in designing convolutional network models is sizing them appropriately. Many factors are involved in these decisions, including number of layers, feature maps, kernel sizes, etc. Complicating this further is the fact that each of these influence not only the numbers and dimensions of the activation units, but also the total number of parameters. In this paper we focus on assessing the independent contributions of three of these linked variables: The numbers of layers, feature maps, and parameters. To accomplish this, we employ a recursive convolutional network whose weights are tied between layers; this allows us to vary each of the three factors in a controlled setting. We find that while increasing the numbers of layers and parameters each have clear benefit, the number of feature maps (and hence dimensionality of the representation) appears ancillary, and finds most of its benefit through the introduction of more weights. Our results (i) empirically confirm the notion that adding layers alone increases computational power, within the context of convolutional layers, and (ii) suggest that precise sizing of convolutional feature map dimensions is itself of little concern; more attention should be paid to the number of parameters in these layers instead.
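
The controlled setting described above, a recursive convolutional network with weights tied between layers, simply re-applies the same convolution and nonlinearity a chosen number of times, so depth grows while the parameter count stays fixed. A minimal numpy sketch with one shared 2D filter (purely illustrative; the paper ties full multi-feature-map convolutional layers):

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2D filtering with a single kernel."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def recursive_conv(x, k, depth):
    """Apply the *same* tied filter `depth` times: more layers,
    identical number of parameters."""
    for _ in range(depth):
        x = np.maximum(conv2d_same(x, k), 0.0)    # shared weights + ReLU
    return x

x = np.random.rand(8, 8)
k = np.random.randn(3, 3) * 0.1
print(recursive_conv(x, k, depth=4).shape)
```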

Posted Content
TL;DR: It is shown that all successful Variational or Maximum a-Posteriori algorithms share a common framework, relying on the following key principles: sparsity promotion in the gradient domain, regularization for kernel estimation, and the use of convex cost functions.
Abstract: Blind deconvolution has made significant progress in the past decade. Most successful algorithms are classified either as Variational or Maximum a-Posteriori ($MAP$). In spite of the superior theoretical justification of variational techniques, carefully constructed $MAP$ algorithms have proven equally effective in practice. In this paper, we show that all successful $MAP$ and variational algorithms share a common framework, relying on the following key principles: sparsity promotion in the gradient domain, $l_2$ regularization for kernel estimation, and the use of convex (often quadratic) cost functions. Our observations lead to a unified understanding of the principles required for successful blind deconvolution. We incorporate these principles into a novel algorithm that improves significantly upon the state of the art.
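
The three principles listed in the abstract can be collected into one schematic MAP-style objective of the kind the paper argues successful methods share; the exact penalty ρ and the weights λ, γ differ between algorithms, so this is a hedged template rather than any specific method's cost:

```latex
\min_{x,\;k}\;\; \tfrac{1}{2}\,\lVert k \ast x - y \rVert_2^2
\;+\; \lambda \sum_i \rho\big((\nabla x)_i\big)
\;+\; \gamma \,\lVert k \rVert_2^2 ,
\qquad k \ge 0,\quad \textstyle\sum_j k_j = 1,
```

where the first term is the quadratic (hence convex) data-fidelity cost, the second promotes sparsity of image gradients (e.g. ρ(t) = |t|^α with α ≤ 1), and the third is the l2 regularization on the blur kernel k.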

Posted Content
16 Nov 2013
TL;DR: It is shown that all successful MAP and variational algorithms share a common framework, relying on the following key principles: sparsity promotion in the gradient domain, l2 regularization for kernel estimation, and the use of convex (often quadratic) cost functions.
Abstract: Blind deconvolution has made significant progress in the past decade. Most successful algorithms are classified either as Variational or Maximum a-Posteriori ($MAP$). In spite of the superior theoretical justification of variational techniques, carefully constructed $MAP$ algorithms have proven equally effective in practice. In this paper, we show that all successful $MAP$ and variational algorithms share a common framework, relying on the following key principles: sparsity promotion in the gradient domain, $l_2$ regularization for kernel estimation, and the use of convex (often quadratic) cost functions. Our observations lead to a unified understanding of the principles required for successful blind deconvolution. We incorporate these principles into a novel algorithm that improves significantly upon the state of the art.

Posted Content
TL;DR: In this paper, a recursive convolutional network whose weights are tied between layers is employed to assess the independent contributions of three of these linked variables: the number of layers, feature maps, and parameters.
Abstract: A key challenge in designing convolutional network models is sizing them appropriately. Many factors are involved in these decisions, including number of layers, feature maps, kernel sizes, etc. Complicating this further is the fact that each of these influence not only the numbers and dimensions of the activation units, but also the total number of parameters. In this paper we focus on assessing the independent contributions of three of these linked variables: The numbers of layers, feature maps, and parameters. To accomplish this, we employ a recursive convolutional network whose weights are tied between layers; this allows us to vary each of the three factors in a controlled setting. We find that while increasing the numbers of layers and parameters each have clear benefit, the number of feature maps (and hence dimensionality of the representation) appears ancillary, and finds most of its benefit through the introduction of more weights. Our results (i) empirically confirm the notion that adding layers alone increases computational power, within the context of convolutional layers, and (ii) suggest that precise sizing of convolutional feature map dimensions is itself of little concern; more attention should be paid to the number of parameters in these layers instead.

Posted Content
TL;DR: In this article, the authors argue that image modeling can greatly improve the precision of Kepler in pointing-degraded two-wheel mode, and demonstrate that the expected drift or jitter in positions in the two-wheel era will help with constraining calibration parameters.
Abstract: Kepler's immense photometric precision to date was maintained through satellite stability and precise pointing. In this white paper, we argue that image modeling--fitting the Kepler-downlinked raw pixel data--can vastly improve the precision of Kepler in pointing-degraded two-wheel mode. We argue that a non-trivial modeling effort may permit continuance of photometry at 10-ppm-level precision. We demonstrate some baby steps towards precise models in both data-driven (flexible) and physics-driven (interpretably parameterized) modes. We demonstrate that the expected drift or jitter in positions in the two-wheel era will help with constraining calibration parameters. In particular, we show that we can infer the device flat-field at higher than pixel resolution; that is, we can infer pixel-to-pixel variations in intra-pixel sensitivity. These results are relevant to almost any scientific goal for the repurposed mission; image modeling ought to be a part of any two-wheel repurpose for the satellite. We make other recommendations for Kepler operations, but fundamentally advocate that the project stick with its core mission of finding and characterizing Earth analogs. [abridged]

01 Jan 2013
TL;DR: Novel image priors and efficient algorithms for image denoising and deconvolution applications, and effective preconditioners for Laplacian matrices for discrete Poisson equations are developed.
Abstract: In the first part of this thesis, we develop novel image priors and efficient algorithms for image denoising and deconvolution applications. Our priors and algorithms enable fast, high-quality restoration of images corrupted by noise or blur. In the second part, we develop effective preconditioners for Laplacian matrices. Such matrices arise in a number of computer graphics and computational photography problems such as image colorization, tone mapping and geodesic distance computation on 3D meshes. The first prior we develop is a spectral prior which models correlations between different spectral bands. We introduce a prototype camera and flash system, used in conjunction with the spectral prior, to enable taking photographs at very low light levels. Our second prior is a sparsity-based measure for blind image deconvolution. This prior gives lower costs to sharp images than blurred ones, enabling the use of simple and efficient Maximum a-Posteriori algorithms. We develop a new algorithm for the non-blind deconvolution problem. This enables extremely fast deconvolution of images blurred by a known blur kernel. Our algorithm uses Fast Fourier Transforms and Lookup Tables to achieve real-time deconvolution performance with non-convex gradient-based priors. Finally, for certain image restoration problems with no known formation model, we demonstrate how learning a mapping between original/corrupted patch pairs enables effective restoration. The preconditioners we develop are multi-level schemes for discrete Poisson equations. Existing multi-level preconditioners have two major drawbacks: excessive bandwidth growth at coarse levels; and the inability to adapt to problems with highly varying coefficients. Our approach tackles both these problems by introducing sparsification and compensation steps at each level. We interleave the selection of fine and coarse-level variables with the removal of weak connections between potential fine-level variables (sparsification) and compensate for these changes by strengthening nearby connections. By applying these operations before each elimination step and repeating the procedure recursively on the resulting smaller systems, we obtain highly efficient schemes. The construction is linear in time and memory. Numerical experiments demonstrate that our new schemes outperform state of the art methods, both in terms of operation count and wall-clock time, over a range of 2D and 3D problems.
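
One building block behind the "extremely fast deconvolution" claim is that, with a known kernel and a quadratic penalty on image gradients, the image update has a closed-form per-frequency solution via the FFT. The sketch below shows only that Fourier-domain step (a Wiener/Tikhonov-style filter); the thesis's full algorithm alternates it with per-pixel lookup-table updates under non-convex priors, which is not reproduced here:

```python
import numpy as np

def fft_deconv(y, k, lam=0.01):
    """Closed-form deconvolution of image y by known kernel k with a
    quadratic penalty on image gradients, solved per frequency via the FFT.
    This is only the fast Fourier-domain core, not the thesis's full method."""
    H, W = y.shape
    K = np.fft.fft2(k, s=(H, W))
    # Frequency response of horizontal/vertical finite-difference operators
    Dx = np.fft.fft2(np.array([[1, -1]]), s=(H, W))
    Dy = np.fft.fft2(np.array([[1], [-1]]), s=(H, W))
    Y = np.fft.fft2(y)
    num = np.conj(K) * Y
    den = np.abs(K) ** 2 + lam * (np.abs(Dx) ** 2 + np.abs(Dy) ** 2)
    return np.real(np.fft.ifft2(num / den))

# Toy usage: blur a random image with a small box kernel, then deconvolve
img = np.random.rand(64, 64)
kernel = np.ones((5, 5)) / 25.0
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel, s=img.shape)))
print(fft_deconv(blurred, kernel, lam=1e-3).shape)
```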