
Showing papers by "Rob Fergus published in 2011"


Proceedings ArticleDOI
06 Nov 2011
TL;DR: A hierarchical model that learns image decompositions via alternating layers of convolutional sparse coding and max pooling, built on a novel inference scheme in which each layer reconstructs the input rather than just the output of the layer directly beneath, as is common in existing hierarchical approaches.
Abstract: We present a hierarchical model that learns image decompositions via alternating layers of convolutional sparse coding and max pooling. When trained on natural images, the layers of our model capture image information in a variety of forms: low-level edges, mid-level edge junctions, high-level object parts and complete objects. To build our model we rely on a novel inference scheme that ensures each layer reconstructs the input, rather than just the output of the layer directly beneath, as is common with existing hierarchical approaches. This makes it possible to learn multiple layers of representation, and we show four-layer models trained on images from the Caltech-101 and Caltech-256 datasets. When combined with a standard classifier, features extracted from these models outperform SIFT, as well as representations from other feature learning methods.
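
As a rough illustration of the core operation described above, the sketch below runs ISTA-style inference for a single layer of convolutional sparse coding: feature maps z_k are fitted so that sum_k f_k * z_k reconstructs the input image under an L1 sparsity penalty. This is not the authors' implementation; the function names, step sizes, and the omission of filter learning, max pooling, and the multi-layer machinery are all simplifications.

```python
# Minimal sketch (assumptions throughout): one layer of convolutional sparse
# coding, inferring feature maps whose filtered sum reconstructs the input.
import numpy as np
from scipy.signal import fftconvolve

def infer_feature_maps(image, filters, lam=0.1, lr=0.05, n_steps=100):
    """ISTA-style inference: gradient step on the reconstruction error,
    then soft-thresholding for the L1 sparsity term."""
    K = len(filters)
    z = [np.zeros_like(image) for _ in range(K)]  # 'same'-size feature maps
    for _ in range(n_steps):
        recon = sum(fftconvolve(z[k], filters[k], mode="same") for k in range(K))
        residual = recon - image
        for k in range(K):
            # Gradient of 0.5*||recon - image||^2 w.r.t. z_k is the
            # correlation of the residual with filter f_k (flip = correlate).
            grad = fftconvolve(residual, filters[k][::-1, ::-1], mode="same")
            zk = z[k] - lr * grad
            z[k] = np.sign(zk) * np.maximum(np.abs(zk) - lr * lam, 0.0)
    return z

# Toy usage: random filters on a random "image".
rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
filts = [rng.standard_normal((7, 7)) * 0.1 for _ in range(4)]
maps = infer_feature_maps(img, filts)
recon = sum(fftconvolve(m, f, mode="same") for m, f in zip(maps, filts))
print("reconstruction error:", np.linalg.norm(recon - img))
```

Reconstructing the input at every layer, rather than the layer below, is what lets errors be measured against the image itself as layers are stacked.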

1,257 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: A new type of image regularization that assigns lowest cost to the true sharp image, allowing a very simple cost formulation for the blind deconvolution model and obviating the need for additional methods.
Abstract: Blind image deconvolution is an ill-posed problem that requires regularization to solve. However, many common forms of image prior used in this setting have a major drawback in that the minimum of the resulting cost function does not correspond to the true sharp solution. Accordingly, a range of additional methods are needed to yield good results (Bayesian methods, adaptive cost functions, alpha-matte extraction and edge localization). In this paper we introduce a new type of image regularization which gives lowest cost for the true sharp image. This allows a very simple cost formulation to be used for the blind deconvolution model, obviating the need for additional methods. Due to its simplicity the algorithm is fast and very robust. We demonstrate our method on real images with both spatially invariant and spatially varying blur.
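
To make the idea of a regularizer that favors the true sharp image concrete, the toy sketch below compares a scale-invariant sparsity cost, the l1/l2 ratio of image gradients, on a sharp image and a blurred copy: blurring spreads gradient mass over more pixels and raises the ratio. Treat the l1/l2 form as an assumption chosen for illustration, not necessarily the paper's exact regularizer.

```python
# Sketch: a ratio-type sparsity cost on gradients is lower for sharp images
# than for blurred ones, unlike a plain l1 prior (which blur can decrease).
import numpy as np
from scipy.ndimage import gaussian_filter

def l1_over_l2_gradients(img):
    gx = np.diff(img, axis=1)
    gy = np.diff(img, axis=0)
    g = np.concatenate([gx.ravel(), gy.ravel()])
    return np.abs(g).sum() / (np.linalg.norm(g) + 1e-12)

sharp = np.zeros((64, 64))
sharp[16:48, 16:48] = 1.0                      # piecewise-constant "scene"
blurred = gaussian_filter(sharp, sigma=2.0)    # simulated camera blur
print("sharp   l1/l2:", l1_over_l2_gradients(sharp))
print("blurred l1/l2:", l1_over_l2_gradients(blurred))  # higher => penalized
```

With such a cost, minimizing over the image and kernel jointly can be done with a simple formulation, since the cost itself already prefers the sharp solution.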

1,054 citations


Proceedings ArticleDOI
01 Nov 2011
TL;DR: A CRF-based model is used to evaluate a range of representations for depth information, together with a novel prior on 3D location; the combination of depth and intensity images gives dramatic performance gains over intensity images alone.
Abstract: In this paper we explore how a structured light depth sensor, in the form of the Microsoft Kinect, can assist with indoor scene segmentation. We use a CRF-based model to evaluate a range of different representations for depth information and propose a novel prior on 3D location. We introduce a new and challenging indoor scene dataset, complete with accurate depth maps and dense label coverage. Evaluating our model on this dataset reveals that the combination of depth and intensity images gives dramatic performance gains over intensity images alone. Our results clearly demonstrate the utility of structured light sensors for scene understanding.
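
The sketch below gives a minimal, hypothetical version of a grid CRF energy that mixes the two cues: per-pixel unary costs plus a Potts smoothness term whose weight decays across depth discontinuities, so label boundaries prefer to follow 3D structure. The weighting scheme and all names/parameters here are assumptions, not the paper's model.

```python
# Sketch of a pairwise CRF energy over an image grid, with depth-sensitive
# smoothness: neighboring pixels with similar depth are encouraged to share
# a label, while depth jumps (likely object boundaries) relax the penalty.
import numpy as np

def crf_energy(labels, unary, depth, w_pair=1.0, beta=5.0):
    """labels: (H,W) ints; unary: (H,W,C) per-pixel costs; depth: (H,W)."""
    H, W = labels.shape
    e = unary[np.arange(H)[:, None], np.arange(W)[None, :], labels].sum()
    for dy, dx in [(0, 1), (1, 0)]:            # right and down neighbors
        a = labels[: H - dy, : W - dx]
        b = labels[dy:, dx:]
        dd = depth[: H - dy, : W - dx] - depth[dy:, dx:]
        w = w_pair * np.exp(-beta * dd ** 2)   # weaker across depth jumps
        e += (w * (a != b)).sum()
    return e

rng = np.random.default_rng(0)
H, W, C = 8, 8, 3
unary = rng.random((H, W, C))
depth = rng.random((H, W))
labels = unary.argmin(axis=2)                  # unary-only labeling baseline
print("energy:", crf_energy(labels, unary, depth))
```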

526 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: This paper crowd-sources similar images by soliciting human imitations, exploiting temporal coherence in video to generate additional pairwise graded similarities between the user-contributed imitations.
Abstract: Supervised methods for learning an embedding aim to map high-dimensional images to a space in which perceptually similar observations have high measurable similarity. Most approaches rely on binary similarity, typically defined by class membership where labels are expensive to obtain and/or difficult to define. In this paper we propose crowd-sourcing similar images by soliciting human imitations. We exploit temporal coherence in video to generate additional pairwise graded similarities between the user-contributed imitations. We introduce two methods for learning nonlinear, invariant mappings that exploit graded similarities. We learn a model that is highly effective at matching people in similar pose. It exhibits remarkable invariance to identity, clothing, background, lighting, shift and scale.
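
As one plausible way to exploit graded rather than binary similarity, the sketch below modifies a standard contrastive embedding loss so that the attract and repel terms are weighted by a grade s in [0, 1]. The exact objectives in the paper may differ; this particular form, and all names in it, are assumptions for illustration.

```python
# Sketch: a contrastive-style loss using graded similarity s instead of
# binary labels. High-s pairs are pulled together in proportion to s;
# low-s pairs are pushed apart until they exceed a margin.
import numpy as np

def graded_contrastive_loss(f1, f2, s, margin=1.0):
    """f1, f2: (N,D) embeddings of the two images; s: (N,) grades in [0,1]."""
    d = np.linalg.norm(f1 - f2, axis=1)
    pull = s * d ** 2                                    # attract by grade
    push = (1.0 - s) * np.maximum(margin - d, 0.0) ** 2  # repel below margin
    return (pull + push).mean()

rng = np.random.default_rng(0)
a, b = rng.standard_normal((16, 8)), rng.standard_normal((16, 8))
grades = rng.random(16)
print("loss:", graded_contrastive_loss(a, b, grades))
```

A binary loss is the special case s in {0, 1}; graded supervision lets the video-derived "how similar" signal shape the embedding directly.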

50 citations


Proceedings Article
12 Dec 2011
TL;DR: A type of Temporal Restricted Boltzmann Machine that defines a probability distribution over an output sequence conditional on an input sequence, sharing the desirable properties of RBMs: efficient exact inference, a latent state exponentially more expressive than an HMM's, and the ability to model nonlinear structure and dynamics.
Abstract: We present a type of Temporal Restricted Boltzmann Machine that defines a probability distribution over an output sequence conditional on an input sequence. It shares the desirable properties of RBMs: efficient exact inference, an exponentially more expressive latent state than HMMs, and the ability to model nonlinear structure and dynamics. We apply our model to a challenging real-world graphics problem: facial expression transfer. Our results demonstrate improved performance over several baselines modeling high-dimensional 2D and 3D data.
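
The sketch below shows, under strong simplifying assumptions, the kind of conditional RBM computation involved: the input frame shifts the hidden and (Gaussian) visible biases, and block Gibbs sampling infers an output frame. The dimensions, weight names, and factorization are hypothetical; the paper's temporal model is richer than this single-frame sketch.

```python
# Sketch of a conditional RBM step: given an input frame x, sample an
# output frame v. Hidden units are exactly inferable given v (the "efficient
# exact inference" property), and x conditions both layers via linear biases.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_in, n_out, n_hid = 20, 20, 32
W = rng.standard_normal((n_out, n_hid)) * 0.1  # visible-hidden weights
A = rng.standard_normal((n_in, n_out)) * 0.1   # input -> visible bias
B = rng.standard_normal((n_in, n_hid)) * 0.1   # input -> hidden bias

def gibbs_step(v_out, x_in):
    """One block-Gibbs sweep over (hidden, output) given the input frame."""
    h_prob = sigmoid(v_out @ W + x_in @ B)          # exact given v_out
    h = (rng.random(n_hid) < h_prob).astype(float)
    v_mean = h @ W.T + x_in @ A                     # Gaussian visible mean
    return v_mean, h_prob

x = rng.standard_normal(n_in)   # source expression features (input)
v = np.zeros(n_out)             # target face parameters (output)
for _ in range(10):
    v, _ = gibbs_step(v, x)
print("sampled output frame:", v[:5])
```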

39 citations