Showing papers by "Oriol Vinyals published in 2013"

PDF

Open Access

Posted Content•

DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

[...]

Jeff Donahue¹, Yangqing Jia¹, Oriol Vinyals¹, Judy Hoffman¹, Ning Zhang¹, Eric Tzeng¹, Trevor Darrell¹ - Show less +3 more•Institutions (1)

University of California, Berkeley¹

06 Oct 2013-arXiv: Computer Vision and Pattern Recognition

TL;DR: DeCAF, an open-source implementation of deep convolutional activation features, along with all associated network parameters, are released to enable vision researchers to be able to conduct experimentation with deep representations across a range of visual concept learning paradigms.

...read moreread less

Abstract: We evaluate whether features extracted from the activation of a deep convolutional network trained in a fully supervised fashion on a large, fixed set of object recognition tasks can be re-purposed to novel generic tasks. Our generic tasks may differ significantly from the originally trained tasks and there may be insufficient labeled or unlabeled data to conventionally train or adapt a deep architecture to the new tasks. We investigate and visualize the semantic clustering of deep convolutional features with respect to a variety of such tasks, including scene recognition, domain adaptation, and fine-grained recognition challenges. We compare the efficacy of relying on various network levels to define a fixed feature, and report novel results that significantly outperform the state-of-the-art on several important vision challenges. We are releasing DeCAF, an open-source implementation of these deep convolutional activation features, along with all associated network parameters to enable vision researchers to be able to conduct experimentation with deep representations across a range of visual concept learning paradigms.

...read moreread less

3,546 citations

Proceedings Article•DOI•

Deep vs. wide: depth on a budget for robust speech recognition.

[...]

Oriol Vinyals¹, Nelson Morgan¹•Institutions (1)

University of California, Berkeley¹

25 Aug 2013

TL;DR: This work incorporates the constraint of freezing the number of parameters for a given task, which in many applications corresponds to practical limitations on storage or computation, and determines that a large number of layers is not always optimum.

...read moreread less

Abstract: It has now been established that incorporating neural networks can be useful for speech recognition, and that machine learning methods can make it practical to incorporate a larger number of hidden layers in a “deep” structure. Here we incorporate the constraint of freezing the number of parameters for a given task, which in many applications corresponds to practical limitations on storage or computation. Given this constraint, we vary the size of each hidden layer as we change the number of layers so as to keep the total number of parameters constant. In this way we have determined, for a common task of noisy speech recognition (Aurora2), that a large number of layers is not always optimum; for each noise level there is an optimum number of layers. We also use state-of-the-art optimization algorithms to further understand the effect of initialization and convergence properties of such networks, and to have an efficient implementation that allows us to run more experiments with a standard desktop machine with a single GPU.

...read moreread less

12 citations

Posted Content•

Pooling-Invariant Image Feature Learning

[...]

Yangqing Jia, Oriol Vinyals, Trevor Darrell

15 Jan 2013-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper offers a novel dictionary learning scheme to efficiently take into account the invariance of learned features after the spatial pooling stage, built on simple clustering, and thus enjoys efficiency and scalability.

...read moreread less

Abstract: Unsupervised dictionary learning has been a key component in state-of-the-art computer vision recognition architectures. While highly effective methods exist for patch-based dictionary learning, these methods may learn redundant features after the pooling stage in a given early vision architecture. In this paper, we offer a novel dictionary learning scheme to efficiently take into account the invariance of learned features after the spatial pooling stage. The algorithm is built on simple clustering, and thus enjoys efficiency and scalability. We discuss the underlying mechanism that justifies the use of clustering algorithms, and empirically show that the algorithm finds better dictionaries than patch-based methods with the same dictionary size.

...read moreread less

10 citations

Proceedings Article•

On Compact Codes for Spatially Pooled Features

[...]

Yangqing Jia¹, Oriol Vinyals¹, Trevor Darrell¹•Institutions (1)

University of California, Berkeley¹

16 Jun 2013

TL;DR: This paper analyzes the classification accuracy with respect to dictionary size by linking the encoding stage to kernel methods and Nystrom sampling, and obtains useful bounds on accuracy as a function of size.

...read moreread less

Abstract: Feature encoding with an overcomplete dictionary has demonstrated good performance in many applications, especially computer vision. In this paper we analyze the classification accuracy with respect to dictionary size by linking the encoding stage to kernel methods and Nystrom sampling, and obtain useful bounds on accuracy as a function of size. The Nystrom method also inspires us to revisit dictionary learning from local patches, and we propose to learn the dictionary in an end-to-end fashion taking into account pooling, a common computational layer in vision. We validate our contribution by showing how the derived bounds are able to explain the observed behavior of multiple datasets, and show that the pooling aware method efficiently reduces the dictionary size by a factor of two for a given accuracy.

...read moreread less

7 citations

Posted Content•

Why Size Matters: Feature Coding as Nystrom Sampling

[...]

Oriol Vinyals¹, Yangqing Jia¹, Trevor Darrell¹•Institutions (1)

University of California, Berkeley¹

15 Jan 2013-arXiv: Learning

TL;DR: A novel view of feature extraction pipelines that rely on a coding step followed by a linear classifier based on kernel methods and Nystrom sampling is proposed, which may help explaining the positive effect of the codebook size and justifying the need to stack more layers, as flat models empirically saturate as the authors add more complexity.

...read moreread less

Abstract: Recently, the computer vision and machine learning community has been in favor of feature extraction pipelines that rely on a coding step followed by a linear classifier, due to their overall simplicity, well understood properties of linear classifiers, and their computational efficiency. In this paper we propose a novel view of this pipeline based on kernel methods and Nystrom sampling. In particular, we focus on the coding of a data point with a local representation based on a dictionary with fewer elements than the number of data points, and view it as an approximation to the actual function that would compute pair-wise similarity to all data points (often too many to compute in practice), followed by a Nystrom sampling step to select a subset of all data points. Furthermore, since bounds are known on the approximation power of Nystrom sampling as a function of how many samples (i.e. dictionary size) we consider, we can derive bounds on the approximation of the exact (but expensive to compute) kernel matrix, and use it as a proxy to predict accuracy as a function of the dictionary size, which has been observed to increase but also to saturate as we increase its size. This model may help explaining the positive effect of the codebook size and justifying the need to stack more layers (often referred to as deep learning), as flat models empirically saturate as we add more complexity.

...read moreread less

1 citations