
Showing papers by "Ruslan Salakhutdinov published in 2011"


Journal Article
TL;DR: A generative model of how characters are composed from strokes is introduced, where knowledge from previous characters helps to infer the latent strokes in novel characters, using a massive new dataset of handwritten characters.

757 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: Presents a hierarchical classification model that allows rare objects to borrow statistical strength from related objects with many training examples, learning both a hierarchy for sharing visual appearance across 200 object categories and the corresponding hierarchical parameters.
Abstract: We present a hierarchical classification model that allows rare objects to borrow statistical strength from related objects that have many training examples. Unlike many of the existing object detection and recognition systems that treat different classes as unrelated entities, our model learns both a hierarchy for sharing visual appearance across 200 object categories and hierarchical parameters. Our experimental results on the challenging object localization and detection task demonstrate that the proposed model substantially improves the accuracy of the standard single object detectors that ignore hierarchical structure altogether.

385 citations
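The core mechanism named in the abstract above, borrowing statistical strength through a hierarchy, can be illustrated with standard conjugate shrinkage. This is a minimal sketch under an assumed Gaussian-mean-sharing setup, not the paper's detector; all names and numbers are hypothetical.

```python
# Minimal sketch (not the paper's implementation): hierarchical parameter
# sharing, where a rare class's estimate is shrunk toward its super-category's
# mean so it "borrows" statistical strength from better-sampled relatives.
import numpy as np

def shrunken_class_mean(class_examples, super_mean, class_var, super_var):
    """Posterior-mean estimate of a class mean under a Gaussian prior
    centered at the super-category mean (standard conjugate shrinkage)."""
    n = len(class_examples)
    sample_mean = np.mean(class_examples, axis=0)
    # Precision-weighted combination: with few examples, lean on the prior.
    w = (n / class_var) / (n / class_var + 1.0 / super_var)
    return w * sample_mean + (1.0 - w) * super_mean

rng = np.random.default_rng(0)
super_mean = rng.normal(size=5)                               # hypothetical super-category mean
rare_class = super_mean + rng.normal(scale=0.5, size=(3, 5))  # only 3 examples
print(shrunken_class_mean(rare_class, super_mean, class_var=0.25, super_var=1.0))
```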


Proceedings Article
12 Dec 2011
TL;DR: This work proposes a novel way of augmenting the training data for each class by borrowing and transforming examples from other classes, and demonstrates that the new object detector improves upon the current state-of-the-art detector on the challenging SUN09 object detection dataset.
Abstract: Despite the recent trend of increasingly large datasets for object detection, there still exist many classes with few training examples. To overcome this lack of training data for certain classes, we propose a novel way of augmenting the training data for each class by borrowing and transforming examples from other classes. Our model learns which training instances from other classes to borrow and how to transform the borrowed examples so that they become more similar to instances from the target class. Our experimental results demonstrate that our new object detector, with borrowed and transformed examples, improves upon the current state-of-the-art detector on the challenging SUN09 object detection dataset.

143 citations
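A loose sketch of the borrowing idea from the abstract above. The paper learns both the borrowing indicators and the transformations; here both are simplified to a nearest-to-class-mean heuristic, and every name and number is illustrative.

```python
# Loose sketch of borrowing training examples from other classes: pull in
# the candidates that already look most like the target class in feature
# space (a stand-in for the learned borrowing weights and transformations).
import numpy as np

def borrow_examples(target_feats, candidate_feats, k=5):
    """Return indices of the k candidates closest to the target class mean."""
    center = target_feats.mean(axis=0)
    dists = np.linalg.norm(candidate_feats - center, axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(1)
target = rng.normal(loc=1.0, size=(10, 8))   # few target-class examples
others = rng.normal(loc=0.0, size=(200, 8))  # examples from related classes
augmented = np.vstack([target, others[borrow_examples(target, others)]])
print(augmented.shape)  # (15, 8): training set grown with borrowed examples
```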


Journal ArticleDOI
TL;DR: Describes a deep generative model in which the lowest layer represents the word-count vector of a document and the top layer represents a learned binary code for that document, allowing more accurate and much faster retrieval than latent semantic analysis.
Abstract: We describe a deep generative model in which the lowest layer represents the word-count vector of a document and the top layer represents a learned binary code for that document. The top two layers of the generative model form an undirected associative memory and the remaining layers form a belief net with directed, top-down connections. We present efficient learning and inference procedures for this type of generative model and show that it allows more accurate and much faster retrieval than latent semantic analysis. By using our method as a filter for a much slower method called TF-IDF we achieve higher accuracy than TF-IDF alone and save several orders of magnitude in retrieval time. By using short binary codes as addresses, we can perform retrieval on very large document sets in a time that is independent of the size of the document set using only one word of memory to describe each document.

119 citations
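The abstract's last claim, retrieval time independent of corpus size, follows from using each short binary code as a memory address. Here is a small sketch of that lookup step alone, assuming the codes have already been learned (the deep model itself is not reproduced):

```python
# Sketch of code-as-address retrieval: probe the query's bucket plus all
# buckets within Hamming radius 1, so lookup cost depends on the code
# length, not on the number of documents indexed.
from collections import defaultdict

def build_index(codes):                 # codes: doc_id -> int bitmask
    index = defaultdict(list)
    for doc_id, code in codes.items():
        index[code].append(doc_id)
    return index

def retrieve(index, query_code, n_bits, radius=1):
    hits = list(index.get(query_code, []))
    if radius >= 1:
        for b in range(n_bits):         # flip one bit at a time
            hits.extend(index.get(query_code ^ (1 << b), []))
    return hits

index = build_index({"doc_a": 0b1011, "doc_b": 0b1010, "doc_c": 0b0001})
print(retrieve(index, 0b1011, n_bits=4))  # doc_a exact, doc_b at distance 1
```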


02 Jul 2011
TL;DR: Proposes a hierarchical Bayesian model that transfers knowledge from previously learned categories to a novel category, in the form of a prior over category means and variances, and discovers how to group categories into meaningful super-categories that express different priors for new classes.
Abstract: We develop a hierarchical Bayesian model that learns categories from single training examples. The model transfers acquired knowledge from previously learned categories to a novel category, in the form of a prior over category means and variances. The model discovers how to group categories into meaningful super-categories that express different priors for new classes. Given a single example of a novel category, we can efficiently infer which super-category the novel category belongs to, and thereby estimate not only the new category's mean but also an appropriate similarity metric based on parameters inherited from the super-category. On MNIST and MSR Cambridge image datasets the model learns useful representations of novel categories based on just a single training example, and performs significantly better than simpler hierarchical Bayesian approaches. It can also discover new categories in a completely unsupervised fashion, given just one or a few examples.

111 citations
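A toy sketch of the inference step described above, with diagonal Gaussians and made-up numbers. The actual model places priors over both means and variances; this simplification only shows picking a super-category for a single example and inheriting its variances as the similarity metric.

```python
# Rough sketch: given one example of a new category, score each
# super-category by likelihood, then inherit its variances as the metric.
import numpy as np
from scipy.stats import norm

def infer_super_category(x, super_means, super_vars):
    """Log-likelihood of a single example under each super-category."""
    scores = [norm.logpdf(x, loc=m, scale=np.sqrt(v)).sum()
              for m, v in zip(super_means, super_vars)]
    return int(np.argmax(scores))

rng = np.random.default_rng(2)
means = [np.zeros(4), np.full(4, 3.0)]          # two hypothetical super-categories
variances = [np.full(4, 1.0), np.full(4, 0.5)]
x_new = rng.normal(loc=3.0, scale=0.7, size=4)  # single example of a new category
k = infer_super_category(x_new, means, variances)
print(k, variances[k])  # the inherited variances act as the similarity metric
```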


Proceedings Article
12 Dec 2011
TL;DR: In this article, the authors provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions and show that the standard weighted trace-norm might fail when the sampling distribution is not a product distribution.
Abstract: We provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions. We show that the standard weighted trace-norm might fail when the sampling distribution is not a product distribution (i.e. when row and column indexes are not selected independently), present a corrected variant for which we establish strong learning guarantees, and demonstrate that it works better in practice. We provide guarantees when weighting by either the true or empirical sampling distribution, and suggest that even if the true distribution is known (or is uniform), weighting by the empirical distribution may be beneficial.

52 citations
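For reference, the weighted trace-norm at issue is conventionally defined from the row and column marginals p and q of the sampling distribution, and the corrected variant smooths these weights toward uniform. The rendering below uses standard notation and a 1/2 smoothing constant as an assumption, not a quotation from the paper:

```latex
% Weighted trace-norm of an n x m matrix X under row/column marginals p, q,
% and a smoothed ("corrected") weighting that mixes in the uniform distribution.
\[
  \|X\|_{\mathrm{tr}(p,q)}
    = \bigl\| \operatorname{diag}(\sqrt{p}) \, X \, \operatorname{diag}(\sqrt{q}) \bigr\|_{\mathrm{tr}},
  \qquad
  \tilde{p}_i = \tfrac{1}{2}\!\left(p_i + \tfrac{1}{n}\right),
  \quad
  \tilde{q}_j = \tfrac{1}{2}\!\left(q_j + \tfrac{1}{m}\right).
\]
```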


Posted Content
TL;DR: The standard weighted trace-norm might fail when the sampling distribution is not a product distribution, and a corrected variant is presented for which strong learning guarantees are established; it is suggested that even if the true distribution is known (or is uniform), weighting by the empirical distribution may be beneficial.
Abstract: We provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions. We show that the standard weighted trace-norm might fail when the sampling distribution is not a product distribution (i.e. when row and column indexes are not selected independently), present a corrected variant for which we establish strong learning guarantees, and demonstrate that it works better in practice. We provide guarantees when weighting by either the true or empirical sampling distribution, and suggest that even if the true distribution is known (or is uniform), weighting by the empirical distribution may be beneficial.

42 citations


Proceedings Article
12 Dec 2011
TL;DR: Efficient learning and inference algorithms for the HDP-DBM model are presented and it is shown that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets.
Abstract: We introduce HD (or "Hierarchical-Deep") models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian models. Specifically we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a Deep Boltzmann Machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples, by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets.

33 citations
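Neither an HDP nor a DBM ships with common Python libraries, so the following is only a loose structural stand-in for the compound model described above: a truncated Dirichlet-process mixture placed over placeholder top-level features, echoing the "HDP prior over the activities of the top-level features" without reproducing the paper's model.

```python
# Very loose stand-in for HDP-DBM layering: random placeholders stand in for
# top-level DBM activations, and a truncated DP mixture stands in for the
# HDP prior over them. Illustrates the layering only, not the actual model.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(3)
top_level_feats = rng.normal(size=(500, 20))  # stand-in for DBM activations

dp_mix = BayesianGaussianMixture(
    n_components=15,                          # truncation level
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(top_level_feats)
print(dp_mix.weights_.round(3))               # unused components get ~0 weight
```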


Posted Content
TL;DR: The theoretical analysis shows that a greedy feature selection algorithm based on T-statistics can select many more features than domains while avoiding overfitting by utilizing data-dependent variance properties.
Abstract: We study the prevalent problem when a test distribution differs from the training distribution. We consider a setting where our training set consists of a small number of sample domains, but where we have many samples in each domain. Our goal is to generalize to a new domain. For example, we may want to learn a similarity function using only certain classes of objects, but we desire that this similarity function be applicable to object classes not present in our training sample (e.g. we might seek to learn that "dogs are similar to dogs" even though images of dogs were absent from our training set). Our theoretical analysis shows that we can select many more features than domains while avoiding overfitting by utilizing data-dependent variance properties. We present a greedy feature selection algorithm based on T-statistics. Our experiments validate this theory, showing that our T-statistic based greedy feature selection is more robust at avoiding overfitting than the classical greedy procedure.

1 citation
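A compact sketch of the T-statistic selection idea from the abstract above, simplified to a one-shot top-k rather than a fully sequential greedy loop; the data construction and all numbers are illustrative assumptions.

```python
# Sketch of T-statistic-based feature selection across domains: keep the
# features whose mean is large and stable relative to its variation across
# the (few) training domains, per the data-dependent variance idea above.
import numpy as np

def t_statistic_scores(domain_means):
    """domain_means: (n_domains, n_features) per-domain feature means.
    Score each feature by |grand mean| / (standard error across domains)."""
    m = domain_means.mean(axis=0)
    se = domain_means.std(axis=0, ddof=1) / np.sqrt(domain_means.shape[0])
    return np.abs(m) / (se + 1e-12)

def greedy_select(domain_means, k):
    """Pick the k features with the largest T-statistics."""
    return np.argsort(-t_statistic_scores(domain_means))[:k]

rng = np.random.default_rng(4)
signal = rng.normal(loc=2.0, scale=0.1, size=(6, 3))   # stable across 6 domains
noise = rng.normal(loc=0.0, scale=1.0, size=(6, 47))   # unstable features
print(greedy_select(np.hstack([signal, noise]), k=3))  # expect indices 0..2
```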