
Showing papers by "Ruslan Salakhutdinov published in 2011"


Journal Article
TL;DR: A generative model of how characters are composed from strokes is introduced, where knowledge from previous characters helps to infer the latent strokes in novel characters, using a massive new dataset of handwritten characters.

757 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: Presents a hierarchical classification model that allows rare objects to borrow statistical strength from related objects with many training examples, learning both a hierarchy for sharing visual appearance across 200 object categories and the corresponding hierarchical parameters.
Abstract: We present a hierarchical classification model that allows rare objects to borrow statistical strength from related objects that have many training examples. Unlike many of the existing object detection and recognition systems that treat different classes as unrelated entities, our model learns both a hierarchy for sharing visual appearance across 200 object categories and hierarchical parameters. Our experimental results on the challenging object localization and detection task demonstrate that the proposed model substantially improves the accuracy of the standard single object detectors that ignore hierarchical structure altogether.

385 citations
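The core mechanism named in the abstract above, borrowing statistical strength through a hierarchy, can be illustrated with standard conjugate shrinkage. This is a minimal sketch under an assumed Gaussian-mean-sharing setup, not the paper's detector; all names and numbers are hypothetical.

```python
# Minimal sketch (not the paper's implementation): hierarchical parameter
# sharing, where a rare class's estimate is shrunk toward its super-category's
# mean so it "borrows" statistical strength from better-sampled relatives.
import numpy as np

def shrunken_class_mean(class_examples, super_mean, class_var, super_var):
    """Posterior-mean estimate of a class mean under a Gaussian prior
    centered at the super-category mean (standard conjugate shrinkage)."""
    n = len(class_examples)
    sample_mean = np.mean(class_examples, axis=0)
    # Precision-weighted combination: with few examples, lean on the prior.
    w = (n / class_var) / (n / class_var + 1.0 / super_var)
    return w * sample_mean + (1.0 - w) * super_mean

rng = np.random.default_rng(0)
super_mean = rng.normal(size=5)                               # hypothetical super-category mean
rare_class = super_mean + rng.normal(scale=0.5, size=(3, 5))  # only 3 examples
print(shrunken_class_mean(rare_class, super_mean, class_var=0.25, super_var=1.0))
```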


Proceedings Article
12 Dec 2011
TL;DR: This work proposes a novel way of augmenting the training data for each class by borrowing and transforming examples from other classes, and demonstrates that the new object detector improves upon the current state-of-the-art detector on the challenging SUN09 object detection dataset.
Abstract: Despite the recent trend of increasingly large datasets for object detection, there still exist many classes with few training examples. To overcome this lack of training data for certain classes, we propose a novel way of augmenting the training data for each class by borrowing and transforming examples from other classes. Our model learns which training instances from other classes to borrow and how to transform the borrowed examples so that they become more similar to instances from the target class. Our experimental results demonstrate that our new object detector, with borrowed and transformed examples, improves upon the current state-of-the-art detector on the challenging SUN09 object detection dataset.

143 citations
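A loose sketch of the borrowing idea from the abstract above. The paper learns both the borrowing indicators and the transformations; here both are simplified to a nearest-to-class-mean heuristic, and every name and number is illustrative.

```python
# Loose sketch of borrowing training examples from other classes: pull in
# the candidates that already look most like the target class in feature
# space (a stand-in for the learned borrowing weights and transformations).
import numpy as np

def borrow_examples(target_feats, candidate_feats, k=5):
    """Return indices of the k candidates closest to the target class mean."""
    center = target_feats.mean(axis=0)
    dists = np.linalg.norm(candidate_feats - center, axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(1)
target = rng.normal(loc=1.0, size=(10, 8))   # few target-class examples
others = rng.normal(loc=0.0, size=(200, 8))  # examples from related classes
augmented = np.vstack([target, others[borrow_examples(target, others)]])
print(augmented.shape)  # (15, 8): training set grown with borrowed examples
```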


Journal ArticleDOI
TL;DR: Describes a deep generative model in which the lowest layer represents the word-count vector of a document and the top layer represents a learned binary code for that document, allowing more accurate and much faster retrieval than latent semantic analysis.
Abstract: We describe a deep generative model in which the lowest layer represents the word-count vector of a document and the top layer represents a learned binary code for that document. The top two layers of the generative model form an undirected associative memory and the remaining layers form a belief net with directed, top-down connections. We present efficient learning and inference procedures for this type of generative model and show that it allows more accurate and much faster retrieval than latent semantic analysis. By using our method as a filter for a much slower method called TF-IDF we achieve higher accuracy than TF-IDF alone and save several orders of magnitude in retrieval time. By using short binary codes as addresses, we can perform retrieval on very large document sets in a time that is independent of the size of the document set using only one word of memory to describe each document.

119 citations
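The abstract's last claim, retrieval time independent of corpus size, follows from using each short binary code as a memory address. Here is a small sketch of that lookup step alone, assuming the codes have already been learned (the deep model itself is not reproduced):

```python
# Sketch of code-as-address retrieval: probe the query's bucket plus all
# buckets within Hamming radius 1, so lookup cost depends on the code
# length, not on the number of documents indexed.
from collections import defaultdict

def build_index(codes):                 # codes: doc_id -> int bitmask
    index = defaultdict(list)
    for doc_id, code in codes.items():
        index[code].append(doc_id)
    return index

def retrieve(index, query_code, n_bits, radius=1):
    hits = list(index.get(query_code, []))
    if radius >= 1:
        for b in range(n_bits):         # flip one bit at a time
            hits.extend(index.get(query_code ^ (1 << b), []))
    return hits

index = build_index({"doc_a": 0b1011, "doc_b": 0b1010, "doc_c": 0b0001})
print(retrieve(index, 0b1011, n_bits=4))  # doc_a exact, doc_b at distance 1
```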


02 Jul 2011
TL;DR: Proposes a hierarchical Bayesian model that transfers knowledge from previously learned categories to a novel category, in the form of a prior over category means and variances, and discovers how to group categories into meaningful super-categories that express different priors for new classes.
Abstract: We develop a hierarchical Bayesian model that learns categories from single training examples. The model transfers acquired knowledge from previously learned categories to a novel category, in the form of a prior over category means and variances. The model discovers how to group categories into meaningful super-categories that express different priors for new classes. Given a single example of a novel category, we can efficiently infer which super-category the novel category belongs to, and thereby estimate not only the new category's mean but also an appropriate similarity metric based on parameters inherited from the super-category. On MNIST and MSR Cambridge image datasets the model learns useful representations of novel categories based on just a single training example, and performs significantly better than simpler hierarchical Bayesian approaches. It can also discover new categories in a completely unsupervised fashion, given just one or a few examples.

111 citations
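A toy sketch of the inference step described above, with diagonal Gaussians and made-up numbers. The actual model places priors over both means and variances; this simplification only shows picking a super-category for a single example and inheriting its variances as the similarity metric.

```python
# Rough sketch: given one example of a new category, score each
# super-category by likelihood, then inherit its variances as the metric.
import numpy as np
from scipy.stats import norm

def infer_super_category(x, super_means, super_vars):
    """Log-likelihood of a single example under each super-category."""
    scores = [norm.logpdf(x, loc=m, scale=np.sqrt(v)).sum()
              for m, v in zip(super_means, super_vars)]
    return int(np.argmax(scores))

rng = np.random.default_rng(2)
means = [np.zeros(4), np.full(4, 3.0)]          # two hypothetical super-categories
variances = [np.full(4, 1.0), np.full(4, 0.5)]
x_new = rng.normal(loc=3.0, scale=0.7, size=4)  # single example of a new category
k = infer_super_category(x_new, means, variances)
print(k, variances[k])  # the inherited variances act as the similarity metric
```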


Proceedings Article
12 Dec 2011
TL;DR: In this article, the authors provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions and show that the standard weighted trace-norm might fail when the sampling distribution is not a product distribution.
Abstract: We provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions. We show that the standard weighted trace-norm might fail when the sampling distribution is not a product distribution (i.e. when row and column indexes are not selected independently), present a corrected variant for which we establish strong learning guarantees, and demonstrate that it works better in practice. We provide guarantees when weighting by either the true or empirical sampling distribution, and suggest that even if the true distribution is known (or is uniform), weighting by the empirical distribution may be beneficial.

52 citations
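For reference, the weighted trace-norm at issue is conventionally defined from the row and column marginals p and q of the sampling distribution, and the corrected variant smooths these weights toward uniform. The rendering below uses standard notation and a 1/2 smoothing constant as an assumption, not a quotation from the paper:

```latex
% Weighted trace-norm of an n x m matrix X under row/column marginals p, q,
% and a smoothed ("corrected") weighting that mixes in the uniform distribution.
\[
  \|X\|_{\mathrm{tr}(p,q)}
    = \bigl\| \operatorname{diag}(\sqrt{p}) \, X \, \operatorname{diag}(\sqrt{q}) \bigr\|_{\mathrm{tr}},
  \qquad
  \tilde{p}_i = \tfrac{1}{2}\!\left(p_i + \tfrac{1}{n}\right),
  \quad
  \tilde{q}_j = \tfrac{1}{2}\!\left(q_j + \tfrac{1}{m}\right).
\]
```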


Posted Content
TL;DR: The standard weighted trace-norm might fail when the sampling distribution is not a product distribution, and a corrected variant is presented for which strong learning guarantees are established; it is suggested that even if the true distribution is known (or is uniform), weighting by the empirical distribution may be beneficial.
Abstract: We provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions. We show that the standard weighted trace-norm might fail when the sampling distribution is not a product distribution (i.e. when row and column indexes are not selected independently), present a corrected variant for which we establish strong learning guarantees, and demonstrate that it works better in practice. We provide guarantees when weighting by either the true or empirical sampling distribution, and suggest that even if the true distribution is known (or is uniform), weighting by the empirical distribution may be beneficial.

42 citations


Proceedings Article
12 Dec 2011
TL;DR: Efficient learning and inference algorithms for the HDP-DBM model are presented and it is shown that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets.
Abstract: We introduce HD (or "Hierarchical-Deep") models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian models. Specifically we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a Deep Boltzmann Machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples, by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets.

33 citations
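Neither an HDP nor a DBM ships with common Python libraries, so the following is only a loose structural stand-in for the compound model described above: a truncated Dirichlet-process mixture placed over placeholder top-level features, echoing the "HDP prior over the activities of the top-level features" without reproducing the paper's model.

```python
# Very loose stand-in for HDP-DBM layering: random placeholders stand in for
# top-level DBM activations, and a truncated DP mixture stands in for the
# HDP prior over them. Illustrates the layering only, not the actual model.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(3)
top_level_feats = rng.normal(size=(500, 20))  # stand-in for DBM activations

dp_mix = BayesianGaussianMixture(
    n_components=15,                          # truncation level
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(top_level_feats)
print(dp_mix.weights_.round(3))               # unused components get ~0 weight
```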


Posted Content
TL;DR: The theoretical analysis shows that a greedy feature selection algorithm based on T-statistics can select many more features than domains while avoiding overfitting by utilizing data-dependent variance properties.
Abstract: We study the prevalent problem when a test distribution differs from the training distribution. We consider a setting where our training set consists of a small number of sample domains, but where we have many samples in each domain. Our goal is to generalize to a new domain. For example, we may want to learn a similarity function using only certain classes of objects, but we desire that this similarity function be applicable to object classes not present in our training sample (e.g. we might seek to learn that "dogs are similar to dogs" even though images of dogs were absent from our training set). Our theoretical analysis shows that we can select many more features than domains while avoiding overfitting by utilizing data-dependent variance properties. We present a greedy feature selection algorithm based on T-statistics. Our experiments validate this theory, showing that our T-statistic based greedy feature selection is more robust at avoiding overfitting than the classical greedy procedure.

1 citation
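A compact sketch of the T-statistic selection idea from the abstract above, simplified to a one-shot top-k rather than a fully sequential greedy loop; the data construction and all numbers are illustrative assumptions.

```python
# Sketch of T-statistic-based feature selection across domains: keep the
# features whose mean is large and stable relative to its variation across
# the (few) training domains, per the data-dependent variance idea above.
import numpy as np

def t_statistic_scores(domain_means):
    """domain_means: (n_domains, n_features) per-domain feature means.
    Score each feature by |grand mean| / (standard error across domains)."""
    m = domain_means.mean(axis=0)
    se = domain_means.std(axis=0, ddof=1) / np.sqrt(domain_means.shape[0])
    return np.abs(m) / (se + 1e-12)

def greedy_select(domain_means, k):
    """Pick the k features with the largest T-statistics."""
    return np.argsort(-t_statistic_scores(domain_means))[:k]

rng = np.random.default_rng(4)
signal = rng.normal(loc=2.0, scale=0.1, size=(6, 3))   # stable across 6 domains
noise = rng.normal(loc=0.0, scale=1.0, size=(6, 47))   # unstable features
print(greedy_select(np.hstack([signal, noise]), k=3))  # expect indices 0..2
```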