Showing papers on "Unsupervised learning published in 2010"

PDF

Open Access

Journal Article•DOI•

[...]

Sinno Jialin Pan¹, Qiang Yang¹•Institutions (1)

Hong Kong University of Science and Technology¹

01 Oct 2010-IEEE Transactions on Knowledge and Data Engineering

TL;DR: The relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as covariate shift are discussed.

...read moreread less

Abstract: A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding much expensive data-labeling efforts. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as covariate shift. We also explore some potential future issues in transfer learning research.

...read moreread less

18,616 citations

Journal Article•DOI•

Data clustering: 50 years beyond K-means

[...]

Anil K. Jain¹•Institutions (1)

Michigan State University¹

01 Jun 2010

TL;DR: A brief overview of clustering is provided, well known clustering methods are summarized, the major challenges and key issues in designing clustering algorithms are discussed, and some of the emerging and useful research directions are pointed out.

...read moreread less

Abstract: Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into a system of ranked taxa: domain, kingdom, phylum, class, etc. Cluster analysis is the formal study of methods and algorithms for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is to find structure in data and is therefore exploratory in nature. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, K-means, was first published in 1955. In spite of the fact that K-means was proposed over 50 years ago and thousands of clustering algorithms have been published since then, K-means is still widely used. This speaks to the difficulty in designing a general purpose clustering algorithm and the ill-posed problem of clustering. We provide a brief overview of clustering, summarize well known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semi-supervised clustering, ensemble clustering, simultaneous feature selection during data clustering, and large scale data clustering.

...read moreread less

6,601 citations

Book•DOI•

Semi-Supervised Learning

[...]

Olivier Chapelle¹, Bernhard Schlkopf¹, Alexander Zien¹•Institutions (1)

Max Planck Society¹

31 Mar 2010

TL;DR: Semi-supervised learning (SSL) as discussed by the authors is the middle ground between supervised learning (in which all training examples are labeled) and unsupervised training (where no label data are given).

...read moreread less

Abstract: In the field of machine learning, semi-supervised learning (SSL) occupies the middle ground, between supervised learning (in which all training examples are labeled) and unsupervised learning (in which no label data are given). Interest in SSL has increased in recent years, particularly because of application domains in which unlabeled data are plentiful, such as images, text, and bioinformatics. This first comprehensive overview of SSL presents state-of-the-art algorithms, a taxonomy of the field, selected applications, benchmark experiments, and perspectives on ongoing and future research. Semi-Supervised Learning first presents the key assumptions and ideas underlying the field: smoothness, cluster or low-density separation, manifold structure, and transduction. The core of the book is the presentation of SSL methods, organized according to algorithmic strategies. After an examination of generative models, the book describes algorithms that implement the low-density separation assumption, graph-based methods, and algorithms that perform two-step learning. The book then discusses SSL applications and offers guidelines for SSL practitioners by analyzing the results of extensive benchmark experiments. Finally, the book looks at interesting directions for SSL research. The book closes with a discussion of the relationship between semi-supervised learning and transduction. Adaptive Computation and Machine Learning series

...read moreread less

3,773 citations

Journal Article•DOI•

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

[...]

Z. Q. John Lu¹•Institutions (1)

National Institute of Standards and Technology¹

01 Jul 2010-Journal of The Royal Statistical Society Series A-statistics in Society

TL;DR: This section will review those books whose content and level reflect the general editorial poltcy of Technometrics.

...read moreread less

Abstract: This section will review those books whose content and level reflect the general editorial poltcy of Technometrics. Publishers should send books for review to Ejaz Ahmed, Depatment of Mathematics and Statistics, University of Windsor, Windsor, ON N9B 3P4 (techeditor@uwindsoxca). The opinions expressed in this section are those of the reviewers These opinions do not represent positions of the reviewer's organization and may not reflect those of the editors or the sponsoring societies. Listed prices reflect information provided by the publisher and may not be current The book purchase programs of the American Society for Quality can provide some of these books at reduced prices for members. For infbrmation, contact the American Society for Quality at 1-800-248-1946.

...read moreread less

2,342 citations

Journal Article•

Why Does Unsupervised Pre-training Help Deep Learning?

[...]

Dumitru Erhan¹, Yoshua Bengio¹, Aaron Courville¹, Pierre-Antoine Manzagol¹, Pascal Vincent¹, Samy Bengio² - Show less +2 more•Institutions (2)

Université de Montréal¹, Google²

01 Mar 2010-Journal of Machine Learning Research

TL;DR: In this paper, the authors empirically show the influence of pre-training with respect to architecture depth, model capacity, and number of training examples, and they suggest that unsupervised pretraining guides the learning towards basins of attraction of minima that support better generalization.

...read moreread less

Abstract: Much recent research has been devoted to learning algorithms for deep architectures such as Deep Belief Networks and stacks of auto-encoder variants, with impressive results obtained in several areas, mostly on vision and language data sets. The best results obtained on supervised learning tasks involve an unsupervised learning component, usually in an unsupervised pre-training phase. Even though these new algorithms have enabled training deep models, many questions remain as to the nature of this difficult learning problem. The main question investigated here is the following: how does unsupervised pre-training work? Answering this questions is important if learning in deep architectures is to be further improved. We propose several explanatory hypotheses and test them through extensive simulations. We empirically show the influence of pre-training with respect to architecture depth, model capacity, and number of training examples. The experiments confirm and clarify the advantage of unsupervised pre-training. The results suggest that unsupervised pre-training guides the learning towards basins of attraction of minima that support better generalization from the training data set; the evidence from these results supports a regularization explanation for the effect of pre-training.

...read moreread less

2,036 citations

Proceedings Article•DOI•

Convolutional networks and applications in vision

[...]

Yann LeCun¹, Koray Kavukcuoglu¹, Clement Farabet¹•Institutions (1)

Courant Institute of Mathematical Sciences¹

03 Aug 2010

TL;DR: New unsupervised learning algorithms, and new non-linear stages that allow ConvNets to be trained with very few labeled samples are described, including one for visual object recognition and vision navigation for off-road mobile robots.

...read moreread less

Abstract: Intelligent tasks, such as visual perception, auditory perception, and language understanding require the construction of good internal representations of the world (or "features")? which must be invariant to irrelevant variations of the input while, preserving relevant information. A major question for Machine Learning is how to learn such good features automatically. Convolutional Networks (ConvNets) are a biologically-inspired trainable architecture that can learn invariant features. Each stage in a ConvNets is composed of a filter bank, some nonlinearities, and feature pooling layers. With multiple stages, a ConvNet can learn multi-level hierarchies of features. While ConvNets have been successfully deployed in many commercial applications from OCR to video surveillance, they require large amounts of labeled training samples. We describe new unsupervised learning algorithms, and new non-linear stages that allow ConvNets to be trained with very few labeled samples. Applications to visual object recognition and vision navigation for off-road mobile robots are described.

...read moreread less

1,927 citations

Proceedings Article•

Self-Paced Learning for Latent Variable Models

[...]

M. P. Kumar¹, Benjamin Packer¹, Daphne Koller¹•Institutions (1)

Stanford University¹

06 Dec 2010

TL;DR: A novel, iterative self-paced learning algorithm where each iteration simultaneously selects easy samples and learns a new parameter vector that outperforms the state of the art method for learning a latent structural SVM on four applications.

...read moreread less

Abstract: Latent variable models are a powerful tool for addressing several tasks in machine learning. However, the algorithms for learning the parameters of latent variable models are prone to getting stuck in a bad local optimum. To alleviate this problem, we build on the intuition that, rather than considering all samples simultaneously, the algorithm should be presented with the training data in a meaningful order that facilitates learning. The order of the samples is determined by how easy they are. The main challenge is that often we are not provided with a readily computable measure of the easiness of samples. We address this issue by proposing a novel, iterative self-paced learning algorithm where each iteration simultaneously selects easy samples and learns a new parameter vector. The number of samples selected is governed by a weight that is annealed until the entire training data has been considered. We empirically demonstrate that the self-paced learning algorithm outperforms the state of the art method for learning a latent structural SVM on four applications: object localization, noun phrase coreference, motif finding and handwritten digit recognition.

...read moreread less

1,220 citations

Posted Content•

A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

[...]

Stephane Ross¹, Geoffrey J. Gordon¹, J. Andrew Bagnell¹•Institutions (1)

Carnegie Mellon University¹

02 Nov 2010-arXiv: Learning

TL;DR: In this article, a no-regret algorithm is proposed to train a stationary deterministic policy with good performance under the distribution of observations it induces in such sequential settings, and it outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.

...read moreread less

Abstract: Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches provide stronger guarantees in this setting, but remain somewhat unsatisfactory as they train either non-stationary or stochastic policies and require a large number of iterations. In this paper, we propose a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no regret algorithm in an online learning setting. We show that any such no regret algorithm, combined with additional reduction assumptions, must find a policy with good performance under the distribution of observations it induces in such sequential settings. We demonstrate that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.

...read moreread less

1,176 citations

Proceedings Article•DOI•

P-N learning: Bootstrapping binary classifiers by structural constraints

[...]

Zdenek Kalal¹, Jiri Matas², Krystian Mikolajczyk¹•Institutions (2)

University of Surrey¹, Czech Technical University in Prague²

13 Jun 2010

TL;DR: It is shown that the performance of a binary classifier can be significantly improved by the processing of structured unlabeled data, and a theory that formulates the conditions under which P-N learning guarantees improvement of the initial classifier is proposed and validated on synthetic and real data.

...read moreread less

Abstract: This paper shows that the performance of a binary classifier can be significantly improved by the processing of structured unlabeled data, i.e. data are structured if knowing the label of one example restricts the labeling of the others. We propose a novel paradigm for training a binary classifier from labeled and unlabeled examples that we call P-N learning. The learning process is guided by positive (P) and negative (N) constraints which restrict the labeling of the unlabeled set. P-N learning evaluates the classifier on the unlabeled data, identifies examples that have been classified in contradiction with structural constraints and augments the training set with the corrected samples in an iterative process. We propose a theory that formulates the conditions under which P-N learning guarantees improvement of the initial classifier and validate it on synthetic and real data. P-N learning is applied to the problem of on-line learning of object detector during tracking. We show that an accurate object detector can be learned from a single example and an unlabeled video sequence where the object may occur. The algorithm is compared with related approaches and state-of-the-art is achieved on a variety of objects (faces, pedestrians, cars, motorbikes and animals).

...read moreread less

1,165 citations

Book•

Algorithms for Reinforcement Learning

[...]

Csaba Szepesvári¹•Institutions (1)

University of Alberta¹

25 Jun 2010

TL;DR: This book focuses on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming, and gives a fairly comprehensive catalog of learning problems, and describes the core ideas, followed by the discussion of their theoretical properties and limitations.

...read moreread less

Abstract: Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective.What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms' merits and limitations. Reinforcement learning is of great interest because of the large number of practical applications that it can be used to address, ranging from problems in artificial intelligence to operations research or control engineering. In this book, we focus on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming.We give a fairly comprehensive catalog of learning problems, describe the core ideas, note a large number of state of the art algorithms, followed by the discussion of their theoretical properties and limitations.

...read moreread less

1,146 citations

Proceedings Article•DOI•

Unsupervised feature selection for multi-cluster data

[...]

Deng Cai¹, Chiyuan Zhang¹, Xiaofei He¹•Institutions (1)

Zhejiang University¹

25 Jul 2010

TL;DR: Inspired from the recent developments on manifold learning and L1-regularized models for subset selection, a new approach is proposed, called Multi-Cluster Feature Selection (MCFS), for unsupervised feature selection, which select those features such that the multi-cluster structure of the data can be best preserved.

...read moreread less

Abstract: In many data analysis tasks, one is often confronted with very high dimensional data. Feature selection techniques are designed to find the relevant feature subset of the original features which can facilitate clustering, classification and retrieval. In this paper, we consider the feature selection problem in unsupervised learning scenario, which is particularly difficult due to the absence of class labels that would guide the search for relevant information. The feature selection problem is essentially a combinatorial optimization problem which is computationally expensive. Traditional unsupervised feature selection methods address this issue by selecting the top ranked features based on certain scores computed independently for each feature. These approaches neglect the possible correlation between different features and thus can not produce an optimal feature subset. Inspired from the recent developments on manifold learning and L1-regularized models for subset selection, we propose in this paper a new approach, called Multi-Cluster Feature Selection (MCFS), for unsupervised feature selection. Specifically, we select those features such that the multi-cluster structure of the data can be best preserved. The corresponding optimization problem can be efficiently solved since it only involves a sparse eigen-problem and a L1-regularized least squares problem. Extensive experimental results over various real-life data sets have demonstrated the superiority of the proposed algorithm.

...read moreread less

Book Chapter•DOI•

Convolutional learning of spatio-temporal features

[...]

Graham W. Taylor¹, Rob Fergus¹, Yann LeCun¹, Christoph Bregler¹•Institutions (1)

Courant Institute of Mathematical Sciences¹

05 Sep 2010

TL;DR: A model that learns latent representations of image sequences from pairs of successive images is introduced, allowing it to scale to realistic image sizes whilst using a compact parametrization.

...read moreread less

Abstract: We address the problem of learning good features for understanding video data. We introduce a model that learns latent representations of image sequences from pairs of successive images. The convolutional architecture of our model allows it to scale to realistic image sizes whilst using a compact parametrization. In experiments on the NORB dataset, we show our model extracts latent "flow fields" which correspond to the transformation between the pair of input frames. We also use our model to extract low-level motion features in a multi-stage architecture for action recognition, demonstrating competitive performance on both the KTH and Hollywood2 datasets.

...read moreread less

Journal Article•DOI•

Domain Adaptation Problems: A DASVM Classification Technique and a Circular Validation Strategy

[...]

Lorenzo Bruzzone¹, Mattia Marconcini¹•Institutions (1)

University of Trento¹

01 May 2010-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Experimental results confirmed the effectiveness and the reliability of both the DASVM technique and the proposed circular validation strategy for validating the learning of domain adaptation classifiers when no true labels for the target--domain instances are available.

...read moreread less

Abstract: This paper addresses pattern classification in the framework of domain adaptation by considering methods that solve problems in which training data are assumed to be available only for a source domain different (even if related) from the target domain of (unlabeled) test data. Two main novel contributions are proposed: 1) a domain adaptation support vector machine (DASVM) technique which extends the formulation of support vector machines (SVMs) to the domain adaptation framework and 2) a circular indirect accuracy assessment strategy for validating the learning of domain adaptation classifiers when no true labels for the target--domain instances are available. Experimental results, obtained on a series of two-dimensional toy problems and on two real data sets related to brain computer interface and remote sensing applications, confirmed the effectiveness and the reliability of both the DASVM technique and the proposed circular validation strategy.

...read moreread less

Book Chapter•DOI•

Learning to detect roads in high-resolution aerial images

[...]

Volodymyr Mnih¹, Geoffrey E. Hinton¹•Institutions (1)

University of Toronto¹

05 Sep 2010

TL;DR: This work proposes detecting roads using a neural network with millions of trainable weights which looks at a much larger context than was used in previous attempts at learning the task, and shows that the method works reliably on two challenging urban datasets that are an order of magnitude larger than what was used to evaluate previous approaches.

...read moreread less

Abstract: Reliably extracting information from aerial imagery is a difficult problem with many practical applications. One specific case of this problem is the task of automatically detecting roads. This task is a difficult vision problem because of occlusions, shadows, and a wide variety of non-road objects. Despite 30 years of work on automatic road detection, no automatic or semi-automatic road detection system is currently on the market and no published method has been shown to work reliably on large datasets of urban imagery. We propose detecting roads using a neural network with millions of trainable weights which looks at a much larger context than was used in previous attempts at learning the task. The network is trained on massive amounts of data using a consumer GPU. We demonstrate that predictive performance can be substantially improved by initializing the feature detectors using recently developed unsupervised learning methods as well as by taking advantage of the local spatial coherence of the output labels.We show that our method works reliably on two challenging urban datasets that are an order of magnitude larger than what was used to evaluate previous approaches.

...read moreread less

Journal Article•

Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data

[...]

Miloš Radovanović, Alexandros Nanopoulos, Mirjana Ivanović

01 Mar 2010-Journal of Machine Learning Research

TL;DR: This paper explores a new aspect of the dimensionality curse, referred to as hubness, that affects the distribution of k-occurrences: the number of times a point appears among the k nearest neighbors of other points in a data set, which becomes considerably skewed as dimensionality increases.

...read moreread less

Abstract: Different aspects of the curse of dimensionality are known to present serious challenges to various machine-learning methods and tasks. This paper explores a new aspect of the dimensionality curse, referred to as hubness, that affects the distribution of k-occurrences: the number of times a point appears among the k nearest neighbors of other points in a data set. Through theoretical and empirical analysis involving synthetic and real data sets we show that under commonly used assumptions this distribution becomes considerably skewed as dimensionality increases, causing the emergence of hubs, that is, points with very high k-occurrences which effectively represent "popular" nearest neighbors. We examine the origins of this phenomenon, showing that it is an inherent property of data distributions in high-dimensional vector space, discuss its interaction with dimensionality reduction, and explore its influence on a wide range of machine-learning tasks directly or indirectly based on measuring distances, belonging to supervised, semi-supervised, and unsupervised learning families.

...read moreread less

Journal Article•DOI•

A Survey on the Application of Genetic Programming to Classification

[...]

P.G. Espejo¹, Sebastián Ventura¹, Francisco Herrera²•Institutions (2)

University of Córdoba (Spain)¹, University of Granada²

01 Mar 2010

TL;DR: This paper surveys existing literature about the application of genetic programming to classification, to show the different ways in which this evolutionary algorithm can help in the construction of accurate and reliable classifiers.

...read moreread less

Abstract: Classification is one of the most researched questions in machine learning and data mining. A wide range of real problems have been stated as classification problems, for example credit scoring, bankruptcy prediction, medical diagnosis, pattern recognition, text categorization, software quality assessment, and many more. The use of evolutionary algorithms for training classifiers has been studied in the past few decades. Genetic programming (GP) is a flexible and powerful evolutionary technique with some features that can be very valuable and suitable for the evolution of classifiers. This paper surveys existing literature about the application of genetic programming to classification, to show the different ways in which this evolutionary algorithm can help in the construction of accurate and reliable classifiers.

...read moreread less

Proceedings Article•DOI•

Face recognition with learning-based descriptor

[...]

Zhimin Cao¹, Qi Yin², Xiaoou Tang¹, Jian Sun³•Institutions (3)

The Chinese University of Hong Kong¹, Tsinghua University², Microsoft³

13 Jun 2010

TL;DR: This work proposes a pose-adaptive matching method that uses pose-specific classifiers to deal with different pose combinations of the matching face pair, and finds that a simple normalization mechanism after PCA can further improve the discriminative ability of the descriptor.

...read moreread less

Abstract: We present a novel approach to address the representation issue and the matching issue in face recognition (verification). Firstly, our approach encodes the micro-structures of the face by a new learning-based encoding method. Unlike many previous manually designed encoding methods (e.g., LBP or SIFT), we use unsupervised learning techniques to learn an encoder from the training examples, which can automatically achieve very good tradeoff between discriminative power and invariance. Then we apply PCA to get a compact face descriptor. We find that a simple normalization mechanism after PCA can further improve the discriminative ability of the descriptor. The resulting face representation, learning-based (LE) descriptor, is compact, highly discriminative, and easy-to-extract. To handle the large pose variation in real-life scenarios, we propose a pose-adaptive matching method that uses pose-specific classifiers to deal with different pose combinations (e.g., frontal v.s. frontal, frontal v.s. left) of the matching face pair. Our approach is comparable with the state-of-the-art methods on the Labeled Face in Wild (LFW) benchmark (we achieved 84.45% recognition rate), while maintaining excellent compactness, simplicity, and generalization ability across different datasets.

...read moreread less

Proceedings Article•DOI•

Coupled semi-supervised learning for information extraction

[...]

Andrew Carlson¹, Justin Betteridge¹, Richard C. Wang¹, Estevam R. Hruschka², Tom M. Mitchell¹ - Show less +1 more•Institutions (2)

Carnegie Mellon University¹, Federal University of São Carlos²

04 Feb 2010

TL;DR: This paper characterize several ways in which the training of category and relation extractors can be coupled, and presents experimental results demonstrating significantly improved accuracy as a result.

...read moreread less

Abstract: We consider the problem of semi-supervised learning to extract categories (e.g., academic fields, athletes) and relations (e.g., PlaysSport(athlete, sport)) from web pages, starting with a handful of labeled training examples of each category or relation, plus hundreds of millions of unlabeled web documents. Semi-supervised training using only a few labeled examples is typically unreliable because the learning task is underconstrained. This paper pursues the thesis that much greater accuracy can be achieved by further constraining the learning task, by coupling the semi-supervised training of many extractors for different categories and relations. We characterize several ways in which the training of category and relation extractors can be coupled, and present experimental results demonstrating significantly improved accuracy as a result.

...read moreread less

Journal Article•DOI•

Flexible Manifold Embedding: A Framework for Semi-Supervised and Unsupervised Dimension Reduction

[...]

Feiping Nie¹, Dong Xu¹, Ivor W. Tsang¹, Changshui Zhang²•Institutions (2)

Nanyang Technological University¹, Tsinghua University²

01 Jul 2010-IEEE Transactions on Image Processing

TL;DR: A unified manifold learning framework for semi-supervised and unsupervised dimension reduction by employing a simple but effective linear regression function to map the new data points by modeling the mismatch between h(X) and F.

...read moreread less

Abstract: We propose a unified manifold learning framework for semi-supervised and unsupervised dimension reduction by employing a simple but effective linear regression function to map the new data points. For semi-supervised dimension reduction, we aim to find the optimal prediction labels F for all the training samples X, the linear regression function h(X) and the regression residue F0 = F - h(X) simultaneously. Our new objective function integrates two terms related to label fitness and manifold smoothness as well as a flexible penalty term defined on the residue F0. Our Semi-Supervised learning framework, referred to as flexible manifold embedding (FME), can effectively utilize label information from labeled data as well as a manifold structure from both labeled and unlabeled data. By modeling the mismatch between h(X) and F, we show that FME relaxes the hard linear constraint F = h(X) in manifold regularization (MR), making it better cope with the data sampled from a nonlinear manifold. In addition, we propose a simplified version (referred to as FME/U) for unsupervised dimension reduction. We also show that our proposed framework provides a unified view to explain and understand many semi-supervised, supervised and unsupervised dimension reduction techniques. Comprehensive experiments on several benchmark databases demonstrate the significant improvement over existing dimension reduction algorithms.

...read moreread less

Proceedings Article•DOI•

PROST: Parallel robust online simple tracking

[...]

Jakob Santner¹, Christian Leistner¹, Amir Saffari¹, Thomas Pock¹, Horst Bischof¹ - Show less +1 more•Institutions (1)

Graz University of Technology¹

13 Jun 2010

TL;DR: This work shows that augmenting an on-line learning method with complementary tracking approaches can lead to more stable results, and uses a simple template model as a non-adaptive and thus stable component, a novel optical-flow-based mean-shift tracker as highly adaptive element and anon-line random forest as moderately adaptive appearance-based learner.

...read moreread less

Abstract: Tracking-by-detection is increasingly popular in order to tackle the visual tracking problem. Existing adaptive methods suffer from the drifting problem, since they rely on self-updates of an on-line learning method. In contrast to previous work that tackled this problem by employing semi-supervised or multiple-instance learning, we show that augmenting an on-line learning method with complementary tracking approaches can lead to more stable results. In particular, we use a simple template model as a non-adaptive and thus stable component, a novel optical-flow-based mean-shift tracker as highly adaptive element and an on-line random forest as moderately adaptive appearance-based learner. We combine these three trackers in a cascade. All of our components run on GPUs or similar multi-core systems, which allows for real-time performance. We show the superiority of our system over current state-of-the-art tracking methods in several experiments on publicly available data.

...read moreread less

Proceedings Article•DOI•

Deep auto-encoder neural networks in reinforcement learning

[...]

Sascha Lange¹, Martin Riedmiller¹•Institutions (1)

University of Freiburg¹

18 Jul 2010

TL;DR: A framework for combining the training of deep auto-encoders (for learning compact feature spaces) with recently-proposed batch-mode RL algorithms ( for learning policies) is proposed and an emphasis is put on the data-efficiency and on studying the properties of the feature spaces automatically constructed by the deep Auto-encoder neural networks.

...read moreread less

Abstract: This paper discusses the effectiveness of deep auto-encoder neural networks in visual reinforcement learning (RL) tasks. We propose a framework for combining the training of deep auto-encoders (for learning compact feature spaces) with recently-proposed batch-mode RL algorithms (for learning policies). An emphasis is put on the data-efficiency of this combination and on studying the properties of the feature spaces automatically constructed by the deep auto-encoders. These feature spaces are empirically shown to adequately resemble existing similarities and spatial relations between observations and allow to learn useful policies. We propose several methods for improving the topology of the feature spaces making use of task-dependent information. Finally, we present first results on successfully learning good control policies directly on synthesized and real images.

...read moreread less

Posted Content•

Self-Taught Hashing for Fast Similarity Search

[...]

Dell Zhang¹, Jun Wang², Deng Cai³, Jinsong Lu¹•Institutions (3)

Birkbeck, University of London¹, University College London², Zhejiang University³

29 Apr 2010-arXiv: Information Retrieval

TL;DR: Self-Taught Hashing (STH) as mentioned in this paper is a self-taught hashing approach that finds the optimal binary codes for all documents in the given corpus via unsupervised learning, and then trains classifiers via supervised learning to predict the $l$-bit code for any query document unseen before.

...read moreread less

Abstract: The ability of fast similarity search at large scale is of great importance to many Information Retrieval (IR) applications. A promising way to accelerate similarity search is semantic hashing which designs compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes (within a short Hamming distance). Although some recently proposed techniques are able to generate high-quality codes for documents known in advance, obtaining the codes for previously unseen documents remains to be a very challenging problem. In this paper, we emphasise this issue and propose a novel Self-Taught Hashing (STH) approach to semantic hashing: we first find the optimal $l$-bit binary codes for all documents in the given corpus via unsupervised learning, and then train $l$ classifiers via supervised learning to predict the $l$-bit code for any query document unseen before. Our experiments on three real-world text datasets show that the proposed approach using binarised Laplacian Eigenmap (LapEig) and linear Support Vector Machine (SVM) outperforms state-of-the-art techniques significantly.

...read moreread less

Journal Article•DOI•

Learning author-topic models from text corpora

[...]

Michal Rosen-Zvi¹, Chaitanya Chemudugunta², Thomas L. Griffiths³, Padhraic Smyth², Mark Steyvers² - Show less +1 more•Institutions (3)

IBM¹, University of California, Irvine², University of California, Berkeley³

29 Jan 2010-ACM Transactions on Information Systems

TL;DR: The interpretation of the results discovered by the system including specific topic and author models, ranking of authors by topic and topics by author, parsing of abstracts by topics and authors, and detection of unusual papers by specific authors are discussed.

...read moreread less

Abstract: We propose an unsupervised learning technique for extracting information about authors and topics from large text collections. We model documents as if they were generated by a two-stage stochastic process. An author is represented by a probability distribution over topics, and each topic is represented as a probability distribution over words. The probability distribution over topics in a multi-author paper is a mixture of the distributions associated with the authors. The topic-word and author-topic distributions are learned from data in an unsupervised manner using a Markov chain Monte Carlo algorithm. We apply the methodology to three large text corpora: 150,000 abstracts from the CiteSeer digital library, 1740 papers from the Neural Information Processing Systems (NIPS) Conferences, and 121,000 emails from the Enron corporation. We discuss in detail the interpretation of the results discovered by the system including specific topic and author models, ranking of authors by topic and topics by author, parsing of abstracts by topics and authors, and detection of unusual papers by specific authors. Experiments based on perplexity scores for test documents and precision-recall for document retrieval are used to illustrate systematic differences between the proposed author-topic model and a number of alternatives. Extensions to the model, allowing for example, generalizations of the notion of an author, are also briefly discussed.

...read moreread less

Proceedings Article•DOI•

Self-taught hashing for fast similarity search

[...]

Dell Zhang¹, Jun Wang², Deng Cai³, Jinsong Lu¹•Institutions (3)

Birkbeck, University of London¹, University College London², Zhejiang University³

19 Jul 2010

TL;DR: Self-Taught Hashing (STH) as discussed by the authors is a self-taught hashing method that finds the optimal l-bit binary codes for all documents in the given corpus via unsupervised learning, and then trains l classifiers via supervised learning to predict the lbit code for any query document unseen before.

...read moreread less

Abstract: The ability of fast similarity search at large scale is of great importance to many Information Retrieval (IR) applications. A promising way to accelerate similarity search is semantic hashing which designs compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes (within a short Hamming distance). Although some recently proposed techniques are able to generate high-quality codes for documents known in advance, obtaining the codes for previously unseen documents remains to be a very challenging problem. In this paper, we emphasise this issue and propose a novel Self-Taught Hashing (STH) approach to semantic hashing: we first find the optimal l-bit binary codes for all documents in the given corpus via unsupervised learning, and then train l classifiers via supervised learning to predict the l-bit code for any query document unseen before. Our experiments on three real-world text datasets show that the proposed approach using binarised Laplacian Eigenmap (LapEig) and linear Support Vector Machine (SVM) outperforms state-of-the-art techniques significantly.

...read moreread less

Journal Article•DOI•

Semisupervised Neural Networks for Efficient Hyperspectral Image Classification

[...]

Frédéric Ratle¹, Gustavo Camps-Valls, Jason Weston²•Institutions (2)

University of Lausanne¹, Princeton University²

17 Feb 2010-IEEE Transactions on Geoscience and Remote Sensing

TL;DR: The proposed approach gives rise to an operational classifier, as opposed to previously presented transductive or Laplacian support vector machines (TSVM or LapSVM, respectively), which constitutes a general framework for building computationally efficient semisupervised methods.

...read moreread less

Abstract: A framework for semisupervised remote sensing image classification based on neural networks is presented. The methodology consists of adding a flexible embedding regularizer to the loss function used for training neural networks. Training is done using stochastic gradient descent with additional balancing constraints to avoid falling into local minima. The method constitutes a generalization of both supervised and unsupervised methods and can handle millions of unlabeled samples. Therefore, the proposed approach gives rise to an operational classifier, as opposed to previously presented transductive or Laplacian support vector machines (TSVM or LapSVM, respectively). The proposed methodology constitutes a general framework for building computationally efficient semisupervised methods. The method is compared with LapSVM and TSVM in semisupervised scenarios, to SVM in supervised settings, and to online and batch k-means for unsupervised learning. Results demonstrate the improved classification accuracy and scalability of this approach on several hyperspectral image classification problems.

...read moreread less

Book Chapter•DOI•

Types of Machine Learning Algorithms

[...]

Taiwo Oladipupo Ayodele

01 Feb 2010

TL;DR: The supervised learning task is the classification problem: the learner is required to learn a function which maps a vector into one of several classes by looking at several input-output examples of the function.

...read moreread less

Abstract: • Supervised learning --where the algorithm generates a function that maps inputs to desired outputs. One standard formulation of the supervised learning task is the classification problem: the learner is required to learn (to approximate the behavior of) a function which maps a vector into one of several classes by looking at several input-output examples of the function. • Unsupervised learning --which models a set of inputs: labeled examples are not available. • Semi-supervised learning --which combines both labeled and unlabeled examples to generate an appropriate function or classifier. • Reinforcement learning --where the algorithm learns a policy of how to act given an observation of the world. Every action has some impact in the environment, and the environment provides feedback that guides the learning algorithm. • Transduction --similar to supervised learning, but does not explicitly construct a function: instead, tries to predict new outputs based on training inputs, training outputs, and new inputs. • Learning to learn --where the algorithm learns its own inductive bias based on previous experience.

...read moreread less

Journal Article•DOI•

Quantitative Comparison of Spot Detection Methods in Fluorescence Microscopy

[...]

Ihor Smal¹, Marco Loog², Wiro J. Niessen¹, Erik Meijering¹•Institutions (2)

Erasmus University Rotterdam¹, Delft University of Technology²

01 Feb 2010-IEEE Transactions on Medical Imaging

TL;DR: Seven unsupervised and two supervised detection methods based on the so-called h -dome transform from mathematical morphology or the multiscale variance-stabilizing transform perform comparably, and have the advantage that they do not require a cumbersome learning stage.

...read moreread less

Abstract: Quantitative analysis of biological image data generally involves the detection of many subresolution spots. Especially in live cell imaging, for which fluorescence microscopy is often used, the signal-to-noise ratio (SNR) can be extremely low, making automated spot detection a very challenging task. In the past, many methods have been proposed to perform this task, but a thorough quantitative evaluation and comparison of these methods is lacking in the literature. In this paper, we evaluate the performance of the most frequently used detection methods for this purpose. These include seven unsupervised and two supervised methods. We perform experiments on synthetic images of three different types, for which the ground truth was available, as well as on real image data sets acquired for two different biological studies, for which we obtained expert manual annotations to compare with. The results from both types of experiments suggest that for very low SNRs ( ? 2), the supervised (machine learning) methods perform best overall. Of the unsupervised methods, the detectors based on the so-called h -dome transform from mathematical morphology or the multiscale variance-stabilizing transform perform comparably, and have the advantage that they do not require a cumbersome learning stage. At high SNRs ( > 5), the difference in performance of all considered detectors becomes negligible.

...read moreread less

Proceedings Article•

Painless Unsupervised Learning with Features

[...]

Taylor Berg-Kirkpatrick¹, Alexandre Bouchard-Côté¹, John DeNero¹, Dan Klein¹•Institutions (1)

University of California, Berkeley¹

02 Jun 2010

TL;DR: This work shows how features can easily be added to standard generative models for unsupervised learning, without requiring complex new training methods, and applies this technique to part-of-speech induction, grammar induction, word alignment, and word segmentation.

...read moreread less

Abstract: We show how features can easily be added to standard generative models for unsupervised learning, without requiring complex new training methods. In particular, each component multinomial of a generative model can be turned into a miniature logistic regression model if feature locality permits. The intuitive EM algorithm still applies, but with a gradient-based M-step familiar from discriminative training of logistic regression models. We apply this technique to part-of-speech induction, grammar induction, word alignment, and word segmentation, incorporating a few linguistically-motivated features into the standard generative model for each task. These feature-enhanced models each outperform their basic counterparts by a substantial margin, and even compete with and surpass more complex state-of-the-art models.

...read moreread less

Journal Article•DOI•

Unsupervised Object Discovery: A Comparison

[...]

Tinne Tuytelaars¹, Christoph H. Lampert², Matthew B. Blaschko³, Wray Buntine⁴•Institutions (4)

Katholieke Universiteit Leuven¹, Max Planck Society², University of Oxford³, Australian National University⁴

01 Jun 2010-International Journal of Computer Vision

TL;DR: The goal of this paper is to discover the objects present in the images by analyzing unlabeled data and searching for re-occurring patterns, and a rigorous framework for evaluating unsupervised object discovery methods is proposed.

...read moreread less

Abstract: The goal of this paper is to evaluate and compare models and methods for learning to recognize basic entities in images in an unsupervised setting. In other words, we want to discover the objects present in the images by analyzing unlabeled data and searching for re-occurring patterns. We experiment with various baseline methods, methods based on latent variable models, as well as spectral clustering methods. The results are presented and compared both on subsets of Caltech256 and MSRC2, data sets that are larger and more challenging and that include more object classes than what has previously been reported in the literature. A rigorous framework for evaluating unsupervised object discovery methods is proposed.

...read moreread less

Posted Content•

Safe Feature Elimination in Sparse Supervised Learning

[...]

Laurent El Ghaoui¹, Vivian Viallon¹, Tarek Rabbani¹•Institutions (1)

University of California, Berkeley¹

17 Sep 2010-arXiv: Learning

TL;DR: This work investigates fast methods that allow to quickly eliminate variables (features) in supervised learning problems involving a convex loss function and a $l_1$-norm penalty, leading to a potentially substantial reduction in the number of variables prior to running the supervised learning algorithm.

...read moreread less

Abstract: We investigate fast methods that allow to quickly eliminate variables (features) in supervised learning problems involving a convex loss function and a $l_1$-norm penalty, leading to a potentially substantial reduction in the number of variables prior to running the supervised learning algorithm. The methods are not heuristic: they only eliminate features that are {\em guaranteed} to be absent after solving the learning problem. Our framework applies to a large class of problems, including support vector machine classification, logistic regression and least-squares. The complexity of the feature elimination step is negligible compared to the typical computational effort involved in the sparse supervised learning problem: it grows linearly with the number of features times the number of examples, with much better count if data is sparse. We apply our method to data sets arising in text classification and observe a dramatic reduction of the dimensionality, hence in computational effort required to solve the learning problem, especially when very sparse classifiers are sought. Our method allows to immediately extend the scope of existing algorithms, allowing us to run them on data sets of sizes that were out of their reach before.

...read moreread less

Collapse