
Showing papers on "Active learning (machine learning) published in 2007"


Proceedings ArticleDOI
Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, Andrew Y. Ng
20 Jun 2007
TL;DR: An approach to self-taught learning that uses sparse coding to construct higher-level features from the unlabeled data; these features form a succinct input representation and significantly improve classification performance.
Abstract: We present a new machine learning framework called "self-taught learning" for using unlabeled data in supervised classification tasks. We do not assume that the unlabeled data follows the same class labels or generative distribution as the labeled data. Thus, we would like to use a large number of unlabeled images (or audio samples, or text documents) randomly downloaded from the Internet to improve performance on a given image (or audio, or text) classification task. Such unlabeled data is significantly easier to obtain than in typical semi-supervised or transfer learning settings, making self-taught learning widely applicable to many practical learning problems. We describe an approach to self-taught learning that uses sparse coding to construct higher-level features using the unlabeled data. These features form a succinct input representation and significantly improve classification performance. When using an SVM for classification, we further show how a Fisher kernel can be learned for this representation.
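As a rough illustration of the pipeline this abstract describes (learn bases from unlabeled data, re-encode the labeled data, train a standard classifier), here is a minimal sketch using scikit-learn's dictionary-learning tools; the random arrays and all parameter values are stand-ins, not the paper's setup:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, sparse_encode
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
unlabeled_X = rng.randn(500, 64)          # stand-in for unlabeled patches
labeled_X = rng.randn(100, 64)            # stand-in for the labeled task
labeled_y = rng.randint(0, 2, size=100)

# 1. Learn an overcomplete basis (dictionary) from unlabeled data alone.
dico = DictionaryLearning(n_components=128, alpha=1.0, max_iter=20,
                          random_state=0)
dico.fit(unlabeled_X)

# 2. Encode labeled examples as sparse activations of the learned bases;
#    these activations are the higher-level features.
features = sparse_encode(labeled_X, dico.components_,
                         algorithm='lasso_lars', alpha=1.0)

# 3. Train an ordinary supervised classifier on the new representation.
clf = LinearSVC().fit(features, labeled_y)
```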

1,731 citations


Proceedings Article
03 Dec 2007
TL;DR: This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms and shows distinct tradeoffs for the case of small-scale and large-scale learning problems.
Abstract: This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms. The analysis shows distinct tradeoffs for the case of small-scale and large-scale learning problems. Small-scale learning problems are subject to the usual approximation-estimation tradeoff. Large-scale learning problems are subject to a qualitatively different tradeoff involving the computational complexity of the underlying optimization algorithms in non-trivial ways.
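For orientation, the decomposition underlying this analysis can be written as follows; the notation is mine, summarizing the paper's framing of excess error:

```latex
% Excess error of the predictor returned by an approximate optimizer,
% decomposed into the three terms whose balance the paper studies:
\mathcal{E}
  = \underbrace{\mathcal{E}_{\mathrm{app}}}_{\text{approximation: limits of the model class}}
  + \underbrace{\mathcal{E}_{\mathrm{est}}}_{\text{estimation: finite training data}}
  + \underbrace{\mathcal{E}_{\mathrm{opt}}}_{\text{optimization: stopping the solver early}}
```

In the small-scale regime the first two terms dominate; at large scale the optimization term and the compute budget enter the tradeoff.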

1,599 citations


Proceedings ArticleDOI
20 Jun 2007
TL;DR: A series of experiments indicate that these models with deep architectures show promise in solving harder learning problems that exhibit many factors of variation.
Abstract: Recently, several learning algorithms relying on models with deep architectures have been proposed. Though they have demonstrated impressive performance, to date, they have only been evaluated on relatively simple problems such as digit recognition in a controlled environment, for which many machine learning algorithms already report reasonable results. Here, we present a series of experiments which indicate that these models show promise in solving harder learning problems that exhibit many factors of variation. These models are compared with well-established algorithms such as Support Vector Machines and single hidden-layer feed-forward neural networks.

1,122 citations


Proceedings ArticleDOI
23 Jul 2007
TL;DR: This work presents a general SVM learning algorithm that efficiently finds a globally optimal solution to a straightforward relaxation of MAP, and this method is shown to produce statistically significant improvements in MAP scores.
Abstract: Machine learning is commonly used to improve ranked retrieval systems. Due to computational difficulties, few learning techniques have been developed to directly optimize for mean average precision (MAP), despite its widespread use in evaluating such systems. Existing approaches optimizing MAP either do not find a globally optimal solution, or are computationally expensive. In contrast, we present a general SVM learning algorithm that efficiently finds a globally optimal solution to a straightforward relaxation of MAP. We evaluate our approach using the TREC 9 and TREC 10 Web Track corpora (WT10g), comparing against SVMs optimized for accuracy and ROCArea. In most cases we show our method to produce statistically significant improvements in MAP scores.
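For concreteness, the quantity being optimized can be computed per query as follows; MAP is then the mean of these values over queries. A minimal sketch (array names are illustrative):

```python
import numpy as np

def average_precision(scores, labels):
    """scores: model scores for one query; labels: binary relevance."""
    order = np.argsort(-scores)            # rank documents by score
    labels = np.asarray(labels)[order]
    hits = np.cumsum(labels)
    ranks = np.arange(1, len(labels) + 1)
    precisions = hits / ranks              # precision at each rank
    # AP: mean precision at the ranks where relevant docs appear
    return precisions[labels == 1].mean()

print(average_precision(np.array([0.9, 0.2, 0.8, 0.1]),
                        np.array([1, 0, 0, 1])))   # toy ranking: 0.75
```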

758 citations


Proceedings Article
06 Jan 2007
TL;DR: This paper shows how to combine prior knowledge and evidence from the expert's actions to derive a probability distribution over the space of reward functions and presents efficient algorithms that find solutions for the reward learning and apprenticeship learning tasks that generalize well over these distributions.
Abstract: Inverse Reinforcement Learning (IRL) is the problem of learning the reward function underlying a Markov Decision Process given the dynamics of the system and the behaviour of an expert. IRL is motivated by situations where knowledge of the rewards is a goal by itself (as in preference elicitation) and by the task of apprenticeship learning (learning policies from an expert). In this paper we show how to combine prior knowledge and evidence from the expert's actions to derive a probability distribution over the space of reward functions. We present efficient algorithms that find solutions for the reward learning and apprenticeship learning tasks that generalize well over these distributions. Experimental results show strong improvement for our methods over previous heuristic-based approaches.
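A hedged sketch of the core likelihood computation: under a Boltzmann (softly optimal) model of the expert, each demonstrated action is scored by its Q-value for a candidate reward. The tiny MDP representation and all names below are illustrative assumptions, not the paper's notation:

```python
import numpy as np

def q_values(R, P, gamma=0.9, iters=200):
    """Value iteration for a small MDP. R: (S,) rewards, P: (A,S,S) dynamics."""
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R[None, :] + gamma * P @ V        # (A, S)
        V = Q.max(axis=0)
    return Q.T                                 # (S, A)

def log_likelihood(R, P, demos, alpha=5.0):
    """demos: list of (state, expert_action); Boltzmann action model."""
    Q = q_values(R, P)
    logp = alpha * Q - np.log(np.exp(alpha * Q).sum(axis=1, keepdims=True))
    return sum(logp[s, a] for s, a in demos)

# A posterior over rewards would combine this likelihood with a prior,
# e.g. via Metropolis-Hastings over candidate reward vectors R.
```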

663 citations


Book
01 Aug 2007
Abstract: All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

610 citations


Proceedings Article
03 Dec 2007
TL;DR: The experiments show that learning from instance labels can significantly improve performance of a basic MI learning algorithm in two multiple-instance domains: content-based image retrieval and text classification.
Abstract: We present a framework for active learning in the multiple-instance (MI) setting. In an MI learning problem, instances are naturally organized into bags and it is the bags, instead of individual instances, that are labeled for training. MI learners assume that every instance in a bag labeled negative is actually negative, whereas at least one instance in a bag labeled positive is actually positive. We consider the particular case in which an MI learner is allowed to selectively query unlabeled instances from positive bags. This approach is well motivated in domains in which it is inexpensive to acquire bag labels and possible, but expensive, to acquire instance labels. We describe a method for learning from labels at mixed levels of granularity, and introduce two active query selection strategies motivated by the MI setting. Our experiments show that learning from instance labels can significantly improve performance of a basic MI learning algorithm in two multiple-instance domains: content-based image retrieval and text classification.
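A minimal sketch of the querying step, assuming (as the abstract states) that instance labels may only be requested from positive bags; `clf` stands in for any probability-producing classifier, and the plain uncertainty-sampling rule shown is a stand-in for the paper's MI-motivated strategies:

```python
import numpy as np

def select_instance_query(clf, positive_bags):
    """positive_bags: list of (bag_id, instances ndarray). Returns the
    bag id and instance index whose label the learner is least sure of."""
    best = None
    for bag_id, X in positive_bags:
        p = clf.predict_proba(X)[:, 1]
        uncertainty = 1.0 - np.abs(p - 0.5) * 2   # peaks at p = 0.5
        i = int(np.argmax(uncertainty))
        if best is None or uncertainty[i] > best[0]:
            best = (uncertainty[i], bag_id, i)
    return best[1], best[2]
```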

551 citations


Proceedings ArticleDOI
26 Dec 2007
TL;DR: This work derives a novel active category learning method based on the probabilistic regression model, and shows that a significant boost in classification performance is possible, especially when the amount of training data for a category is ultimately very small.
Abstract: Discriminative methods for visual object category recognition are typically non-probabilistic, predicting class labels but not directly providing an estimate of uncertainty. Gaussian Processes (GPs) are powerful regression techniques with explicit uncertainty models; we show here how Gaussian Processes with covariance functions defined based on a Pyramid Match Kernel (PMK) can be used for probabilistic object category recognition. The uncertainty model provided by GPs offers confidence estimates at test points, and naturally allows for an active learning paradigm in which points are optimally selected for interactive labeling. We derive a novel active category learning method based on our probabilistic regression model, and show that a significant boost in classification performance is possible, especially when the amount of training data for a category is ultimately very small.
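The active-selection step can be sketched with standard GP algebra over precomputed kernel matrices (as would be obtained from a Pyramid Match Kernel); this is a generic maximum-variance rule, not necessarily the paper's exact criterion:

```python
import numpy as np

def gp_variance(K_train, K_cross, k_diag, noise=1e-2):
    """K_train: (n,n) kernel on labeled data; K_cross: (m,n) kernel between
    unlabeled and labeled points; k_diag: (m,) prior variances k(x,x)."""
    L = np.linalg.cholesky(K_train + noise * np.eye(len(K_train)))
    v = np.linalg.solve(L, K_cross.T)          # (n, m)
    return k_diag - np.sum(v**2, axis=0)       # posterior variances

def next_query(K_train, K_cross, k_diag):
    # label the unlabeled point the GP is least certain about
    return int(np.argmax(gp_variance(K_train, K_cross, k_diag)))
```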

400 citations


Proceedings ArticleDOI
06 Nov 2007
TL;DR: It is demonstrated that active learning is capable of solving the class imbalance problem by providing the learner with more balanced classes; an efficient method is also proposed for selecting informative instances from a small pool of samples, so that active learning does not necessitate a search through the entire dataset.
Abstract: This paper is concerned with the class imbalance problem, which has been known to hinder the learning performance of classification algorithms. The problem occurs when there are significantly fewer observations of the target concept. Various real-world classification tasks, such as medical diagnosis, text categorization and fraud detection, suffer from this phenomenon. The standard machine learning algorithms yield better prediction performance with balanced datasets. In this paper, we demonstrate that active learning is capable of solving the class imbalance problem by providing the learner more balanced classes. We also propose an efficient way of selecting informative instances from a smaller pool of samples for active learning which does not necessitate a search through the entire dataset. The proposed method yields an efficient querying system and allows active learning to be applied to very large datasets. Our experimental results show that with an early stopping criterion, active learning achieves a fast solution with competitive prediction performance in imbalanced data classification.
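A minimal sketch of the small-pool querying idea as described: sample a modest random pool and query the point nearest the current SVM hyperplane. The pool size and model choice are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import LinearSVC

def query_from_pool(clf: LinearSVC, X_unlabeled, rng, pool_size=59):
    """Pick the most informative point from a small random pool rather
    than scanning the entire unlabeled set."""
    pool = rng.choice(len(X_unlabeled), size=pool_size, replace=False)
    margins = np.abs(clf.decision_function(X_unlabeled[pool]))
    return pool[int(np.argmin(margins))]   # closest to the hyperplane
```

The pool size of 59 echoes a common small-sample trick from the online-SVM literature; treat it, like everything else here, as illustrative.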

382 citations


Book ChapterDOI
13 Jun 2007
TL;DR: The effectiveness of the framework for margin-based active learning of linear separators is analyzed both in the realizable case and in a specific noisy setting related to the Tsybakov small noise condition.

Abstract: We present a framework for margin-based active learning of linear separators. We instantiate it for a few important cases, some of which have been previously considered in the literature. We analyze the effectiveness of our framework both in the realizable case and in a specific noisy setting related to the Tsybakov small noise condition.

351 citations


Proceedings ArticleDOI
20 Jun 2007
TL;DR: General bounds on the number of label requests made by the A2 algorithm proposed by Balcan, Beygelzimer & Langford are derived, which represents the first nontrivial general-purpose upper bound on label complexity in the agnostic PAC model.
Abstract: We study the label complexity of pool-based active learning in the agnostic PAC model. Specifically, we derive general bounds on the number of label requests made by the A2 algorithm proposed by Balcan, Beygelzimer & Langford (Balcan et al., 2006). This represents the first nontrivial general-purpose upper bound on label complexity in the agnostic PAC model.

Proceedings Article
03 Dec 2007
TL;DR: A discriminative batch mode active learning approach that formulates the instance selection task as a continuous optimization problem over auxiliary instance selection variables to maximize the discriminative classification performance of the target classifier, while also taking the unlabeled data into account.
Abstract: Active learning sequentially selects unlabeled instances to label with the goal of reducing the effort needed to learn a good classifier. Most previous studies in active learning have focused on selecting one unlabeled instance to label at a time while retraining in each iteration. Recently a few batch mode active learning approaches have been proposed that select a set of most informative unlabeled instances in each iteration under the guidance of heuristic scores. In this paper, we propose a discriminative batch mode active learning approach that formulates the instance selection task as a continuous optimization problem over auxiliary instance selection variables. The optimization is formulated to maximize the discriminative classification performance of the target classifier, while also taking the unlabeled data into account. Although the objective is not convex, we can apply a quasi-Newton method to obtain a good local solution. Our empirical studies on UCI datasets show that the proposed active learning approach is more effective than current state-of-the-art batch mode active learning algorithms.

Proceedings ArticleDOI
26 Dec 2007
TL;DR: This paper treats tracking as a foreground/background classification problem and proposes an online semi-supervised learning framework that improves each individual classifier using the information from other features, thus leading to a more robust tracker.
Abstract: This paper treats tracking as a foreground/background classification problem and proposes an online semi-supervised learning framework. Initialized with a small number of labeled samples, semi-supervised learning treats each new sample as unlabeled data. Classification of new data and updating of the classifier are achieved simultaneously in a co-training framework. The object is represented using independent features and an online support vector machine (SVM) is built for each feature. The predictions from different features are fused by combining the confidence map from each classifier using a classifier weighting method which creates a final classifier that performs better than any classifier based on a single feature. The semi-supervised learning approach then uses the output of the combined confidence map to generate new samples and update the SVMs online. With this approach, the tracker gains increasing knowledge of the object and background and continually improves itself over time. Compared to other discriminative trackers, the online semi-supervised learning approach improves each individual classifier using the information from other features, thus leading to a more robust tracker. Experiments show that this framework performs better than state-of-the-art tracking algorithms on challenging sequences.

Book
01 Jun 2007
TL;DR: Contents: Introduction; Learning and intelligence; Machine learning basics; Knowledge representation; Learning as search; Attribute quality measures; Data pre-processing; Constructive induction; Symbolic learning; Statistical learning.
Abstract: Contents: Introduction; Learning and intelligence; Machine learning basics; Knowledge representation; Learning as search; Attribute quality measures; Data pre-processing; Constructive induction; Symbolic learning; Statistical learning; Artificial neural networks; Cluster analysis; Learning theory; Computational learning theory; Definitions; References and index.

Proceedings ArticleDOI
28 Oct 2007
TL;DR: A novel maximum entropy based technique, iterative feature transformation (IFT), is introduced and it is shown how simple relaxations, such as providing additional information like the proportion of positive examples in the test data, can significantly improve the performance of some of the transductive transfer learners.
Abstract: The problem of transfer learning, where information gained in one learning task is used to improve performance in another related task, is an important new area of research. While previous work has studied the supervised version of this problem, we study the more challenging case of unsupervised transductive transfer learning, where no labeled data from the target domain are available at training. We describe some current state-of-the-art inductive and transductive approaches and then adapt these models to the problem of transfer learning for protein name extraction. In the process, we introduce a novel maximum entropy based technique, iterative feature transformation (IFT), and show that it achieves comparable performance with state-of-the-art transductive SVMs. We also show how simple relaxations, such as providing additional information like the proportion of positive examples in the test data, can significantly improve the performance of some of the transductive transfer learners.

Proceedings Article
01 Dec 2007
TL;DR: The experimental results presented in the information extraction domain demonstrate that applying constraints helps the model to generate better feedback during learning, and hence the framework allows for high performance learning with significantly less training data than was possible before on these tasks.
Abstract: Over the last few years, two of the main research directions in machine learning of natural language processing have been the study of semi-supervised learning algorithms as a way to train classifiers when the labeled data is scarce, and the study of ways to exploit knowledge and global information in structured learning tasks. In this paper, we suggest a method for incorporating domain knowledge in semi-supervised learning algorithms. Our novel framework unifies and can exploit several kinds of task specific constraints. The experimental results presented in the information extraction domain demonstrate that applying constraints helps the model to generate better feedback during learning, and hence the framework allows for high performance learning with significantly less training data than was possible before on these tasks.

Proceedings ArticleDOI
20 Jun 2007
TL;DR: For Gaussian Processes, an analysis and efficient algorithms are presented that address the question of when an active learning, or sequential design, strategy will perform significantly better than sensing at an a priori specified set of locations.
Abstract: When monitoring spatial phenomena, such as the ecological condition of a river, deciding where to make observations is a challenging task. In these settings, a fundamental question is when an active learning, or sequential design, strategy, where locations are selected based on previous measurements, will perform significantly better than sensing at an a priori specified set of locations. For Gaussian Processes (GPs), which often accurately model spatial phenomena, we present an analysis and efficient algorithms that address this question. Central to our analysis is a theoretical bound which quantifies the performance difference between active and a priori design strategies. We consider GPs with unknown kernel parameters and present a nonmyopic approach for trading off exploration, i.e., decreasing uncertainty about the model parameters, and exploitation, i.e., near-optimally selecting observations when the parameters are (approximately) known. We discuss several exploration strategies, and present logarithmic sample complexity bounds for the exploration phase. We then extend our algorithm to handle nonstationary GPs exploiting local structure in the model. We also present extensive empirical evaluation on several real-world problems.
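One way to make the sequential-design step concrete is the greedy information-theoretic rule from the related GP placement literature: pick the location with high uncertainty given the current observations but low "leftover" uncertainty given everything else. The sketch below is an assumption-laden illustration, not the paper's algorithm:

```python
import numpy as np

def posterior_var(K, obs_idx, y_idx, noise=1e-2):
    """Variance at location y_idx given observations at obs_idx, under a
    zero-mean GP with covariance matrix K over all candidate locations."""
    if not obs_idx:
        return K[y_idx, y_idx]
    A = K[np.ix_(obs_idx, obs_idx)] + noise * np.eye(len(obs_idx))
    k = K[obs_idx, y_idx]
    return K[y_idx, y_idx] - k @ np.linalg.solve(A, k)

def greedy_mi_step(K, selected, candidates):
    # mutual-information-style score: informative w.r.t. what we have,
    # not redundant w.r.t. what remains unsensed
    def score(y):
        rest = [c for c in candidates if c != y]
        return posterior_var(K, selected, y) / posterior_var(K, rest, y)
    return max(candidates, key=score)
```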

Proceedings ArticleDOI
20 Jun 2007
TL;DR: This work uses a generalization of the EM-based reinforcement learning framework suggested by Dayan & Hinton to reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence.
Abstract: Many robot control problems of practical importance, including operational space control, can be reformulated as immediate reward reinforcement learning problems. However, few of the known optimization or reinforcement learning algorithms can be used in online learning control for robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which are infeasible for a physical system. Using a generalization of the EM-based reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications of complex high degree-of-freedom robots.
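The core update can be sketched as a weighted least-squares fit in which rewards act as the weights; the matrix names are mine, and this omits the paper's adaptive reward transformation:

```python
import numpy as np

def rwr_update(Phi, u, r):
    """Reward-weighted regression step.
    Phi: (n, d) state features; u: (n,) executed (continuous) actions;
    r: (n,) nonnegative reward weights. Returns policy parameters theta
    minimizing sum_i r_i * (u_i - phi_i . theta)^2."""
    W = np.diag(r)
    return np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ u)

# The policy would then act as u = phi(s) @ theta plus exploration noise.
```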

Book ChapterDOI
02 Apr 2007
TL;DR: This paper proposes a new algorithm which can efficiently maximize the learning benefits of relevance feedback and chooses a set of feedback documents based on relevancy, document diversity and document density.
Abstract: Relevance feedback, which uses the terms in relevant documents to enrich the user's initial query, is an effective method for improving retrieval performance. An associated key research problem is the following: Which documents to present to the user so that the user's feedback on the documents can significantly impact relevance feedback performance. This paper views this as an active learning problem and proposes a new algorithm which can efficiently maximize the learning benefits of relevance feedback. This algorithm chooses a set of feedback documents based on relevancy, document diversity and document density. Experimental results show a statistically significant and appreciable improvement in the performance of our new approach over the existing active feedback methods.
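A hedged sketch of greedy selection combining the three signals named above; the linear weighting and the use of a generic pairwise-similarity matrix are illustrative choices, not the paper's exact scoring:

```python
import numpy as np

def select_feedback_docs(rel, sim, k=5, a=1.0, b=1.0, c=1.0):
    """rel: (n,) retrieval scores; sim: (n,n) pairwise doc similarities."""
    density = sim.mean(axis=1)                 # favor docs in dense regions
    chosen = []
    for _ in range(k):
        if chosen:
            diversity = 1.0 - sim[:, chosen].max(axis=1)
        else:
            diversity = np.ones(len(rel))
        score = a * rel + b * diversity + c * density
        score[chosen] = -np.inf                # never pick a doc twice
        chosen.append(int(np.argmax(score)))
    return chosen
```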

Book ChapterDOI
17 Sep 2007
TL;DR: A dynamic approach, called DUAL, in which the strategy selection parameters are adaptively updated based on estimated future residual error reduction after each actively sampled point, with the objective of outperforming static strategies over a large operating range.
Abstract: Active Learning methods rely on static strategies for sampling unlabeled point(s). These strategies range from uncertainty sampling and density estimation to multi-factor methods with learn-once-use-always model parameters. This paper proposes a dynamic approach, called DUAL, where the strategy selection parameters are adaptively updated based on estimated future residual error reduction after each actively sampled point. The objective of DUAL is to outperform static strategies over a large operating range: from very few to very many labeled points. Empirical results over six datasets demonstrate that DUAL outperforms several state-of-the-art methods on most datasets.
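A loose reading of the DUAL idea in code: mix a density-weighted score with plain uncertainty and adapt the mixing weight from the estimated error reduction after each query. The paper's actual estimator and update rule differ; everything below is an illustrative assumption:

```python
import numpy as np

def dual_score(uncertainty, density, w):
    """Blend density-weighted uncertainty with plain uncertainty."""
    return w * (uncertainty * density) + (1.0 - w) * uncertainty

def update_weight(w, est_error_reduction, lr=0.1):
    # shift weight toward whichever strategy is currently paying off
    return float(np.clip(w + lr * est_error_reduction, 0.0, 1.0))
```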

Patent
01 Aug 2007
TL;DR: In this paper, a teaching/learning system includes a real-time class management module to selectively allocate first and second digital learning objects for performance, substantially in parallel, on first and second student stations, respectively.
Abstract: Device, system, and method of adaptive teaching and learning. For example, a teaching/learning system includes a real-time class management module to selectively allocate first and second digital learning objects for performance, substantially in parallel, on first and second student stations, respectively.

Proceedings Article
01 Jun 2007
TL;DR: In this paper, the tasks of learning the order of inference and training the local classifier are dynamically incorporated into a single perceptron-like learning algorithm, which achieves an error rate of 2.67% on the standard PTB test set.
Abstract: In this paper, we propose guided learning, a new learning framework for bidirectional sequence classification. The tasks of learning the order of inference and training the local classifier are dynamically incorporated into a single Perceptron-like learning algorithm. We apply this novel learning algorithm to POS tagging. It obtains an error rate of 2.67% on the standard PTB test set, which represents 3.3% relative error reduction over the previous best result on the same data set, while using fewer features.

Proceedings Article
01 Jun 2007
TL;DR: A bootstrap-based oversampling (BootOS) method is proposed that works better than ordinary over-sampling in active learning for word sense disambiguation and a prediction solution by considering max-confidence as the upper bound and min-error as the lower bound for stopping conditions is suggested.
Abstract: In this paper, we analyze the effect of resampling techniques, including under-sampling and over-sampling, used in active learning for word sense disambiguation (WSD). Experimental results show that under-sampling causes negative effects on active learning, but over-sampling is a relatively good choice. To alleviate the within-class imbalance problem of over-sampling, we propose a bootstrap-based over-sampling (BootOS) method that works better than ordinary over-sampling in active learning for WSD. Finally, we investigate when to stop active learning, and adopt two strategies, max-confidence and min-error, as stopping conditions for active learning. According to experimental results, we suggest a prediction solution by considering max-confidence as the upper bound and min-error as the lower bound for stopping conditions.
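As a rough sketch, bootstrap-based oversampling can be thought of as resampling the minority class with replacement when building each training set; this simplified version is an assumption for illustration, not the paper's exact BootOS procedure:

```python
import numpy as np

def bootstrap_oversample(X_min, y_min, target_size, rng):
    """Resample the minority class with replacement up to target_size,
    smoothing within-class imbalance relative to plain duplication."""
    idx = rng.choice(len(X_min), size=target_size, replace=True)
    return X_min[idx], y_min[idx]
```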

Proceedings ArticleDOI
20 Jun 2007
TL;DR: The MissSVM algorithm is proposed which addresses multi-instance learning using a special semi-supervised support vector machine and is competitive with state-of-the-art multi-instance learning algorithms.
Abstract: Multi-instance learning and semi-supervised learning are different branches of machine learning. The former attempts to learn from a training set consisting of labeled bags each containing many unlabeled instances; the latter tries to exploit abundant unlabeled instances when learning with a small number of labeled examples. In this paper, we establish a bridge between these two branches by showing that multi-instance learning can be viewed as a special case of semi-supervised learning. Based on this recognition, we propose the MissSVM algorithm which addresses multi-instance learning using a special semi-supervised support vector machine. Experiments show that solving multi-instance problems from the view of semi-supervised learning is feasible, and the MissSVM algorithm is competitive with state-of-the-art multi-instance learning algorithms.

Proceedings Article
03 Dec 2007
TL;DR: An active learning algorithm that learns a continuous valuation model from discrete preferences that maximizes the expected improvement at each query without accurately modelling the entire valuation surface, which would be needlessly expensive.
Abstract: We propose an active learning algorithm that learns a continuous valuation model from discrete preferences. The algorithm automatically decides what items are best presented to an individual in order to find the item that they value highly in as few trials as possible, and exploits quirks of human psychology to minimize time and cognitive burden. To do this, our algorithm maximizes the expected improvement at each query without accurately modelling the entire valuation surface, which would be needlessly expensive. The problem is particularly difficult because the space of choices is infinite. We demonstrate the effectiveness of the new algorithm compared to related active learning methods. We also embed the algorithm within a decision making tool for assisting digital artists in rendering materials. The tool finds the best parameters while minimizing the number of queries.
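The acquisition rule named in the abstract has a standard closed form for a Gaussian predictive distribution; `mu` and `sigma` would come from the learned valuation model, and the exploration margin `xi` is my addition:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best, xi=0.01):
    """EI of candidate items with predictive means mu and std devs sigma,
    relative to the best valuation `best` found so far."""
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)
```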

Journal ArticleDOI
TL;DR: A novel supervised network intrusion detection method based on the TCM-KNN (Transductive Confidence Machines for K-Nearest Neighbors) machine learning algorithm and an active learning based training data selection method; it can effectively detect anomalies with a high detection rate and low false positives, and can be further optimized for real applications as discussed in this paper.

Proceedings ArticleDOI
27 Jun 2007
TL;DR: A simulation-based active policy learning algorithm for finite-horizon, partially-observed sequential decision processes is proposed and tested in the domain of robot navigation and exploration under uncertainty, where it effectively trades off between exploration and exploitation.
Abstract: This paper proposes a simulation-based active policy learning algorithm for finite-horizon, partially-observed sequential decision processes. The algorithm is tested in the domain of robot navigation and exploration under uncertainty. In such a setting, the expected cost, which must be minimized, is a function of the belief state (filtering distribution). This filtering distribution is in turn nonlinear and subject to discontinuities, which arise because of constraints in the robot motion and control models. As a result, the expected cost is non-differentiable and very expensive to simulate. The new algorithm overcomes the first difficulty and reduces the number of required simulations as follows. First, it assumes that we have carried out previous simulations which returned values of the expected cost for different corresponding policy parameters. Second, it fits a Gaussian process (GP) regression model to these values, so as to approximate the expected cost as a function of the policy parameters. Third, it uses the GP predicted mean and variance to construct a statistical measure that determines which policy parameters should be used in the next simulation. The process is then repeated using the new parameters and the newly gathered expected cost observation. Since the objective is to find the policy parameters that minimize the expected cost, this iterative active learning approach effectively trades off between exploration (in regions where the GP variance is large) and exploitation (where the GP mean is low). In our experiments, a robot uses the proposed algorithm to plan an optimal path for accomplishing a series of tasks, while maximizing the information about its pose and map estimates. These estimates are obtained with a standard filter for simultaneous localization and mapping. Upon gathering new observations, the robot updates the state estimates and is able to replan a new path in the spirit of open-loop feedback control.
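The fit-then-select loop the abstract outlines can be sketched with an off-the-shelf GP regressor; the lower-confidence-bound rule below is a stand-in for the paper's statistical measure:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def next_policy_params(theta_hist, cost_hist, candidates, kappa=2.0):
    """theta_hist: (n, d) tried policy parameters; cost_hist: (n,) their
    simulated expected costs; candidates: (m, d) parameters to consider."""
    gp = GaussianProcessRegressor(normalize_y=True).fit(theta_hist, cost_hist)
    mu, sd = gp.predict(candidates, return_std=True)
    lcb = mu - kappa * sd          # low mean = exploit, high sd = explore
    return candidates[int(np.argmin(lcb))]
```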

Proceedings ArticleDOI
23 Jul 2007
TL;DR: It is demonstrated that active learning is capable of solving the class imbalance problem and helping real-world classification tasks such as text categorization.
Abstract: The class imbalance problem has been known to hinder the learning performance of classification algorithms. Various real-world classification tasks such as text categorization suffer from this phenomenon. We demonstrate that active learning is capable of solving the problem.

Book ChapterDOI
16 Dec 2007
TL;DR: A committee-based approach for active learning of real-valued functions is investigated; this variance-only strategy for selecting informative training data is shown to suffer when the model class is misspecified, since the learner's bias is high.
Abstract: We investigate a committee-based approach for active learning of real-valued functions. This is a variance-only strategy for selection of informative training data. As such it is shown to suffer when the model class is misspecified since the learner's bias is high. Conversely, the strategy outperforms passive selection when the model class is very expressive since active minimization of the variance avoids overfitting.
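A minimal sketch of a committee-based, variance-only selection rule of the kind investigated here: train committee members on bootstrap resamples and query where their predictions disagree most (the model choice is illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def qbc_query(X_train, y_train, X_pool, n_members=10, rng=None):
    rng = rng or np.random.RandomState(0)
    preds = []
    for _ in range(n_members):
        # each committee member sees a bootstrap resample of the data
        idx = rng.choice(len(X_train), size=len(X_train), replace=True)
        m = DecisionTreeRegressor().fit(X_train[idx], y_train[idx])
        preds.append(m.predict(X_pool))
    variance = np.var(np.stack(preds), axis=0)
    return int(np.argmax(variance))        # index of point to label next
```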

Patent
17 Jul 2007
TL;DR: A narrowcasting engine includes an active data gathering module to collect user data and an active learning module to generate a user profile based on the collected user data.
Abstract: A communication system with client devices in communication with at least one communication network. User data stores are also in communication with the communications network and store user data of users using respective ones of the client devices. Offer data stores also in communication with the communications network store offers from merchants. A narrowcasting engine includes an active data gathering module to collect the user data, and an active learning module to generate a user profile based on the user data. The communication engine selects dynamically offers from the offer data store based on the profile, and communicates the selected offers in the offer data store to the users.