
Showing papers on "Active learning (machine learning)" published in 2013


Journal ArticleDOI
TL;DR: Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.
Abstract: The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, about computing representations (i.e., inference), and about the geometrical connections between representation learning, density estimation, and manifold learning.

11,201 citations


Posted Content
TL;DR: By exploring the consistency and complementary properties of different views, multi-view learning is rendered more effective, more promising, and better able to generalize than single-view learning.
Abstract: In recent years, a great many methods of learning from multi-view data by considering the diversity of different views have been proposed. These views may be obtained from multiple sources or different feature subsets. In trying to organize and highlight similarities and differences between the variety of multi-view learning approaches, we review a number of representative multi-view learning algorithms in different areas and classify them into three groups: 1) co-training, 2) multiple kernel learning, and 3) subspace learning. Notably, co-training style algorithms train alternately to maximize the mutual agreement on two distinct views of the data; multiple kernel learning algorithms exploit kernels that naturally correspond to different views and combine kernels either linearly or non-linearly to improve learning performance; and subspace learning algorithms aim to obtain a latent subspace shared by multiple views by assuming that the input views are generated from this latent subspace. Though there is significant variance in the approaches to integrating multiple views to improve learning performance, they mainly exploit either the consensus principle or the complementary principle to ensure the success of multi-view learning. Since access to multiple views is fundamental to multi-view learning, beyond studying how to learn a model from multiple views, it is also valuable to study how to construct multiple views and how to evaluate them. Overall, by exploring the consistency and complementary properties of different views, multi-view learning is rendered more effective, more promising, and better able to generalize than single-view learning.

995 citations
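
The co-training style summarized above lends itself to a compact illustration. Below is a minimal sketch under stated assumptions (integer class labels 0..K-1, Gaussian naive Bayes as a stand-in base learner), not any specific algorithm from the survey: two classifiers, one per view, are trained alternately, and each pseudo-labels the unlabeled examples it is most confident about for the benefit of both.

```python
# Toy co-training sketch: two Gaussian naive Bayes models, one per view,
# alternately pseudo-label the unlabeled examples they are most confident
# about. Assumes numpy arrays and that the seed set covers all classes.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(Xv1, Xv2, y, labeled_idx, rounds=10, per_round=5):
    labeled = set(labeled_idx)
    unlabeled = [i for i in range(len(Xv1)) if i not in labeled]
    y = y.copy()
    clf1, clf2 = GaussianNB(), GaussianNB()
    for _ in range(rounds):
        if not unlabeled:
            break
        idx = sorted(labeled)
        clf1.fit(Xv1[idx], y[idx])
        clf2.fit(Xv2[idx], y[idx])
        for clf, X in ((clf1, Xv1), (clf2, Xv2)):
            proba = clf.predict_proba(X[unlabeled])
            picks = np.argsort(-proba.max(axis=1))[:per_round]
            for p in picks:
                i = unlabeled[p]
                y[i] = clf.classes_[proba[p].argmax()]  # pseudo-label
                labeled.add(i)
            unlabeled = [i for i in unlabeled if i not in labeled]
    return clf1, clf2, y
```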


Book
15 Aug 2013
TL;DR: Metric Learning: A Review presents an overview of existing research on this topic, including recent progress on scaling to high-dimensional feature spaces and to data sets with an extremely large number of data points.
Abstract: The metric learning problem is concerned with learning a distance function tuned to a particular task, and has been shown to be useful when used in conjunction with nearest-neighbor methods and other techniques that rely on distances or similarities. Metric Learning: A Review presents an overview of existing research on this topic, including recent progress on scaling to high-dimensional feature spaces and to data sets with an extremely large number of data points. It presents as unified a framework as possible under which existing research on metric learning can be cast. The monograph starts out by focusing on linear metric learning approaches, and mainly concentrates on the class of Mahalanobis distance learning methods. It then discusses nonlinear metric learning approaches, focusing on the connections between the non-linear and linear approaches. Finally, it discusses extensions of metric learning, as well as applications to a variety of problems in computer vision, text analysis, program analysis, and multimedia. Metric Learning: A Review is an ideal reference for anyone interested in the metric learning problem. It synthesizes much of the recent work in the area and it is hoped that it will inspire new algorithms and applications.

810 citations
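
Since the monograph centers on Mahalanobis distance learning, one identity is worth making concrete: a PSD matrix M factors as L^T L, so the learned metric is just Euclidean distance after the linear map L. A minimal numpy sketch (the matrix L here is a random stand-in for a learned one):

```python
# Mahalanobis distance under a learned PSD matrix M = L^T L is equivalent
# to Euclidean distance after projecting by L.
import numpy as np

def mahalanobis(x, xp, M):
    d = x - xp
    return np.sqrt(d @ M @ d)

rng = np.random.default_rng(0)
L = rng.normal(size=(2, 5))          # linear map (random stand-in here)
M = L.T @ L                          # PSD by construction
x, xp = rng.normal(size=5), rng.normal(size=5)

print(mahalanobis(x, xp, M))
print(np.linalg.norm(L @ (x - xp)))  # identical: Euclidean in L-space
```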


Journal ArticleDOI
TL;DR: This paper reviews theories developed to understand the properties and behaviors of multi-view learning and gives a taxonomy of approaches according to the machine learning mechanisms involved and the ways in which multiple views are exploited.
Abstract: Multi-view learning or learning with multiple distinct feature sets is a rapidly growing direction in machine learning with solid theoretical underpinnings and great practical success. This paper reviews theories developed to understand the properties and behaviors of multi-view learning and gives a taxonomy of approaches according to the machine learning mechanisms involved and the ways in which multiple views are exploited. This survey aims to provide an insightful organization of current developments in the field of multi-view learning, identify their limitations, and give suggestions for further research. One feature of this survey is that we attempt to point out specific open problems which we hope will help promote research on multi-view machine learning.

782 citations


Posted Content
TL;DR: A systematic review of the metric learning literature is proposed, highlighting the pros and cons of each approach and presenting a wide range of methods that have recently emerged as powerful alternatives, including nonlinear metric learning, similarity learning and local metric learning.
Abstract: The need for appropriate ways to measure the distance or similarity between data is ubiquitous in machine learning, pattern recognition and data mining, but handcrafting such good metrics for specific problems is generally difficult. This has led to the emergence of metric learning, which aims at automatically learning a metric from data and has attracted a lot of interest in machine learning and related fields for the past ten years. This survey paper proposes a systematic review of the metric learning literature, highlighting the pros and cons of each approach. We pay particular attention to Mahalanobis distance metric learning, a well-studied and successful framework, but additionally present a wide range of methods that have recently emerged as powerful alternatives, including nonlinear metric learning, similarity learning and local metric learning. Recent trends and extensions, such as semi-supervised metric learning, metric learning for histogram data and the derivation of generalization guarantees, are also covered. Finally, this survey addresses metric learning for structured data, in particular edit distance learning, and attempts to give an overview of the remaining challenges in metric learning for the years to come.

671 citations


Journal ArticleDOI
TL;DR: A weighted ELM is proposed that can handle data with imbalanced class distributions while maintaining the good performance of unweighted ELM on well-balanced data, and that generalizes to cost-sensitive learning.

627 citations
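
As a rough sketch of the idea in this entry, under assumptions (per-example weights inversely proportional to class frequency, which is one common weighting scheme; tanh random hidden units; not the paper's exact formulation): the ELM output weights come from a weighted regularized least-squares solve.

```python
# Rough weighted-ELM sketch: random hidden layer, then weighted regularized
# least squares for output weights; minority classes get larger weights.
import numpy as np

def weighted_elm_fit(X, y, n_hidden=50, C=1.0, seed=0):
    rng = np.random.default_rng(seed)
    W_in = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W_in + b)                      # random hidden features
    classes, counts = np.unique(y, return_counts=True)
    w = 1.0 / counts[np.searchsorted(classes, y)]  # upweight minority classes
    T = (y[:, None] == classes[None, :]).astype(float) * 2 - 1  # +/-1 targets
    HW = H * w[:, None]                            # row-weighted H, i.e. W H
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ HW, HW.T @ T)

    def predict(Xnew):
        return classes[np.argmax(np.tanh(Xnew @ W_in + b) @ beta, axis=1)]
    return predict
```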


Journal ArticleDOI
TL;DR: This paper investigates different types of class imbalance learning methods, including resampling techniques, threshold moving, and ensemble algorithms, and concludes that AdaBoost.NC shows the best overall performance in terms of the measures including balance, G-mean, and Area Under the Curve (AUC).
Abstract: To facilitate software testing, and save testing costs, a wide range of machine learning methods have been studied to predict defects in software modules. Unfortunately, the imbalanced nature of this type of data increases the learning difficulty of such a task. Class imbalance learning specializes in tackling classification problems with imbalanced distributions, which could be helpful for defect prediction, but has not been investigated in depth so far. In this paper, we study whether and how class imbalance learning methods can benefit software defect prediction with the aim of finding better solutions. We investigate different types of class imbalance learning methods, including resampling techniques, threshold moving, and ensemble algorithms. Among the methods we studied, AdaBoost.NC shows the best overall performance in terms of measures including balance, G-mean, and Area Under the Curve (AUC). To further improve the performance of the algorithm, and facilitate its use in software defect prediction, we propose a dynamic version of AdaBoost.NC, which adjusts its parameters automatically during training. Without the need to pre-define any parameters, it is shown to be more effective and efficient than the original AdaBoost.NC.

457 citations
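
Two of the measures named above are easy to make concrete. A small sketch, assuming binary labels with 1 marking defective modules: G-mean is the geometric mean of the recalls on the two classes, and balance is the normalized distance to the ideal point (pd = 1, pf = 0) in ROC space.

```python
# Sketch of two imbalanced-classification measures from the entry above,
# assuming both classes are present in y_true.
import numpy as np

def gmean_and_balance(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    pd = tp / (tp + fn)              # probability of detection (recall)
    pf = fp / (fp + tn)              # probability of false alarm
    gmean = np.sqrt(pd * (1 - pf))
    balance = 1 - np.sqrt((1 - pd) ** 2 + pf ** 2) / np.sqrt(2)
    return gmean, balance
```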


Journal ArticleDOI
TL;DR: The learning problem in cognitive radios (CRs) is characterized, the importance of artificial intelligence in achieving real cognitive communications systems is stated, and the conditions under which each of the techniques may be applied are identified.
Abstract: In this survey paper, we characterize the learning problem in cognitive radios (CRs) and state the importance of artificial intelligence in achieving real cognitive communications systems. We review various learning problems that have been studied in the context of CRs, classifying them under two main categories: decision-making and feature classification. Decision-making is responsible for determining policies and decision rules for CRs, while feature classification permits identifying and classifying different observation models. The learning algorithms encountered are categorized as either supervised or unsupervised algorithms. We describe in detail several challenging learning issues that arise in cognitive radio networks (CRNs), in particular in non-Markovian environments and decentralized networks, and present possible solution methods to address them. We discuss similarities and differences among the presented algorithms and identify the conditions under which each of the techniques may be applied.

455 citations


Journal ArticleDOI
Li Deng1, Xiao Li1
TL;DR: This overview article surveys modern ML techniques as utilized in current ASR research and systems and as relevant to future ones, and presents and analyzes recent developments in deep learning and learning with sparse representations.
Abstract: Automatic Speech Recognition (ASR) has historically been a driving force behind many machine learning (ML) techniques, including the ubiquitously used hidden Markov model, discriminative learning, structured sequence learning, Bayesian learning, and adaptive learning. Moreover, ML can and occasionally does use ASR as a large-scale, realistic application to rigorously test the effectiveness of a given technique, and to inspire new problems arising from the inherently sequential and dynamic nature of speech. On the other hand, even though ASR is available commercially for some applications, it is largely an unsolved problem - for almost all applications, the performance of ASR is not on par with human performance. New insight from modern ML methodology shows great promise to advance the state of the art in ASR technology. This overview article surveys modern ML techniques as utilized in current ASR research and systems and as relevant to future ones. The intent is to foster more cross-pollination between the ML and ASR communities than has occurred in the past. The article is organized according to the major ML paradigms that are either popular already or have potential for making significant contributions to ASR technology. The paradigms presented and elaborated in this overview include: generative and discriminative learning; supervised, unsupervised, semi-supervised, and active learning; adaptive and multi-task learning; and Bayesian learning. These learning paradigms are motivated and discussed in the context of ASR technology and applications. We finally present and analyze recent developments of deep learning and learning with sparse representations, focusing on their direct relevance to advancing ASR technology.

346 citations


Proceedings Article
15 Mar 2013
TL;DR: It is proposed that it is now appropriate for the AI community to move beyond learning algorithms to more seriously consider the nature of systems that are capable of learning over a lifetime.
Abstract: Lifelong Machine Learning, or LML, considers systems that can learn many tasks from one or more domains over their lifetimes. The goal is to sequentially retain learned knowledge and to selectively transfer that knowledge when learning a new task so as to develop more accurate hypotheses or policies. Following a review of prior work on LML, we propose that it is now appropriate for the AI community to move beyond learning algorithms to more seriously consider the nature of systems that are capable of learning over a lifetime. Reasons for our position are presented and potential counter-arguments are discussed. The remainder of the paper contributes by defining LML, presenting a reference framework that considers all forms of machine learning, and listing several key challenges for and benefits from LML research. We conclude with ideas for next steps to advance the field.

344 citations


Proceedings ArticleDOI
04 Feb 2013
TL;DR: This work proposes a new model to predict a gold-standard ranking that hinges on combining pairwise comparisons via crowdsourcing and formalizes this as an active learning strategy that incorporates an exploration-exploitation tradeoff and implements it using an efficient online Bayesian updating scheme.
Abstract: Inferring rankings over elements of a set of objects, such as documents or images, is a key learning problem for such important applications as Web search and recommender systems. Crowdsourcing services provide an inexpensive and efficient means to acquire preferences over objects via labeling by sets of annotators. We propose a new model to predict a gold-standard ranking that hinges on combining pairwise comparisons via crowdsourcing. In contrast to traditional ranking aggregation methods, the approach learns about and folds into consideration the quality of contributions of each annotator. In addition, we minimize the cost of assessment by introducing a generalization of the traditional active learning scenario to jointly select the annotator and pair to assess while taking into account the annotator quality, the uncertainty over ordering of the pair, and the current model uncertainty. We formalize this as an active learning strategy that incorporates an exploration-exploitation tradeoff and implement it using an efficient online Bayesian updating scheme. Using simulated and real-world data, we demonstrate that the active learning strategy achieves significant reductions in labeling cost while maintaining accuracy.
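
A much-simplified stand-in for the pair-selection idea described above: under Bradley-Terry scores s_i, P(i beats j) = sigmoid(s_i - s_j), and querying the pair whose predicted outcome is closest to 0.5 is a basic form of the uncertainty component. The paper additionally models annotator quality, exploration-exploitation, and Bayesian updating, all of which this sketch omits.

```python
# Pick the pair of items whose Bradley-Terry outcome is most uncertain.
import itertools
import numpy as np

def most_uncertain_pair(scores):
    best, best_gap = None, np.inf
    for i, j in itertools.combinations(range(len(scores)), 2):
        p = 1.0 / (1.0 + np.exp(-(scores[i] - scores[j])))
        gap = abs(p - 0.5)
        if gap < best_gap:
            best, best_gap = (i, j), gap
    return best

scores = np.array([0.1, 0.9, 0.15, 2.0])
print(most_uncertain_pair(scores))   # -> (0, 2): closest current scores
```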

Book
25 Oct 2013
TL;DR: "Machine Learning with R" is a practical tutorial that uses hands-on examples to step through real-world application of machine learning and will provide you with the analytical tools you need to quickly gain insight from complex data.
Abstract: Learn how to use R to apply powerful machine learning methods and gain insight into real-world applications. Overview: harness the power of R for statistical computing and data science; use R to apply common machine learning algorithms to real-world applications; prepare, examine, and visualize data for analysis; understand how to choose between machine learning models; packed with clear instructions to explore, forecast, and classify data. In Detail: Machine learning, at its core, is concerned with transforming data into actionable knowledge. This fact makes machine learning well suited to the present-day era of "big data" and "data science". Given the growing prominence of R, a cross-platform, zero-cost statistical programming environment, there has never been a better time to start applying machine learning. Whether you are new to data science or a veteran, machine learning with R offers a powerful set of methods for quickly and easily gaining insight from your data. "Machine Learning with R" is a practical tutorial that uses hands-on examples to step through real-world applications of machine learning, without shying away from the technical details, and is well suited to machine learning beginners as well as those with experience. How can we use machine learning to transform data into action? Using practical examples, we will explore how to prepare data for analysis, choose a machine learning method, and measure the success of the process. We will learn how to apply machine learning methods to a variety of common tasks including classification, prediction, forecasting, market basket analysis, and clustering. By applying the most effective machine learning methods to real-world problems, you will gain hands-on experience that will transform the way you think about data. "Machine Learning with R" will provide you with the analytical tools you need to quickly gain insight from complex data. What you will learn from this book: understand the basic terminology of machine learning and how to differentiate among various machine learning approaches; use R to prepare data for machine learning; explore and visualize data with R; classify data using nearest-neighbor methods; learn about Bayesian methods for classifying data; predict values using decision trees, rules, and support vector machines; forecast numeric values using linear regression; model data using neural networks; find patterns in data using association rules for market basket analysis; group data into clusters for segmentation; evaluate and improve the performance of machine learning models; and learn specialized machine learning techniques for text mining, social network data, and big data. Approach: written as a tutorial to explore and understand the power of R for machine learning, this practical guide covers all of the need-to-know topics in a systematic way; for each machine learning approach, each step in the process is detailed, from preparing the data for analysis to evaluating the results, building the knowledge you need to apply these methods to your own data science tasks.

Proceedings Article
16 Jun 2013
TL;DR: The proposed Efficient Lifelong Learning Algorithm (ELLA) maintains a sparsely shared basis for all task models, transfers knowledge from the basis to learn each new task, and refines the basis over time to maximize performance across all tasks.
Abstract: The problem of learning multiple consecutive tasks, known as lifelong learning, is of great importance to the creation of intelligent, general-purpose, and flexible machines. In this paper, we develop a method for online multi-task learning in the lifelong learning setting. The proposed Efficient Lifelong Learning Algorithm (ELLA) maintains a sparsely shared basis for all task models, transfers knowledge from the basis to learn each new task, and refines the basis over time to maximize performance across all tasks. We show that ELLA has strong connections to both online dictionary learning for sparse coding and state-of-the-art batch multitask learning methods, and provide robust theoretical performance guarantees. We show empirically that ELLA yields nearly identical performance to batch multi-task learning while learning tasks sequentially in three orders of magnitude (over 1,000x) less time.

Proceedings ArticleDOI
Xin Li1, Yuhong Guo1
23 Jun 2013
TL;DR: This paper presents a novel adaptive active learning approach that combines an information density measure and a most-uncertainty measure to select critical instances to label for image classification.
Abstract: Recently, active learning has attracted a lot of attention in the computer vision field, as it is time-consuming and costly to prepare a good set of labeled images for vision data analysis. Most existing active learning approaches employed in computer vision adopt most-uncertainty measures as instance selection criteria. Although most-uncertainty query selection strategies are very effective in many circumstances, they fail to take into account the information in the large pool of unlabeled instances and are prone to querying outliers. In this paper, we present a novel adaptive active learning approach that combines an information density measure and a most-uncertainty measure to select critical instances to label for image classification. Our experiments on two essential tasks of computer vision, object recognition and scene recognition, demonstrate the efficacy of the proposed approach.
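
The combination described above can be sketched in a few lines in the spirit of information-density weighting: score each unlabeled instance by its predictive entropy times its average similarity to the pool, raised to a tradeoff power. The fixed beta and the use of cosine similarity here are assumptions; the paper adapts the tradeoff.

```python
# Uncertainty x information-density scoring for pool-based active learning.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def adaptive_al_score(proba, X_pool, beta=1.0):
    entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)  # uncertainty
    density = cosine_similarity(X_pool).mean(axis=1)          # avoid outliers
    return entropy * density ** beta

# usage: query = np.argmax(adaptive_al_score(clf.predict_proba(X_pool), X_pool))
```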

Journal ArticleDOI
TL;DR: The experimental results show that, when the learning path suggested by a hybrid decision tree is employed, the learners have a 90% probability of obtaining an above-average creativity score, which suggests that the employed data mining technique can be a good vehicle for providing adaptive learning that is related to creativity.
Abstract: Customizing a learning environment to optimize personal learning has recently become a popular trend in e-learning. Because creativity has become an essential skill in the current e-learning epoch, this study aims to develop a personalized creativity learning system (PCLS) that is based on the data mining technique of decision trees to provide personalized learning paths for optimizing the performance of creativity. The PCLS includes a series of creativity tasks as well as a questionnaire regarding several key variables. Ninety-two college students were included in this study to examine the effectiveness of the PCLS. The experimental results show that, when the learning path suggested by a hybrid decision tree is employed, the learners have a 90% probability of obtaining an above-average creativity score, which suggests that the employed data mining technique can be a good vehicle for providing adaptive learning that is related to creativity. Moreover, the findings in this study shed light on what components should be accounted for when designing a personalized creativity learning system as well as how to integrate personalized learning and game-based learning into a creative learning program to maximize learner motivation and learning effects.

Journal ArticleDOI
20 Aug 2013-PLOS ONE
TL;DR: The use of variation of information to measure segmentation accuracy is advocated, particularly for 3D electron microscopy images of neural tissue, and using this metric an improvement over competing algorithms is demonstrated in EM and natural images.
Abstract: We aim to improve segmentation through the use of machine learning tools during region agglomeration. We propose an active learning approach for performing hierarchical agglomerative segmentation from superpixels. Our method combines multiple features at all scales of the agglomerative process, works for data with an arbitrary number of dimensions, and scales to very large datasets. We advocate the use of variation of information to measure segmentation accuracy, particularly in 3D electron microscopy (EM) images of neural tissue, and using this metric demonstrate an improvement over competing algorithms in EM and natural images.
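
The advocated accuracy measure has a simple form: VI(S1, S2) = H(S1) + H(S2) - 2 I(S1; S2), computed from the joint histogram of segment labels, with lower values indicating closer segmentations. A minimal numpy sketch, assuming non-negative integer labels:

```python
# Variation of information between two segmentations (label arrays).
import numpy as np

def variation_of_information(seg1, seg2):
    s1, s2 = np.ravel(seg1), np.ravel(seg2)   # non-negative int labels
    joint = np.zeros((s1.max() + 1, s2.max() + 1))
    np.add.at(joint, (s1, s2), 1)             # joint label histogram
    p = joint / joint.sum()
    px, py = p.sum(axis=1), p.sum(axis=0)
    nz = p > 0
    h1 = -np.sum(px[px > 0] * np.log(px[px > 0]))
    h2 = -np.sum(py[py > 0] * np.log(py[py > 0]))
    mi = np.sum(p[nz] * np.log(p[nz] / (px[:, None] * py[None, :])[nz]))
    return h1 + h2 - 2 * mi
```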

Proceedings ArticleDOI
21 Oct 2013
TL;DR: This paper proposes a novel framework of online multimodal deep similarity learning (OMDSL), which aims to optimally integrate multiple deep neural networks pretrained with stacked denoising autoencoders to improve similarity search in multimedia information retrieval tasks.
Abstract: Recent years have witnessed extensive studies on distance metric learning (DML) for improving similarity search in multimedia information retrieval tasks. Despite their successes, most existing DML methods suffer from two critical limitations: (i) they typically attempt to learn a linear distance function on the input feature space, in which the assumption of linearity limits their capacity of measuring the similarity on complex patterns in real-world applications; (ii) they are often designed for learning distance metrics on uni-modal data, which may not effectively handle the similarity measures for multimedia objects with multimodal representations. To address these limitations, in this paper, we propose a novel framework of online multimodal deep similarity learning (OMDSL), which aims to optimally integrate multiple deep neural networks pretrained with stacked denoising autoencoders. In particular, the proposed framework explores a unified two-stage online learning scheme that consists of (i) learning a flexible nonlinear transformation function for each individual modality, and (ii) learning to find the optimal combination of multiple diverse modalities simultaneously in a coherent process. We conduct an extensive set of experiments to evaluate the performance of the proposed algorithms for multimodal image retrieval tasks, in which the encouraging results validate the effectiveness of the proposed technique.

Journal ArticleDOI
TL;DR: A custom processor is presented that integrates a CPU with configurable accelerators for discriminative machine-learning functions; an accelerator for embedded active learning enables prospective adaptation of the signal models by utilizing sensed data for patient-specific customization, while minimizing the effort from human experts.
Abstract: Low-power sensing technologies have emerged for acquiring physiologically indicative patient signals. However, to enable devices with high clinical value, a critical requirement is the ability to analyze the signals to extract specific medical information. Yet given the complexities of the underlying processes, signal analysis poses numerous challenges. Data-driven methods based on machine learning offer distinct solutions, but unfortunately the computations are not well supported by traditional DSP. This paper presents a custom processor that integrates a CPU with configurable accelerators for discriminative machine-learning functions. A support-vector-machine accelerator realizes various classification algorithms as well as various kernel functions and kernel formulations, enabling a range of operating points within an accuracy-versus-energy-and-memory trade space. An accelerator for embedded active learning enables prospective adaptation of the signal models by utilizing sensed data for patient-specific customization, while minimizing the effort from human experts. The prototype is implemented in 130-nm CMOS and operates from 1.2 V to 0.55 V (0.7 V for SRAMs). Medical applications for EEG-based seizure detection and ECG-based cardiac-arrhythmia detection are demonstrated using clinical data, while consuming 273 μJ and 124 μJ per detection, respectively; this represents 62.4× and 144.7× energy reductions compared to an implementation based on the CPU. A patient-adaptive cardiac-arrhythmia detector is also demonstrated, reducing the analysis effort required for model customization by 20×.

Journal ArticleDOI
TL;DR: This study provides some background regarding the advantages of distributed environments as well as an overview of distributed learning for dealing with “very large” data sets.
Abstract: Traditionally, a bottleneck preventing the development of more intelligent systems was the limited amount of data available. Nowadays, the total amount of information is almost incalculable and automatic data analyzers are even more needed. However, the limiting factor is the inability of learning algorithms to use all the data to learn within a reasonable time. In order to handle this problem, a new field in machine learning has emerged: large-scale learning. In this context, distributed learning seems to be a promising line of research since allocating the learning process among several workstations is a natural way of scaling up learning algorithms. Moreover, it allows to deal with data sets that are naturally distributed, a frequent situation in many real applications. This study provides some background regarding the advantages of distributed environments as well as an overview of distributed learning for dealing with “very large” data sets.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: A new active learning framework for regression called Expected Model Change Maximization (EMCM) is proposed, which aims to choose the examples that lead to the largest change to the current model.
Abstract: Active learning is well-motivated in many supervised learning tasks where unlabeled data may be abundant but labeled examples are expensive to obtain. The goal of active learning is to maximize the performance of a learning model using as few labeled training data as possible, thereby minimizing the cost of data annotation. So far, there is still very limited work on active learning for regression. In this paper, we propose a new active learning framework for regression called Expected Model Change Maximization (EMCM), which aims to choose the examples that lead to the largest change to the current model. The model change is measured as the difference between the current model parameters and the updated parameters after training with the enlarged training set. Inspired by the Stochastic Gradient Descent (SGD) update rule, the change is estimated as the gradient of the loss with respect to a candidate example for active learning. Under this framework, we derive novel active learning algorithms for both linear regression and nonlinear regression to select the most informative examples. Extensive experimental results on the benchmark data sets from UCI machine learning repository have demonstrated that the proposed algorithms are highly effective in choosing the most informative examples and robust to various types of data distributions.
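
For linear regression, the EMCM score has a simple closed form following the description above: the SGD gradient norm |f(x) - y| · ||x||, with the unknown label y approximated by a bootstrap ensemble. A small sketch (the ensemble size and the use of scikit-learn's LinearRegression are assumptions):

```python
# EMCM-style scoring for linear regression: expected gradient norm at each
# pool candidate, with bootstrap predictions standing in for the true label.
import numpy as np
from sklearn.linear_model import LinearRegression

def emcm_scores(X_lab, y_lab, X_pool, n_boot=4, seed=0):
    rng = np.random.default_rng(seed)
    f = LinearRegression().fit(X_lab, y_lab)
    preds = []
    for _ in range(n_boot):                   # bootstrap ensemble for y
        idx = rng.integers(0, len(X_lab), len(X_lab))
        preds.append(LinearRegression().fit(X_lab[idx], y_lab[idx])
                     .predict(X_pool))
    residual = np.abs(np.stack(preds) - f.predict(X_pool)).mean(axis=0)
    return residual * np.linalg.norm(X_pool, axis=1)  # expected model change

# usage: query = np.argmax(emcm_scores(X_lab, y_lab, X_pool))
```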

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper presents a discriminative method based on a Least-Squares Support Vector Machine formulation that addresses the issue of transferring to the new class and preserving what has already been learned on the source models when learning a new class.
Abstract: Since the seminal work of Thrun [16], the learning to learn paradigm has been defined as the ability of an agent to improve its performance at each task with experience and with the number of tasks. Within the object categorization domain, the visual learning community has actively explored this paradigm in the transfer learning setting. Almost all proposed methods focus on category detection problems, addressing how to learn a new target class from few samples by leveraging the known source classes. But if one thinks of learning over multiple tasks, there is a need for multiclass transfer learning algorithms able to exploit previous source knowledge when learning a new class, while at the same time optimizing their overall performance. This is an open challenge for existing transfer learning algorithms. The contribution of this paper is a discriminative method that addresses this issue, based on a Least-Squares Support Vector Machine formulation. Our approach is designed to balance between transferring to the new class and preserving what has already been learned on the source models. Extensive experiments on subsets of publicly available datasets prove the effectiveness of our approach.

Posted Content
02 Jun 2013
TL;DR: This paper proposes to train all layers of the deep networks by backpropagating gradients through the top-level SVM, learning features of all layers, and demonstrates a small but consistent advantage of replacing the softmax layer with a linear support vector machine.
Abstract: Recently, fully-connected and convolutional neural networks have been trained to reach state-of-the-art performance on a wide variety of tasks such as speech recognition, image classification, natural language processing, and bioinformatics. For classification tasks, many of these "deep learning" models employ the softmax activation function to learn output labels in 1-of-K format. In this paper, we demonstrate a small but consistent advantage of replacing the softmax layer with a linear support vector machine. Learning minimizes a margin-based loss instead of the cross-entropy loss. In almost all previous work, hidden representations of deep networks are first learned using supervised or unsupervised techniques and then fed into SVMs as inputs. In contrast to those models, we propose to train all layers of the deep networks by backpropagating gradients through the top-level SVM, learning features of all layers. Our experiments show that simply replacing softmax with linear SVMs gives significant gains on the MNIST and CIFAR-10 datasets and the ICML 2013 Representation Learning Workshop's face expression recognition challenge.
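
The substitution the paper studies is compact enough to sketch. Below is a hedged numpy version of a one-vs-rest squared hinge (L2-SVM) objective and its gradient with respect to the top-layer scores, which is what would be backpropagated into the network in place of softmax cross-entropy:

```python
# Squared hinge (L2-SVM) loss over one-vs-rest {-1,+1} targets, plus its
# gradient w.r.t. the top-layer scores z.
import numpy as np

def l2svm_loss_and_grad(z, y, n_classes):
    """z: (batch, K) top-layer scores; y: integer labels."""
    t = -np.ones((len(y), n_classes))
    t[np.arange(len(y)), y] = 1.0                 # 1-of-K in {-1, +1}
    margin = np.maximum(0.0, 1.0 - t * z)
    loss = np.sum(margin ** 2) / len(y)
    grad_z = -2.0 * t * margin / len(y)           # backprop into the network
    return loss, grad_z
```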

Journal ArticleDOI
01 Feb 2013
TL;DR: An overview of active learning methods is presented, then the latest techniques proposed to cope with the problem of interactive sampling of training pixels for classification of remotely sensed data with support vector machines (SVMs) are reviewed.
Abstract: Active learning, which has a strong impact on processing data prior to the classification phase, is an active research area within the machine learning community, and is now being extended to remote sensing applications. To be effective, classification must rely on the most informative pixels, while the training set should be as compact as possible. Active learning heuristics provide the capability to select unlabeled data that are the "most informative" and to obtain the respective labels, contributing to both goals. Characteristics of remotely sensed image data provide both challenges and opportunities to exploit the potential advantages of active learning. We present an overview of active learning methods, then review the latest techniques proposed to cope with the problem of interactive sampling of training pixels for classification of remotely sensed data with support vector machines (SVMs). We discuss remote sensing specific approaches dealing with multisource and spatially and time-varying data, and provide examples for high-dimensional hyperspectral imagery.
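
Among the SVM active learning heuristics such a review covers, margin sampling is the simplest to sketch: query the unlabeled pixels closest to the current decision boundary. A minimal scikit-learn version, assuming a binary problem so that decision_function returns one margin per sample:

```python
# Margin sampling for SVM active learning: query the pool points with the
# smallest absolute distance to the decision boundary (binary case assumed).
import numpy as np
from sklearn.svm import SVC

def margin_sampling_query(X_lab, y_lab, X_pool, n_query=10):
    clf = SVC(kernel="rbf", gamma="scale").fit(X_lab, y_lab)
    dist = np.abs(clf.decision_function(X_pool))  # distance to hyperplane
    return np.argsort(dist)[:n_query]             # most ambiguous pixels
```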

Proceedings Article
16 Jun 2013
TL;DR: The results show PAL's effectiveness; in particular, it improves significantly over a state-of-the-art multi-objective optimization method, saving in many cases about 33% of the evaluations needed to achieve the same accuracy.
Abstract: In many fields one encounters the challenge of identifying, out of a pool of possible designs, those that simultaneously optimize multiple objectives. This means that usually there is not one optimal design but an entire set of Pareto-optimal ones with optimal tradeoffs in the objectives. In many applications, evaluating one design is expensive; thus, an exhaustive search for the Pareto-optimal set is unfeasible. To address this challenge, we propose the Pareto Active Learning (PAL) algorithm which intelligently samples the design space to predict the Pareto-optimal set. Key features of PAL include (1) modeling the objectives as samples from a Gaussian process distribution to capture structure and accommodate noisy evaluation; (2) a method to carefully choose the next design to evaluate to maximize progress; and (3) the ability to control prediction accuracy and sampling cost. We provide theoretical bounds on PAL's sampling cost required to achieve a desired accuracy. Further, we show an experimental evaluation on three real-world data sets. The results show PAL's effectiveness; in particular, it improves significantly over a state-of-the-art multi-objective optimization method, saving in many cases about 33% of the evaluations needed to achieve the same accuracy.
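
For intuition about the target of PAL's prediction, here is a small helper that extracts the non-dominated (Pareto-optimal) designs from point estimates of the objectives, assuming every objective is maximized; PAL itself reasons over Gaussian process posteriors and confidence regions rather than point values:

```python
# Non-dominated filter: a design survives if no other design is at least as
# good in every objective and strictly better in one (maximization assumed).
import numpy as np

def pareto_mask(F):
    """F: (n_designs, n_objectives); True where the design is non-dominated."""
    mask = np.ones(len(F), dtype=bool)
    for i in range(len(F)):
        dominates_i = np.all(F >= F[i], axis=1) & np.any(F > F[i], axis=1)
        if dominates_i.any():
            mask[i] = False
    return mask
```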

Journal ArticleDOI
TL;DR: Results on 20 multiclass imbalanced data sets show that DyS can outperform the compared methods, including pre-sample methods, active learning methods, cost-sensitive methods, and boosting-type methods.
Abstract: Class imbalance learning tackles supervised learning problems where some classes have significantly more examples than others. Most of the existing research focused only on binary-class cases. In this paper, we study multiclass imbalance problems and propose a dynamic sampling method (DyS) for multilayer perceptrons (MLP). In DyS, for each epoch of the training process, every example is fed to the current MLP and then the probability of it being selected for training the MLP is estimated. DyS dynamically selects informative data to train the MLP. In order to evaluate DyS and understand its strength and weakness, comprehensive experimental studies have been carried out. Results on 20 multiclass imbalanced data sets show that DyS can outperform the compared methods, including pre-sample methods, active learning methods, cost-sensitive methods, and boosting-type methods.
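
The dynamic selection step lends itself to a toy sketch. The following is a simplification under stated assumptions (selection probability set to one minus the confidence in the true class; the paper's actual rule is more refined), showing how harder, often minority-class, examples get sampled more each epoch:

```python
# Rough sketch of dynamic sampling: each epoch, keep an example for training
# with probability that grows as the network's confidence in its true class
# shrinks. This selection rule is a simplification, not the paper's formula.
import numpy as np

def dynamic_sample(proba, y, rng):
    """proba: (n, K) current MLP outputs; y: integer true labels."""
    conf = proba[np.arange(len(y)), y]        # confidence in the true class
    keep = rng.random(len(y)) < (1.0 - conf)  # hard examples sampled more
    return np.flatnonzero(keep)

# usage: idx = dynamic_sample(mlp_outputs, y_train, np.random.default_rng(0))
```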

Journal ArticleDOI
TL;DR: Experimental results show that transferring the class labels from the source domain to the target domain provides a reliable initial training set and that the priority rule for AL results in a fast convergence to the desired accuracy with respect to Standard AL.
Abstract: This paper proposes a novel change-detection-driven transfer learning (TL) approach to update land-cover maps by classifying remote-sensing images acquired on the same area at different times (i.e., image time series). The proposed approach requires that a reliable training set is available only for one of the images (i.e., the source domain) in the time series whereas it is not for another image to be classified (i.e., the target domain). Unlike other literature TL methods, no additional assumptions on either the similarity between class distributions or the presence of the same set of land-cover classes in the two domains are required. The proposed method aims at defining a reliable training set for the target domain, taking advantage of the already available knowledge on the source domain. This is done by applying an unsupervised-change-detection method to target and source domains and transferring class labels of detected unchanged training samples from the source to the target domain to initialize the target-domain training set. The training set is then optimized by a properly defined novel active learning (AL) procedure. At the early iterations of AL, priority in labeling is given to samples detected as being changed, whereas in the remaining ones, the most informative samples are selected from changed and unchanged unlabeled samples. Finally, the target image is classified. Experimental results show that transferring the class labels from the source domain to the target domain provides a reliable initial training set and that the priority rule for AL results in a fast convergence to the desired accuracy with respect to Standard AL.

Journal ArticleDOI
TL;DR: Experiments on five sentiment classification datasets show that ADN and IADN outperform classical semi-supervised learning algorithms and deep learning techniques applied to sentiment classification.

Journal ArticleDOI
TL;DR: The ActiveGenLink algorithm is presented, which combines genetic programming and active learning to interactively generate expressive linkage rules; it automates the generation of linkage rules and only requires the user to confirm or decline a number of link candidates.

Proceedings Article
16 Jun 2013
TL;DR: The authors use machine learning to learn weights that relate textual features describing the provided input-output examples to plausible sub-components of a program, improving search and ranking on a variety of text processing tasks found on help forums.
Abstract: Learning programs is a timely and interesting challenge. In Programming by Example (PBE), a system attempts to infer a program from input and output examples alone, by searching for a composition of some set of base functions. We show how machine learning can be used to speed up this seemingly hopeless search problem, by learning weights that relate textual features describing the provided input-output examples to plausible sub-components of a program. This generic learning framework lets us address problems beyond the scope of earlier PBE systems. Experiments on a prototype implementation show that learning improves search and ranking on a variety of text processing tasks found on help forums.

Proceedings Article
Yuxin Chen1, Andreas Krause1
16 Jun 2013
TL;DR: It is proved that for batch mode active learning and more general information-parallel stochastic optimization problems that exhibit adaptive submodularity, a natural diminishing returns condition, a simple greedy strategy is competitive with the optimal batch-mode policy.
Abstract: Active learning can lead to a dramatic reduction in labeling effort. However, in many practical implementations (such as crowdsourcing, surveys, high-throughput experimental design), it is preferable to query labels for batches of examples to be labeled in parallel. While several heuristics have been proposed for batch-mode active learning, little is known about their theoretical performance. We consider batch mode active learning and more general information-parallel stochastic optimization problems that exhibit adaptive submodularity, a natural diminishing returns condition. We prove that for such problems, a simple greedy strategy is competitive with the optimal batch-mode policy. In some cases, surprisingly, the use of batches incurs competitively low cost, even when compared to a fully sequential strategy. We demonstrate the effectiveness of our approach on batch-mode active learning tasks, where it outperforms the state of the art, as well as the novel problem of multi-stage influence maximization in social networks.
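
The greedy strategy analyzed above can be sketched generically: repeatedly add the candidate with the largest marginal gain of a utility F. The set-coverage utility below is a stand-in chosen because it is submodular; the paper's guarantees concern adaptive submodular objectives.

```python
# Generic greedy batch construction with a submodular stand-in utility.
def greedy_batch(candidates, F, batch_size):
    batch = []
    for _ in range(batch_size):
        rest = [c for c in candidates if c not in batch]
        if not rest:
            break
        # Add the candidate with the largest marginal gain F(B + c) - F(B).
        batch.append(max(rest, key=lambda c: F(batch + [c]) - F(batch)))
    return batch

sets = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
F = lambda batch: len(set().union(*(sets[c] for c in batch)))  # coverage
print(greedy_batch(list(sets), F, 2))   # -> ['a', 'b']
```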