
Showing papers on "Active learning (machine learning)" published in 2012


Book
25 Mar 2012
TL;DR: A modern overview of online learning is provided to give the reader a sense of some of the interesting ideas and in particular to underscore the centrality of convexity in deriving efficient online learning algorithms.
Abstract: Online learning is a well-established learning paradigm which has both theoretical and practical appeal. The goal of online learning is to make a sequence of accurate predictions given knowledge of the correct answer to previous prediction tasks and possibly additional available information. Online learning has been studied in several research fields including game theory, information theory, and machine learning. It has also become of great interest to practitioners due to the recent emergence of large-scale applications such as online advertisement placement and online web ranking. In this survey we provide a modern overview of online learning. Our goal is to give the reader a sense of some of the interesting ideas and in particular to underscore the centrality of convexity in deriving efficient online learning algorithms. We do not mean to be comprehensive but rather to give a high-level, rigorous yet easy to follow, survey.
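To make the convexity point concrete, here is a minimal online gradient descent sketch (a generic illustration, not code from the survey; the L2-ball projection and 1/sqrt(t) step size are standard assumptions):

    import numpy as np

    def online_gradient_descent(T, dim, grad_fn, radius=1.0):
        """OGD over the L2 ball; achieves O(sqrt(T)) regret for convex losses."""
        w = np.zeros(dim)
        for t in range(1, T + 1):
            eta = radius / np.sqrt(t)        # standard decaying step size
            w = w - eta * grad_fn(w, t)      # gradient of the round-t convex loss
            norm = np.linalg.norm(w)
            if norm > radius:                # project back onto the feasible set
                w *= radius / norm
        return w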

1,935 citations


Journal ArticleDOI
Xinwei Deng
TL;DR: Experimental design is reviewed here for broad classes of data collection and analysis problems, including: fractioning techniques based on orthogonal arrays, Latin hypercube designs and their variants for computer experimentation, efficient design for data mining and machine learning applications, and sequential design for active learning.
Abstract: Maximizing data information requires careful selection, termed design, of the points at which data are observed. Experimental design is reviewed here for broad classes of data collection and analysis problems, including: fractioning techniques based on orthogonal arrays, Latin hypercube designs and their variants for computer experimentation, efficient design for data mining and machine learning applications, and sequential design for active learning. © 2012 Wiley Periodicals, Inc.
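As a concrete illustration of one design class mentioned above, a basic Latin hypercube design can be generated as follows (a generic sketch, not the review's own code; the sample count and dimension are arbitrary):

    import numpy as np

    def latin_hypercube(n_samples, n_dims, seed=0):
        """One design point per row; each dimension is stratified into n_samples bins."""
        rng = np.random.default_rng(seed)
        design = np.empty((n_samples, n_dims))
        for j in range(n_dims):
            strata = rng.permutation(n_samples)               # bin order per dimension
            design[:, j] = (strata + rng.random(n_samples)) / n_samples
        return design                                          # points in [0, 1)^n_dims

    X = latin_hypercube(10, 3)   # 10 runs of a 3-factor computer experiment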

1,025 citations


Book
01 Sep 2012
TL;DR: Peter Flach's clear, example-based approach begins by discussing how a spam filter works, which gives an immediate introduction to machine learning in action, with a minimum of technical fuss.
Abstract: As one of the most comprehensive machine learning texts around, this book does justice to the field's incredible richness, but without losing sight of the unifying principles. Peter Flach's clear, example-based approach begins by discussing how a spam filter works, which gives an immediate introduction to machine learning in action, with a minimum of technical fuss. Flach provides case studies of increasing complexity and variety with well-chosen examples and illustrations throughout. He covers a wide range of logical, geometric and statistical models and state-of-the-art topics such as matrix factorisation and ROC analysis. Particular attention is paid to the central role played by features. The use of established terminology is balanced with the introduction of new and useful concepts, and summaries of relevant background material are provided with pointers for revision if necessary. These features ensure Machine Learning will set a new standard as an introductory textbook.

772 citations


Proceedings ArticleDOI
16 Apr 2012
TL;DR: This work presents an efficient approach to relational learning on LOD data, based on the factorization of a sparse tensor that scales to data consisting of millions of entities, hundreds of relations and billions of known facts, and shows how ontological knowledge can be incorporated in the factorization to improve learning results and how computation can be distributed across multiple nodes.

Abstract: Vast amounts of structured information have been published in the Semantic Web's Linked Open Data (LOD) cloud and their size is still growing rapidly. Yet, access to this information via reasoning and querying is sometimes difficult, due to LOD's size, partial data inconsistencies and inherent noisiness. Machine Learning offers an alternative approach to exploiting LOD's data with the advantages that Machine Learning algorithms are typically robust to both noise and data inconsistencies and are able to efficiently utilize non-deterministic dependencies in the data. From a Machine Learning point of view, LOD is challenging due to its relational nature and its scale. Here, we present an efficient approach to relational learning on LOD data, based on the factorization of a sparse tensor that scales to data consisting of millions of entities, hundreds of relations and billions of known facts. Furthermore, we show how ontological knowledge can be incorporated in the factorization to improve learning results and how computation can be distributed across multiple nodes. We demonstrate that our approach is able to factorize the YAGO 2 core ontology and globally predict statements for this large knowledge base using a single dual-core desktop computer. Furthermore, we show experimentally that our approach achieves good results in several relational learning tasks that are relevant to Linked Data. Once a factorization has been computed, our model is able to predict efficiently, and without any additional training, the likelihood of any of the 4.3 × 10^14 possible triples in the YAGO 2 core ontology.
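Assuming the bilinear factorization form this family of models uses, each relation slice X_k of the adjacency tensor is approximated as A R_k Aᵀ, so any triple can be scored cheaply once the factors are learned. A minimal scoring sketch (factor shapes chosen purely for illustration; the paper's exact variant may differ):

    import numpy as np

    def score_triple(A, R, i, k, j):
        """Plausibility of triple (entity_i, relation_k, entity_j) under X_k ~ A R_k A^T."""
        return A[i] @ R[k] @ A[j]

    rng = np.random.default_rng(0)
    A = rng.standard_normal((10_000, 50))   # one latent vector per entity (millions in the paper's setting)
    R = rng.standard_normal((100, 50, 50))  # one core matrix per relation
    print(score_triple(A, R, i=3, k=7, j=42))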

430 citations


Proceedings ArticleDOI
28 Jan 2012
TL;DR: The ongoing development of an end-to-end interactive machine learning system at the Tufts Evidence-based Practice Center is described and abstrackr, an online tool for the task of citation screening for systematic reviews is developed, which provides an interface to the machine learning methods.
Abstract: Medical researchers looking for evidence pertinent to a specific clinical question must navigate an increasingly voluminous corpus of published literature. This data deluge has motivated the development of machine learning and data mining technologies to facilitate efficient biomedical research. Despite the obvious labor-saving potential of these technologies and the concomitant academic interest therein, however, adoption of machine learning techniques by medical researchers has been relatively sluggish. One explanation for this is that while many machine learning methods have been proposed and retrospectively evaluated, they are rarely (if ever) actually made accessible to the practitioners whom they would benefit. In this work, we describe the ongoing development of an end-to-end interactive machine learning system at the Tufts Evidence-based Practice Center. More specifically, we have developed abstrackr, an online tool for the task of citation screening for systematic reviews. This tool provides an interface to our machine learning methods. The main aim of this work is to provide a case study in deploying cutting-edge machine learning methods that will actually be used by experts in a clinical research setting.

398 citations


Book
30 Mar 2012
TL;DR: This book focuses on a specific non-stationary environment known as covariate shift, in which the distribution of inputs changes over time but the conditional distribution of outputs given inputs is unchanged, and presents machine learning theory, algorithms, and applications to overcome this variety of non-stationarity.

Abstract: As the power of computing has grown over the past few decades, the field of machine learning has advanced rapidly in both theory and practice. Machine learning methods are usually based on the assumption that the data generation mechanism does not change over time. Yet real-world applications of machine learning, including image recognition, natural language processing, speech recognition, robot control, and bioinformatics, often violate this common assumption. Dealing with non-stationarity is one of modern machine learning's greatest challenges. This book focuses on a specific non-stationary environment known as covariate shift, in which the distributions of inputs (queries) change but the conditional distribution of outputs (answers) is unchanged, and presents machine learning theory, algorithms, and applications to overcome this variety of non-stationarity. After reviewing the state-of-the-art research in the field, the authors discuss topics that include learning under covariate shift, model selection, importance estimation, and active learning. They describe such real-world applications of covariate shift adaptation as brain-computer interfaces, speaker identification, and age prediction from facial images. With this book, they aim to encourage future research in machine learning, statistics, and engineering that strives to create truly autonomous learning machines able to learn under non-stationarity.
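The standard corrective tool in this setting is importance weighting: training losses are reweighted by the density ratio p_test(x)/p_train(x). A minimal weighted least-squares sketch (the ratio estimate is taken as given; methods such as KLIEP or uLSIF from this literature would supply it):

    import numpy as np

    def importance_weighted_least_squares(X, y, w):
        """Solve min_b sum_i w_i * (x_i @ b - y_i)^2, with w_i ~ p_test(x_i)/p_train(x_i)."""
        Xw = X * w[:, None]                        # scale each row by its importance weight
        return np.linalg.solve(Xw.T @ X, Xw.T @ y)  # weighted normal equations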

384 citations


Book ChapterDOI
15 Jul 2012
TL;DR: This chapter presents an overview of machine learning techniques in time series forecasting by focusing on three aspects: the formalization of one-step forecasting problems as supervised learning tasks, the discussion of local learning techniques as an effective tool for dealing with temporal data and the role of the forecasting strategy when the authors move from one-step to multiple-step forecasting.

Abstract: The increasing availability of large amounts of historical data and the need to perform accurate forecasting of future behavior in several scientific and applied domains demand the definition of robust and efficient techniques able to infer from observations the stochastic dependency between past and future. The forecasting domain has been influenced, from the 1960s on, by linear statistical methods such as ARIMA models. More recently, machine learning models have drawn attention and have established themselves as serious contenders to classical statistical models in the forecasting community. This chapter presents an overview of machine learning techniques in time series forecasting by focusing on three aspects: the formalization of one-step forecasting problems as supervised learning tasks, the discussion of local learning techniques as an effective tool for dealing with temporal data and the role of the forecasting strategy when we move from one-step to multiple-step forecasting.
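The first aspect, recasting one-step forecasting as supervised learning, amounts to a sliding-window embedding of the series; a minimal sketch (the window length is an arbitrary choice):

    import numpy as np

    def embed(series, n_lags):
        """Turn a univariate series into (X, y) pairs for one-step-ahead learning."""
        X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
        y = np.asarray(series[n_lags:])
        return X, y   # any regression method can now be trained on (X, y)

Multiple-step forecasting then follows either recursively (feeding predictions back in as inputs) or directly (one model per horizon), the strategies the chapter contrasts.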

367 citations


Posted Content
TL;DR: This paper proposes a regularization formulation for learning the relationships between tasks in multi-task learning, called MTRL, which can also describe negative task correlation and identify outlier tasks based on the same underlying principle.
Abstract: Multi-task learning is a learning paradigm which seeks to improve the generalization performance of a learning task with the help of some other related tasks. In this paper, we propose a regularization formulation for learning the relationships between tasks in multi-task learning. This formulation can be viewed as a novel generalization of the regularization framework for single-task learning. Besides modeling positive task correlation, our method, called multi-task relationship learning (MTRL), can also describe negative task correlation and identify outlier tasks based on the same underlying principle. Under this regularization framework, the objective function of MTRL is convex. For efficiency, we use an alternating method to learn the optimal model parameters for each task as well as the relationships between tasks. We study MTRL in the symmetric multi-task learning setting and then generalize it to the asymmetric setting as well. We also study the relationships between MTRL and some existing multi-task learning methods. Experiments conducted on a toy problem as well as several benchmark data sets demonstrate the effectiveness of MTRL.
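For reference, the MTRL objective couples the task weight matrix W (one column w_i per task) with a task covariance matrix Ω; reconstructed from the paper's description, it takes roughly the following form (notation approximate, so treat details as an assumption):

    \min_{W, b, \Omega} \; \sum_{i=1}^{m} \frac{1}{n_i} \sum_{j=1}^{n_i}
        \ell\left(y_j^i, \, w_i^\top x_j^i + b_i\right)
        + \frac{\lambda_1}{2} \operatorname{tr}\left(W W^\top\right)
        + \frac{\lambda_2}{2} \operatorname{tr}\left(W \Omega^{-1} W^\top\right)
    \quad \text{s.t. } \Omega \succeq 0, \; \operatorname{tr}(\Omega) = 1

The tr(W Ω⁻¹ Wᵀ) term is what lets Ω encode positive correlation, negative correlation, and outlier tasks under one principle, and the convexity of the objective is what makes the alternating optimization well behaved.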

363 citations


Journal ArticleDOI
TL;DR: The proposed method incorporates the voting method into the popular extreme learning machine (ELM) in classification applications and generally outperforms the original ELM algorithm as well as several recent classification algorithms.
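A hedged sketch of the idea behind a voting ELM ensemble: train several independently initialized ELMs (random hidden layer, output weights via the pseudoinverse) and combine their class predictions by majority vote. The tanh activation, layer sizes, and one-hot label encoding are illustrative assumptions, not details from the paper:

    import numpy as np

    def train_elm(X, Y_onehot, n_hidden, rng):
        """Basic ELM: random hidden layer, output weights via Moore-Penrose pseudoinverse."""
        W = rng.standard_normal((X.shape[1], n_hidden))
        b = rng.standard_normal(n_hidden)
        H = np.tanh(X @ W + b)
        return W, b, np.linalg.pinv(H) @ Y_onehot

    def vote(models, X):
        """Majority vote over the class predictions of the individual ELMs."""
        preds = np.stack([np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
                          for W, b, beta in models])
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, preds)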

329 citations


Proceedings Article
01 Jan 2012
TL;DR: This work develops an algorithm which adaptively selects survey nodes by estimating which form of smoothness is most appropriate, and evaluates its algorithm on several network datasets and demonstrates its improvements over standard active learning methods.
Abstract: In network classification problems such as those found in intelligence gathering, public health, and viral marketing, one is often only interested in inferring the labels of a subset of the nodes. We refer to this subset as the query set, and define the problem as query-driven collective classification. We study this problem in a practical active learning framework, in which the learning algorithm can survey non-query nodes to obtain their labels and network structure. We derive a surveying strategy aimed toward optimal inference on the query set. Considering both feature and structural smoothness, concepts that we formally define, we develop an algorithm which adaptively selects survey nodes by estimating which form of smoothness is most appropriate. We evaluate our algorithm on several network datasets and demonstrate its improvements over standard active learning methods.

296 citations


Book ChapterDOI
01 Jan 2012
TL;DR: This text introduces the intuitions and concepts behind Markov decision processes and two classes of algorithms for computing optimal behaviors: reinforcement learning and dynamic programming, and surveys efficient extensions of the foundational algorithms.
Abstract: Situated in between supervised learning and unsupervised learning, the paradigm of reinforcement learning deals with learning in sequential decision making problems in which there is limited feedback. This text introduces the intuitions and concepts behind Markov decision processes and two classes of algorithms for computing optimal behaviors: reinforcement learning and dynamic programming. First, the formal framework of Markov decision processes is defined, accompanied by the definition of value functions and policies. The main part of this text deals with introducing foundational classes of algorithms for learning optimal behaviors, based on various definitions of optimality with respect to the goal of learning sequential decisions. Additionally, it surveys efficient extensions of the foundational algorithms, differing mainly in the way feedback given by the environment is used to speed up learning, and in the way they concentrate on relevant parts of the problem. For both model-based and model-free settings, these efficient extensions have proven useful in scaling up to larger problems.
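As a pointer to what this foundational family of model-free algorithms looks like in code, here is a one-step tabular Q-learning update (a generic sketch, not a snippet from the chapter):

    import numpy as np

    def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
        """Move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a')."""
        target = r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])
        return Q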

Proceedings Article
01 Jan 2012
TL;DR: This paper is a brief introduction to the special session on interpretable models in machine learning, organized as part of the 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, with an overview of the context of wider research on interpretability of machine learning models.

Abstract: Data of different levels of complexity and of ever growing diversity of characteristics are the raw materials that machine learning practitioners try to model using their wide palette of methods and tools. The obtained models are meant to be a synthetic representation of the available, observed data that captures some of their intrinsic regularities or patterns. Therefore, the use of machine learning techniques for data analysis can be understood as a problem of pattern recognition or, more informally, of knowledge discovery and data mining. There exists a gap, though, between data modeling and knowledge extraction. Models, depending on the machine learning techniques employed, can be described in diverse ways but, in order to consider that some knowledge has been achieved from their description, we must take into account the human cognitive factor that any knowledge extraction process entails. These models as such can be rendered powerless unless they can be interpreted, and the process of human interpretation follows rules that go well beyond technical prowess. For this reason, interpretability is a paramount quality that machine learning methods should aim to achieve if they are to be applied in practice. This paper is a brief introduction to the special session on interpretable models in machine learning, organized as part of the 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. It includes a discussion on the several works accepted for the session, with an overview of the context of wider research on interpretability of machine learning models.

Journal ArticleDOI
01 Nov 2012
TL;DR: A semi-supervised learning method where the user actively assists in the co-analysis by iteratively providing inputs that progressively constrain the system; it introduces a novel constrained clustering method that embeds elements to better respect their inter-distances in feature space together with the user-given set of constraints.
Abstract: Unsupervised co-analysis of a set of shapes is a difficult problem since the geometry of the shapes alone cannot always fully describe the semantics of the shape parts. In this paper, we propose a semi-supervised learning method where the user actively assists in the co-analysis by iteratively providing inputs that progressively constrain the system. We introduce a novel constrained clustering method based on a spring system which embeds elements to better respect their inter-distances in feature space together with the user-given set of constraints. We also present an active learning method that suggests to the user where his input is likely to be the most effective in refining the results. We show that each single pair of constraints affects many relations across the set. Thus, the method requires only a sparse set of constraints to quickly converge toward a consistent and error-free semantic labeling of the set.

Proceedings ArticleDOI
10 Jun 2012
TL;DR: A learning architecture that is able to do reinforcement learning based on raw visual input data; the resulting policy, learned only by success or failure, is hardly beaten by an experienced human player.

Abstract: We propose a learning architecture that is able to do reinforcement learning based on raw visual input data. In contrast to previous approaches, not only the control policy is learned. In order to be successful, the system must also autonomously learn how to extract relevant information out of a high-dimensional stream of input information, for which the semantics are not provided to the learning system. We give a first proof-of-concept of this novel learning architecture on a challenging benchmark, namely visual control of a racing slot car. The resulting policy, learned only by success or failure, is hardly beaten by an experienced human player.

Journal ArticleDOI
TL;DR: Simulations have shown that SaE-ELM not only performs better than E-ELM with several manually chosen generation strategies and control parameters but also obtains better generalization performance than several related methods.

Abstract: In this paper, we propose an improved learning algorithm named self-adaptive evolutionary extreme learning machine (SaE-ELM) for single hidden layer feedforward networks (SLFNs). In SaE-ELM, the network hidden node parameters are optimized by the self-adaptive differential evolution algorithm, whose trial vector generation strategies and their associated control parameters are self-adapted in a strategy pool by learning from their previous experiences in generating promising solutions, and the network output weights are calculated using the Moore–Penrose generalized inverse. SaE-ELM generally outperforms the evolutionary extreme learning machine (E-ELM) and the differential evolution Levenberg–Marquardt method, as it can self-adaptively determine the suitable control parameters and generation strategies involved in differential evolution (DE). Simulations have shown that SaE-ELM not only performs better than E-ELM with several manually chosen generation strategies and control parameters but also obtains better generalization performance than several related methods.

Journal ArticleDOI
TL;DR: This paper shows how to perform parameter learning in an unsupervised fashion, that is, when no correct correspondences between graphs are given during training, and reveals that unsupervised learning compares favorably to the supervised case, both in terms of efficiency and quality.

Abstract: Graph matching is an essential problem in computer vision that has been successfully applied to 2D and 3D feature matching and object recognition. Despite its importance, little has been published on learning the parameters that control graph matching, even though learning has been shown to be vital for improving the matching rate. In this paper we show how to perform parameter learning in an unsupervised fashion, that is, when no correct correspondences between graphs are given during training. Our experiments reveal that unsupervised learning compares favorably to the supervised case, both in terms of efficiency and quality, while avoiding the tedious manual labeling of ground truth correspondences. We verify experimentally that our learning method can improve the performance of several state-of-the-art graph matching algorithms. We also show that a similar method can be successfully applied to parameter learning for graphical models and demonstrate its effectiveness empirically.

Proceedings ArticleDOI
Jimmy Lin, Alek Kolcz
20 May 2012
TL;DR: A case study of Twitter's integration of machine learning tools into its existing Hadoop-based, Pig-centric analytics platform to provide predictive analytics capabilities that incorporate machine learning, focused specifically on supervised classification.
Abstract: The success of data-driven solutions to difficult problems, along with the dropping costs of storing and processing massive amounts of data, has led to growing interest in large-scale machine learning. This paper presents a case study of Twitter's integration of machine learning tools into its existing Hadoop-based, Pig-centric analytics platform. We begin with an overview of this platform, which handles "traditional" data warehousing and business intelligence tasks for the organization. The core of this work lies in recent Pig extensions to provide predictive analytics capabilities that incorporate machine learning, focused specifically on supervised classification. In particular, we have identified stochastic gradient descent techniques for online learning and ensemble methods as being highly amenable to scaling out to large amounts of data. In our deployed solution, common machine learning tasks such as data sampling, feature generation, training, and testing can be accomplished directly in Pig, via carefully crafted loaders, storage functions, and user-defined functions. This means that machine learning is just another Pig script, which allows seamless integration with existing infrastructure for data management, scheduling, and monitoring in a production environment, as well as access to rich libraries of user-defined functions and the materialized output of other scripts.
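Stripped of the Pig plumbing, the online learning component the paper singles out reduces to per-example stochastic gradient updates; a minimal logistic-loss sketch in Python (the deployed version lives in Pig loaders and UDFs, which are not reproduced here):

    import numpy as np

    def sgd_logistic_step(w, x, y, lr=0.01):
        """One SGD step on log(1 + exp(-y * w.x)); labels y are in {-1, +1}."""
        margin = y * (w @ x)
        grad = -y * x / (1.0 + np.exp(margin))   # gradient of the logistic loss
        return w - lr * grad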

Journal ArticleDOI
TL;DR: In this paper, the authors study the connections between metric learning and kernel learning that arise when studying metric learning as a linear transformation learning problem and propose a general optimization framework for learning metrics via linear transformations, and analyze a special case of their framework-that of minimizing the LogDet divergence subject to linear constraints.
Abstract: Metric and kernel learning arise in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the transductive setting and do not generalize to new data points. In this paper, we study the connections between metric learning and kernel learning that arise when studying metric learning as a linear transformation learning problem. In particular, we propose a general optimization framework for learning metrics via linear transformations, and analyze in detail a special case of our framework-that of minimizing the LogDet divergence subject to linear constraints. We then propose a general regularized framework for learning a kernel matrix, and show it to be equivalent to our metric learning framework. Our theoretical connections between metric and kernel learning have two main consequences: 1) the learned kernel matrix parameterizes a linear transformation kernel function and can be applied inductively to new data points, 2) our result yields a constructive method for kernelizing most existing Mahalanobis metric learning formulations. We demonstrate our learning approach by applying it to large-scale real world problems in computer vision, text mining and semi-supervised kernel dimensionality reduction.
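For convenience, the divergence minimized in the special case above is the standard LogDet divergence between a learned Mahalanobis matrix A and a prior A_0 over d-dimensional data (restated here, not quoted from the paper):

    D_{\ell d}(A, A_0) = \operatorname{tr}\left(A A_0^{-1}\right)
                       - \log \det\left(A A_0^{-1}\right) - d

The learned metric is d_A(x_i, x_j) = (x_i - x_j)ᵀ A (x_i - x_j), and the linear constraints typically cap d_A from above for similar pairs and bound it from below for dissimilar ones.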

Journal ArticleDOI
TL;DR: The experimental results show that FOS-ELM has higher accuracy with less training time, better stability, and better short-term predictability than EOS-ELM.

Journal ArticleDOI
TL;DR: This brief proposes a new learning algorithm, called bidirectional extreme learning machine (B-ELM), in which some hidden nodes are not randomly selected, and tends to reduce network output error to 0 at an extremely early learning stage, which can be tens to hundreds of times faster than other incremental ELM algorithms.
Abstract: It is clear that the learning effectiveness and learning speed of neural networks are in general far slower than required, which has been a major bottleneck for many applications. Recently, a simple and efficient learning method, referred to as extreme learning machine (ELM), was proposed by Huang, which has shown that, compared to some conventional methods, the training time of neural networks can be reduced by a thousand times. However, one of the open problems in ELM research is whether the number of hidden nodes can be further reduced without affecting learning effectiveness. This brief proposes a new learning algorithm, called bidirectional extreme learning machine (B-ELM), in which some hidden nodes are not randomly selected. In theory, this algorithm tends to reduce network output error to 0 at an extremely early learning stage. Furthermore, we find a relationship between the network output error and the network output weights in the proposed B-ELM. Simulation results demonstrate that the proposed method can be tens to hundreds of times faster than other incremental ELM algorithms.

Journal ArticleDOI
TL;DR: Students’ key demographic characteristics and their marks in a small number of written assignments can constitute the training set for a regression method in order to predict the student’s performance.
Abstract: Use of machine learning techniques for educational purposes (or educational data mining) is an emerging field aimed at developing methods of exploring data from computational educational settings and discovering meaningful patterns. The stored data (virtual courses, e-learning log files, demographic and academic data of students, admissions/registration info, and so on) can be useful for machine learning algorithms. In this article, we cite the most current articles that use machine learning techniques for educational purposes and we present a case study for predicting students' marks. Students' key demographic characteristics and their marks in a small number of written assignments can constitute the training set for a regression method in order to predict the student's performance. Finally, a prototype version of a software support tool for tutors has been constructed.
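A minimal version of the case study's regression setup might look as follows (the feature layout and all numbers are fabricated purely for illustration, not taken from the article):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical columns: [age, sex_flag, assignment_1_mark, assignment_2_mark]
    X_train = np.array([[24, 1, 7.5, 8.0],
                        [31, 0, 5.0, 6.5],
                        [28, 1, 9.0, 8.5]])
    y_train = np.array([8.2, 6.0, 9.1])      # final marks (illustrative values)

    model = LinearRegression().fit(X_train, y_train)
    predicted_mark = model.predict(np.array([[26, 0, 6.0, 7.0]]))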

Journal ArticleDOI
TL;DR: A novel active learning algorithm performed in the data manifold adaptive kernel space using the graph Laplacian, which can select the most representative and discriminative data points for labeling.
Abstract: In many information processing tasks, labels are usually expensive and the unlabeled data points are abundant. To reduce the cost of collecting labels, it is crucial to predict which unlabeled examples are the most informative, i.e., improve the classifier the most if they were labeled. Many active learning techniques have been proposed for text categorization, such as SVMActive and Transductive Experimental Design. However, most previous approaches try to discover the discriminant structure of the data space, whereas the geometrical structure is not well respected. In this paper, we propose a novel active learning algorithm which is performed in the data manifold adaptive kernel space. The manifold structure is incorporated into the kernel space by using the graph Laplacian. This way, the manifold adaptive kernel space reflects the underlying geometry of the data. By minimizing the expected error with respect to the optimal classifier, we can select the most representative and discriminative data points for labeling. Experimental results on text categorization have demonstrated the effectiveness of our proposed approach.
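The graph Laplacian that warps the kernel is built in the usual way from a nearest-neighbor graph; a minimal construction sketch (binary kNN weights are an illustrative choice, the paper's weighting scheme may differ):

    import numpy as np

    def graph_laplacian(X, k=5):
        """Unnormalized Laplacian L = D - W from a symmetrized kNN graph."""
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
        W = np.zeros((X.shape[0], X.shape[0]))
        for i in range(X.shape[0]):
            for j in np.argsort(d2[i])[1:k + 1]:              # skip self at index 0
                W[i, j] = W[j, i] = 1.0
        return np.diag(W.sum(axis=1)) - W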

Book
11 Jul 2012
TL;DR: Explanation-based neural network learning (EBNN), as discussed by the authors, is a machine learning algorithm that transfers knowledge across multiple learning tasks: when faced with a new learning task, EBNN exploits domain knowledge accumulated in previous learning tasks to guide generalization in the new one.

Abstract: From the Publisher: Lifelong learning addresses situations in which a learner faces a series of different learning tasks, providing the opportunity for synergy among them. Explanation-based neural network learning (EBNN) is a machine learning algorithm that transfers knowledge across multiple learning tasks. When faced with a new learning task, EBNN exploits domain knowledge accumulated in previous learning tasks to guide generalization in the new one. As a result, EBNN generalizes more accurately from less data than comparable methods. Explanation-Based Neural Network Learning: A Lifelong Learning Approach describes the basic EBNN paradigm and investigates it in the context of supervised learning, reinforcement learning, robotics, and chess.

Journal ArticleDOI
João Gama
TL;DR: The fundamental issues in learning in dynamic environments, such as continuously maintaining learning models that evolve over time, learning and forgetting, concept drift, and change detection, are discussed, along with the challenges that emerge in learning from data streams.

Abstract: Nowadays, there are applications in which the data are modeled best not as persistent tables, but rather as transient data streams. In this article, we discuss the limitations of current machine learning and data mining algorithms. We discuss the fundamental issues in learning in dynamic environments, such as continuously maintaining learning models that evolve over time, learning and forgetting, concept drift and change detection. Data streams produce a huge amount of data that introduce new constraints in the design of learning algorithms: limited computational resources in terms of memory, CPU power, and communication bandwidth. We present some illustrative algorithms, designed to take these constraints into account, for decision-tree learning, hierarchical clustering and frequent pattern mining. We identify the main issues and current challenges that emerge in learning from data streams, which open research lines for further developments.

Journal ArticleDOI
TL;DR: A novel active learning approach based on optimum experimental design criteria in statistics is proposed that simultaneously exploits each sample's local structure as well as sample relevance, density, and diversity information, and makes use of both labeled and unlabeled data.

Abstract: Video indexing, also called video concept detection, has attracted increasing attention from both academia and industry. To reduce human labeling cost, active learning has been introduced to video indexing recently. In this paper, we propose a novel active learning approach based on the optimum experimental design criteria in statistics. Different from existing optimum experimental design, our approach simultaneously exploits each sample's local structure as well as sample relevance, density, and diversity information, and makes use of both labeled and unlabeled data. Specifically, we develop a local learning model to exploit the local structure of each sample. Our assumption is that for each sample, its label can be well estimated based on its neighbors. By globally aligning the local models from all the samples, we obtain a local learning regularizer, based on which a local learning regularized least square model is proposed. Finally, a unified sample selection approach is developed for interactive video indexing, which takes into account the sample relevance, density and diversity information, and sample efficacy in minimizing the parameter variance of the proposed local learning regularized least square model. We compare the performance between our approach and the state-of-the-art approaches on the TREC video retrieval evaluation (TRECVID) benchmark. We report superior performance from the proposed approach.

Journal ArticleDOI
TL;DR: A new interaction modality for training which requires only yes-no type binary feedback instead of a precise category label is proposed, which is especially powerful in the presence of hundreds of categories.
Abstract: Machine learning techniques for computer vision applications like object recognition, scene classification, etc., require a large number of training samples for satisfactory performance. Especially when classification is to be performed over many categories, providing enough training samples for each category is infeasible. This paper describes new ideas in multiclass active learning to deal with the training bottleneck, making it easier to train large multiclass image classification systems. First, we propose a new interaction modality for training which requires only yes-no type binary feedback instead of a precise category label. The modality is especially powerful in the presence of hundreds of categories. For the proposed modality, we develop a Value-of-Information (VOI) algorithm that chooses informative queries while also considering user annotation cost. Second, we propose an active selection measure that works with many categories and is extremely fast to compute. This measure is employed to perform a fast seed search before computing VOI, resulting in an algorithm that scales linearly with dataset size. Third, we use locality sensitive hashing to provide a very fast approximation to active learning, which gives sublinear time scaling, allowing application to very large datasets. The approximation provides up to two orders of magnitude speedups with little loss in accuracy. Thorough empirical evaluation of classification accuracy, noise sensitivity, imbalanced data, and computational performance on a diverse set of image datasets demonstrates the strengths of the proposed algorithms.

Book ChapterDOI
07 Oct 2012
TL;DR: This paper contributes an unstructured social activity attribute (USAA) dataset, introduces the concept of semi-latent attribute space, and proposes a novel model for learning the latent attributes which alleviates the dependence of existing models on exact and exhaustive manual specification of the attribute-space.

Abstract: The rapid development of social video sharing platforms has created a huge demand for automatic video classification and annotation techniques, in particular for videos containing social activities of a group of people (e.g. YouTube video of a wedding reception). Recently, attribute learning has emerged as a promising paradigm for transferring learning to sparsely labelled classes in object or single-object short action classification. In contrast to existing work, this paper, for the first time, tackles the problem of attribute learning for understanding group social activities with sparse labels. This problem is more challenging because of the complex multi-object nature of social activities, and the unstructured nature of the activity context. To solve this problem, we (1) contribute an unstructured social activity attribute (USAA) dataset with both visual and audio attributes, (2) introduce the concept of semi-latent attribute space and (3) propose a novel model for learning the latent attributes which alleviates the dependence of existing models on exact and exhaustive manual specification of the attribute-space. We show that our framework is able to exploit latent attributes to outperform contemporary approaches for addressing a variety of realistic multi-media sparse data learning tasks including: multi-task learning, N-shot transfer learning, learning with label noise and importantly zero-shot learning.

Proceedings ArticleDOI
16 Jun 2012
TL;DR: A novel active learning method specifically suited for semantic segmentation, which classifies each pixel in an image according to the semantic class it belongs to; the problem is modeled as a pairwise CRF and active learning is cast as finding its most informative nodes.

Abstract: We address the problem of semantic segmentation: classifying each pixel in an image according to the semantic class it belongs to (e.g. dog, road, car). Most existing methods train from fully supervised images, where each pixel is annotated by a class label. To reduce the annotation effort, recently a few weakly supervised approaches emerged. These require only image labels indicating which classes are present. Although their performance reaches a satisfactory level, there is still a substantial gap between the accuracy of fully and weakly supervised methods. We address this gap with a novel active learning method specifically suited for this setting. We model the problem as a pairwise CRF and cast active learning as finding its most informative nodes. These nodes induce the largest expected change in the overall CRF state, after revealing their true label. Our criterion is equivalent to maximizing an upper-bound on accuracy gain. Experiments on two datasets show that our method achieves 97% of the accuracy of the corresponding fully supervised model, while querying less than 17% of the (super-)pixel labels.

Patent
04 Jun 2012
TL;DR: In this article, a generalized learning framework is proposed to enable adaptive signal processing system to flexibly combine different learning rules with different methods (online or batch learning), where learning tasks are separated from control tasks, so that changes in one module do not necessitate changes within the other.
Abstract: Generalized learning rules may be implemented. A framework may be used to enable an adaptive signal processing system to flexibly combine different learning rules (supervised, unsupervised, reinforcement learning) with different methods (online or batch learning). The generalized learning framework may employ an average performance function as the learning measure, thereby enabling a modular architecture where learning tasks are separated from control tasks, so that changes in one of the modules do not necessitate changes within the other. Separating learning tasks from the control task implementations may allow dynamic reconfiguration of the learning block in response to a task change or learning method change in real time. The generalized learning apparatus may be capable of implementing several learning rules concurrently based on the desired control application and without requiring users to explicitly identify the required learning rule composition for that application.

Proceedings ArticleDOI
12 Aug 2012
TL;DR: It is demonstrated that it is possible to develop a much more resilient SVM learning model while making loose assumptions on the data corruption models and that optimal solutions may be overly pessimistic when the actual attacks are much weaker than expected.
Abstract: Many learning tasks such as spam filtering and credit card fraud detection face an active adversary that tries to avoid detection. For learning problems that deal with an active adversary, it is important to model the adversary's attack strategy and develop robust learning models to mitigate the attack. These are the two objectives of this paper. We consider two attack models: a free-range attack model that permits arbitrary data corruption and a restrained attack model that anticipates more realistic attacks that a reasonable adversary would devise under penalties. We then develop optimal SVM learning strategies against the two attack models. The learning algorithms minimize the hinge loss while assuming the adversary is modifying data to maximize the loss. Experiments are performed on both artificial and real data sets. We demonstrate that optimal solutions may be overly pessimistic when the actual attacks are much weaker than expected. More important, we demonstrate that it is possible to develop a much more resilient SVM learning model while making loose assumptions on the data corruption models. When derived under the restrained attack model, our optimal SVM learning strategy provides more robust overall performance under a wide range of attack parameters.
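The minimax structure described above has a convenient closed form in the simplest case: under a worst-case L2 perturbation of each input bounded by ε, the hinge loss of a linear SVM only shifts each margin down by ε·‖w‖. A generic robust-hinge sketch (this is the textbook reduction, not the paper's exact free-range or restrained formulations):

    import numpy as np

    def worst_case_hinge(w, b, X, y, eps):
        """Average hinge loss when an adversary may move each x within an L2 ball of radius eps."""
        margins = y * (X @ w + b) - eps * np.linalg.norm(w)   # adversary's best margin reduction
        return np.maximum(0.0, 1.0 - margins).mean()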