
Showing papers on "Active learning (machine learning)" published in 2010


Journal ArticleDOI
TL;DR: The relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as covariate shift are discussed.
Abstract: A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but only have sufficient training data in another domain, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding expensive data-labeling effort. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as covariate shift. We also explore some potential future issues in transfer learning research.

18,616 citations


Proceedings Article
06 Dec 2010
TL;DR: A novel, iterative self-paced learning algorithm where each iteration simultaneously selects easy samples and learns a new parameter vector that outperforms the state-of-the-art method for learning a latent structural SVM on four applications.
Abstract: Latent variable models are a powerful tool for addressing several tasks in machine learning. However, the algorithms for learning the parameters of latent variable models are prone to getting stuck in a bad local optimum. To alleviate this problem, we build on the intuition that, rather than considering all samples simultaneously, the algorithm should be presented with the training data in a meaningful order that facilitates learning. The order of the samples is determined by how easy they are. The main challenge is that often we are not provided with a readily computable measure of the easiness of samples. We address this issue by proposing a novel, iterative self-paced learning algorithm where each iteration simultaneously selects easy samples and learns a new parameter vector. The number of samples selected is governed by a weight that is annealed until the entire training data has been considered. We empirically demonstrate that the self-paced learning algorithm outperforms the state-of-the-art method for learning a latent structural SVM on four applications: object localization, noun phrase coreference, motif finding and handwritten digit recognition.
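
To make the alternating scheme concrete, here is a minimal sketch under stated assumptions: the base learner is swapped for weighted ridge regression rather than the paper's latent structural SVM, and the threshold 1/K with annealing factor mu follows the easiness rule described above; all names are illustrative.

```python
import numpy as np

def weighted_ridge(X, y, v, lam=1e-3):
    # Weighted least squares: minimize sum_i v_i * (x_i.w - y_i)^2 + lam * ||w||^2
    Xv = X * v[:, None]
    return np.linalg.solve(Xv.T @ X + lam * np.eye(X.shape[1]), Xv.T @ y)

def self_paced_fit(X, y, K=1.0, mu=1.3, n_iters=25):
    w = weighted_ridge(X, y, np.ones(len(y)))   # start from all samples
    for _ in range(n_iters):
        losses = (X @ w - y) ** 2
        v = (losses < 1.0 / K).astype(float)    # keep only the "easy" samples
        w = weighted_ridge(X, y, v)             # refit on the selected subset
        K /= mu                                 # anneal: 1/K grows, admitting harder samples
    return w
```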

1,220 citations


Posted Content
TL;DR: In this article, a no-regret algorithm is proposed to train a stationary deterministic policy with good performance under the distribution of observations it induces in such sequential settings, and it outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
Abstract: Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches provide stronger guarantees in this setting, but remain somewhat unsatisfactory as they train either non-stationary or stochastic policies and require a large number of iterations. In this paper, we propose a new iterative algorithm that trains a stationary deterministic policy and can be seen as a no-regret algorithm in an online learning setting. We show that any such no-regret algorithm, combined with additional reduction assumptions, must find a policy with good performance under the distribution of observations it induces in such sequential settings. We demonstrate that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem.
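
The iterative scheme can be sketched as follows; the toy environment, expert rule, and drift term are hypothetical stand-ins, meant only to show the aggregate-and-retrain loop in which the learner's own rollouts generate the states that the expert then labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
expert = lambda s: (s @ np.array([1.0, -0.5]) > 0).astype(int)  # stand-in expert policy

def rollout(policy, n=200):
    s = rng.normal(size=(n, 2))
    s[:, 0] += 0.5 * policy(s)       # the learner's own actions shift the state distribution
    return s

X_agg, y_agg = [], []
policy = expert                       # first iteration rolls out the expert itself
learner = LogisticRegression()
for _ in range(10):
    states = rollout(policy)
    X_agg.append(states)
    y_agg.append(expert(states))      # label every visited state with the expert's action
    learner.fit(np.vstack(X_agg), np.concatenate(y_agg))
    policy = lambda s: learner.predict(s)  # stationary deterministic policy for the next round
```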

1,176 citations


Proceedings Article
06 Dec 2010
TL;DR: The proposed QUIRE approach provides a systematic way for measuring and combining the informativeness and representativeness of an unlabeled instance by incorporating the correlation among labels and is extended to multi-label learning by actively querying instance-label pairs.
Abstract: Most active learning approaches select either informative or representative unlabeled instances to query their labels. Although several active learning algorithms have been proposed to combine the two criteria for query selection, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this challenge with a principled approach, termed QUIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an instance. Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches.
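
QUIRE's min-max criterion itself involves inverses of kernel sub-matrices and is not reproduced here; the sketch below is a deliberately simple stand-in that scores each pool point by uncertainty (informativeness) times average similarity to the pool (representativeness), just to make the two criteria concrete. All names are illustrative.

```python
import numpy as np

def select_query(probs, similarity, unlabeled, beta=1.0):
    # probs:      model-estimated P(y=1 | x) for every point in the pool
    # similarity: precomputed pairwise similarity matrix over the pool
    # unlabeled:  array of integer indices of the currently unlabeled points
    informativeness = 1.0 - np.abs(2.0 * probs[unlabeled] - 1.0)  # high near the boundary
    representativeness = similarity[np.ix_(unlabeled, unlabeled)].mean(axis=1)
    return unlabeled[np.argmax(informativeness * representativeness ** beta)]
```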

518 citations


Journal Article
TL;DR: This paper presents a mature, flexible, and adaptive machine learning toolkit for regression modeling and active learning to tackle issues of computational cost and model accuracy.
Abstract: An exceedingly large number of scientific and engineering fields are confronted with the need for computer simulations to study complex, real world phenomena or solve challenging design problems. However, due to the computational cost of these high fidelity simulations, the use of neural networks, kernel methods, and other surrogate modeling techniques has become indispensable. Surrogate models are compact and cheap to evaluate, and have proven very useful for tasks such as optimization, design space exploration, prototyping, and sensitivity analysis. Consequently, in many fields there is great interest in tools and techniques that facilitate the construction of such regression models, while minimizing the computational cost and maximizing model accuracy. This paper presents a mature, flexible, and adaptive machine learning toolkit for regression modeling and active learning to tackle these issues. The toolkit brings together algorithms for data fitting, model selection, sample selection (active learning), hyperparameter optimization, and distributed computing in order to empower a domain expert to efficiently generate an accurate model for the problem or data at hand.
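
The kind of adaptive sampling loop such a toolkit automates might look like the following sketch: a Gaussian-process surrogate is refit as samples arrive, and each new simulation is placed where the surrogate's predictive uncertainty is largest (one of several selection criteria such toolkits typically offer; the simulator here is a stand-in).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def expensive_simulation(x):                  # stand-in for a costly simulator run
    return np.sin(3 * x) + 0.1 * x ** 2

X = np.linspace(0, 4, 5)[:, None]             # small initial design
y = expensive_simulation(X).ravel()
candidates = np.linspace(0, 4, 400)[:, None]  # cheap-to-score candidate inputs

gp = GaussianProcessRegressor()
for _ in range(15):                           # active learning: sample where the model is least sure
    gp.fit(X, y)
    _, std = gp.predict(candidates, return_std=True)
    x_new = candidates[np.argmax(std)]        # point of maximum predictive uncertainty
    X = np.vstack([X, x_new[None, :]])
    y = np.append(y, expensive_simulation(x_new))
```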

490 citations


Journal ArticleDOI
TL;DR: This paper aims to clarify the use of alternative MI assumptions by reviewing the work done in this area, and focuses on a relaxed view of the MI problem, where the standard MI assumption is dropped and alternative assumptions are considered instead.
Abstract: Multi-instance (MI) learning is a variant of inductive machine learning, where each learning example contains a bag of instances instead of a single feature vector. The term commonly refers to the supervised setting, where each bag is associated with a label. This type of representation is a natural fit for a number of real-world learning scenarios, including drug activity prediction and image classification, hence many MI learning algorithms have been proposed. Any MI learning method must relate instances to bag-level class labels, but many types of relationships between instances and class labels are possible. All early work in MI learning assumed a specific MI concept class known to be appropriate for a drug activity prediction domain; however, this ‘standard MI assumption’ is not guaranteed to hold in other domains. Much of the recent work in MI learning has concentrated on a relaxed view of the MI problem, where the standard MI assumption is dropped, and alternative assumptions are considered instead. However, often it is not clearly stated which particular assumption is used and how it relates to other assumptions that have been proposed. In this paper, we aim to clarify the use of alternative MI assumptions by reviewing the work done in this area.
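
To make the ‘standard MI assumption’ concrete: a bag is positive if and only if at least one of its instances is positive, so bag-level predictions follow from any instance-level scorer by taking a maximum. A minimal sketch (names illustrative):

```python
import numpy as np

def bag_predict_standard_mi(bags, instance_score, threshold=0.5):
    # bags:           list of arrays, one (n_instances, n_features) array per bag
    # instance_score: any instance-level scorer returning P(positive) per instance
    # Standard MI assumption: a bag is positive iff some instance is positive.
    return np.array([instance_score(b).max() > threshold for b in bags])
```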

378 citations


Proceedings Article
06 Dec 2010
TL;DR: This paper proposes an alternative formulation for multi-task learning by extending the recently published large margin nearest neighbor (lmnn) algorithm to the MTL paradigm and shows that it consistently outperforms single-task kNN under several metrics and state-of-the-art MTL classifiers.
Abstract: Multi-task learning (MTL) improves the prediction performance on multiple, different but related, learning problems through shared parameters or representations. One of the most prominent multi-task learning algorithms is an extension to support vector machines (svm) by Evgeniou et al. [15]. Although very elegant, multi-task svm is inherently restricted by the fact that support vector machines require each class to be addressed explicitly with its own weight vector which, in a multi-task setting, requires the different learning tasks to share the same set of classes. This paper proposes an alternative formulation for multi-task learning by extending the recently published large margin nearest neighbor (lmnn) algorithm to the MTL paradigm. Instead of relying on separating hyperplanes, its decision function is based on the nearest neighbor rule which inherently extends to many classes and becomes a natural fit for multi-task learning. We evaluate the resulting multi-task lmnn on real-world insurance data and speech classification problems and show that it consistently outperforms single-task kNN under several metrics and state-of-the-art MTL classifiers.

293 citations


Proceedings ArticleDOI
03 May 2010
TL;DR: This paper derives a novel approach to RL for parameterized control policies based on the framework of stochastic optimal control with path integrals, and believes that this new algorithm, Policy Improvement with Path Integrals (PI2), offers currently one of the most efficient, numerically robust, and easy-to-implement algorithms for RL in robotics.
Abstract: Reinforcement learning (RL) is one of the most general approaches to learning control. Applying it to complex motor systems, however, has so far been largely impossible due to the computational difficulties that reinforcement learning encounters in high-dimensional continuous state-action spaces. In this paper, we derive a novel approach to RL for parameterized control policies based on the framework of stochastic optimal control with path integrals. While solidly grounded in optimal control theory and estimation theory, the update equations for learning are surprisingly simple and have no danger of numerical instabilities as neither matrix inversions nor gradient learning rates are required. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a robot dog illustrates the functionality of our algorithm in a real-world scenario. We believe that our new algorithm, Policy Improvement with Path Integrals (PI2), offers currently one of the most efficient, numerically robust, and easy-to-implement algorithms for RL in robotics.
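
A minimal sketch of the probability-weighted update at the heart of PI2, stripped of the full algorithm's time-indexed trajectory structure: exploration noise is averaged with weights given by exponentiated trajectory cost, so no gradients or matrix inversions appear. The cost function, noise scale, and temperature below are illustrative assumptions.

```python
import numpy as np

def pi2_update(theta, rollout_cost, n_rollouts=20, sigma=0.1, lam=1.0):
    eps = np.random.randn(n_rollouts, theta.size) * sigma     # exploration noise per rollout
    costs = np.array([rollout_cost(theta + e) for e in eps])  # trajectory cost per rollout
    s = np.exp(-(costs - costs.min()) / lam)                  # exponentiated (shifted) costs
    w = s / s.sum()                                           # softmax weights: low cost, high weight
    return theta + w @ eps                                    # probability-weighted noise average

# Toy usage: drive a quadratic "trajectory cost" toward zero.
theta = np.ones(5)
for _ in range(200):
    theta = pi2_update(theta, lambda t: float(t @ t))
```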

289 citations




Journal ArticleDOI
TL;DR: An ensemble based ELM (EN-ELM) algorithm is proposed where ensemble learning and cross-validation are embedded into the training phase so as to alleviate the overtraining problem and enhance the predictive stability.
Abstract: Extreme learning machine (ELM) was proposed as a new class of learning algorithm for the single-hidden-layer feedforward neural network (SLFN). To achieve good generalization performance, ELM minimizes training error on the entire training data set; it may therefore suffer from overfitting, as the learned model approximates all training samples well. In this letter, an ensemble-based ELM (EN-ELM) algorithm is proposed in which ensemble learning and cross-validation are embedded into the training phase so as to alleviate the overtraining problem and enhance predictive stability. Experimental results on several benchmark databases demonstrate that EN-ELM is robust and efficient for classification.
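
For reference, the base learner that EN-ELM ensembles is easy to state: hidden-layer weights are drawn at random and only the output weights are fit, in closed form. The sketch below shows just this base ELM (EN-ELM's embedded cross-validation and voting are omitted); the activation and layer size are assumptions.

```python
import numpy as np

def train_elm(X, Y, n_hidden=50, seed=0):
    # Single-hidden-layer ELM: random input weights, least-squares output weights.
    # X is (n, d); Y is one-hot (n, n_classes).
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)          # random hidden-layer features
    beta = np.linalg.pinv(H) @ Y    # closed-form fit; no iterative training
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
```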

222 citations


Posted Content
TL;DR: This work investigates fast methods that allow us to quickly eliminate variables (features) in supervised learning problems involving a convex loss function and a $l_1$-norm penalty, leading to a potentially substantial reduction in the number of variables prior to running the supervised learning algorithm.
Abstract: We investigate fast methods that allow us to quickly eliminate variables (features) in supervised learning problems involving a convex loss function and a $l_1$-norm penalty, leading to a potentially substantial reduction in the number of variables prior to running the supervised learning algorithm. The methods are not heuristic: they only eliminate features that are guaranteed to be absent after solving the learning problem. Our framework applies to a large class of problems, including support vector machine classification, logistic regression and least-squares. The complexity of the feature elimination step is negligible compared to the typical computational effort involved in the sparse supervised learning problem: it grows linearly with the number of features times the number of examples, and is much lower when the data is sparse. We apply our method to data sets arising in text classification and observe a dramatic reduction in dimensionality, and hence in the computational effort required to solve the learning problem, especially when very sparse classifiers are sought. Our method allows us to immediately extend the scope of existing algorithms, letting us run them on data sets of sizes that were previously out of reach.
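
As a rough illustration of such a guaranteed screening test, here is a reconstruction of a basic SAFE-style rule for the lasso case; the exact bound below is written from memory of this line of work and should be checked against the paper before use.

```python
import numpy as np

def safe_screen_lasso(X, y, lam):
    # Features failing the test are (per the SAFE guarantee) zero at the optimum
    # of min_w 0.5 * ||y - X w||^2 + lam * ||w||_1.
    scores = np.abs(X.T @ y)
    lam_max = scores.max()                 # smallest lam giving the all-zero solution
    radius = np.linalg.norm(y) * (lam_max - lam) / lam_max
    keep = scores >= lam - np.linalg.norm(X, axis=0) * radius
    return keep                            # boolean mask of surviving features
```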

Proceedings ArticleDOI
06 Jun 2010
TL;DR: This work considers the problem of learning a record matching package (classifier) in an active learning setting, and presents new algorithms for this problem that overcome the limitations of previous approaches.
Abstract: We consider the problem of learning a record matching package (classifier) in an active learning setting. In active learning, the learning algorithm picks the set of examples to be labeled, unlike the more traditional passive learning setting where a user selects the labeled examples. Active learning is important for record matching since manually identifying a suitable set of labeled examples is difficult. Previous algorithms that use active learning for record matching have serious limitations: the packages that they learn lack quality guarantees and the algorithms do not scale to large input sizes. We present new algorithms for this problem that overcome these limitations. Our algorithms are fundamentally different from traditional active learning approaches, and are designed from the ground up to exploit problem characteristics specific to record matching. We include a detailed experimental evaluation on real-world data demonstrating the effectiveness of our algorithms.

Journal ArticleDOI
TL;DR: In this paper, a review of rule extraction from SVM classifiers is presented, and a comparison of the algorithms' salient features and relative performance as measured by a number of metrics is made.

Journal ArticleDOI
TL;DR: This work shows that with an appropriate combination of kernels a significant boost in classification performance is possible, and indicates the utility of active learning with probabilistic predictive models, especially when the amount of training data labels that may be sought for a category is ultimately very small.
Abstract: Discriminative methods for visual object category recognition are typically non-probabilistic, predicting class labels but not directly providing an estimate of uncertainty. Gaussian Processes (GPs) provide a framework for deriving regression techniques with explicit uncertainty models; we show here how Gaussian Processes with covariance functions defined based on a Pyramid Match Kernel (PMK) can be used for probabilistic object category recognition. Our probabilistic formulation provides a principled way to learn hyperparameters, which we utilize to learn an optimal combination of multiple covariance functions. It also offers confidence estimates at test points, and naturally allows for an active learning paradigm in which points are optimally selected for interactive labeling. We show that with an appropriate combination of kernels a significant boost in classification performance is possible. Further, our experiments indicate the utility of active learning with probabilistic predictive models, especially when the amount of training data labels that may be sought for a category is ultimately very small.
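
The active-learning paradigm described above can be sketched with any probabilistic classifier: repeatedly query the unlabeled point whose predictive distribution is most uncertain. The snippet below uses scikit-learn's GP classifier with its default kernel as a stand-in for the paper's GP-PMK model (the pyramid match kernel is not reproduced), and a synthetic oracle in place of an interactive labeler.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier

def most_uncertain(gp, X_pool):
    # Query the pool point with predictive probability closest to 0.5 (binary case).
    p = gp.predict_proba(X_pool)[:, 1]
    return int(np.argmin(np.abs(p - 0.5)))

# Usage sketch: fit on the labeled set, query one point, label it, repeat.
rng = np.random.default_rng(1)
X_lab = rng.normal(size=(10, 5))
y_lab = (X_lab[:, 0] > 0).astype(int)          # synthetic oracle stand-in
X_pool = rng.normal(size=(200, 5))
for _ in range(5):
    gp = GaussianProcessClassifier().fit(X_lab, y_lab)
    i = most_uncertain(gp, X_pool)
    X_lab = np.vstack([X_lab, X_pool[i]])
    y_lab = np.append(y_lab, int(X_pool[i, 0] > 0))
    X_pool = np.delete(X_pool, i, axis=0)
```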

Journal ArticleDOI
TL;DR: This study formulates an optimization problem that models the objectives and criteria for determining personalized context-aware ubiquitous learning paths that maximize learning efficacy for individual students, taking into account the meaningfulness of the learning paths and the number of simultaneous visitors to each learning object.
Abstract: In a context-aware ubiquitous learning environment, learning systems can detect students' learning behaviors in the real world with the help of context-aware (sensor) technology; that is, students can be guided to observe or operate real-world objects with personalized support from the digital world. In this study, we formulate an optimization problem that models the objectives and criteria for determining personalized context-aware ubiquitous learning paths that maximize learning efficacy for individual students, taking into account the meaningfulness of the learning paths and the number of simultaneous visitors to each learning object. Moreover, a heuristic algorithm is proposed to find a quality solution. Experimental results from the learning activities conducted in a natural-science butterfly-ecology course of an elementary school are also given to illustrate the benefits of the approach.

Journal ArticleDOI
TL;DR: It is shown that imitation learning has helped significantly to start learning with reasonable initial behavior; however, many applications are still restricted to rather low-dimensional domains and toy applications.
Abstract: Recent trends in robot learning are to use trajectory-based optimal control techniques and reinforcement learning to scale complex robotic systems. On the one hand, increased computational power and multiprocessing, and on the other hand, probabilistic reinforcement learning methods and function approximation, have contributed to a steadily increasing interest in robot learning. Imitation learning has helped significantly to start learning with reasonable initial behavior. However, many applications are still restricted to rather low-dimensional domains and toy applications. Future work will have to demonstrate the continual and autonomous learning abilities that were alluded to in the introduction.

Proceedings Article
06 Dec 2010
TL;DR: In this article, a greedy active learning algorithm called EC2 was proposed for Bayesian active learning with noisy observations, and it was shown that it is competitive with the optimal policy.
Abstract: We tackle the fundamental problem of Bayesian active learning with noise, where we need to adaptively select from a number of expensive tests in order to identify an unknown hypothesis sampled from a known prior distribution. In the case of noise-free observations, a greedy algorithm called generalized binary search (GBS) is known to perform near-optimally. We show that if the observations are noisy, perhaps surprisingly, GBS can perform very poorly. We develop EC2, a novel, greedy active learning algorithm and prove that it is competitive with the optimal policy, thus obtaining the first competitiveness guarantees for Bayesian active learning with noisy observations. Our bounds rely on a recently discovered diminishing returns property called adaptive submodularity, generalizing the classical notion of submodular set functions to adaptive policies. Our results hold even if the tests have non-uniform cost and their noise is correlated. We also propose EffECXtive, a particularly fast approximation of EC2, and evaluate it on a Bayesian experimental design problem involving human subjects, intended to tease apart competing economic theories of how people make decisions under uncertainty.
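
For contrast, the noise-free GBS baseline mentioned above is simple to state: choose the test whose outcome most evenly bisects the current probability mass over hypotheses. A minimal sketch with binary test outcomes (EC2 itself, which cuts edges between equivalence classes of hypotheses, is not reproduced here):

```python
import numpy as np

def gbs_pick_test(prior, outcomes):
    # prior:    probability of each hypothesis, shape (H,)
    # outcomes: outcomes[t, h] in {0, 1}, the deterministic outcome of test t
    #           if hypothesis h is true (the noise-free setting GBS assumes)
    mass_if_one = outcomes @ prior                       # mass answering 1, per test
    return int(np.argmin(np.abs(2 * mass_if_one - 1)))  # closest to an even split
```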

Journal ArticleDOI
TL;DR: A hierarchical controller is proposed that is able to quickly learn good grasps of a novel object in an unstructured environment, by executing smooth reaching motions and preshaping the hand depending on the object's geometry.

Proceedings Article
06 Dec 2010
TL;DR: This work presents and analyze an agnostic active learning algorithm that works without keeping a version space, unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned.
Abstract: We present and analyze an agnostic active learning algorithm that works without keeping a version space. This is unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this version space approach, our algorithm sheds the computational burden and brittleness associated with maintaining version spaces, yet still allows for substantial improvements over supervised learning for classification.

Proceedings Article
01 Jan 2010
TL;DR: This paper proposes Active Crowd Translation (ACT), a new paradigm where active learning and crowd-sourcing come together to enable automatic translation for low-resource language pairs and sees significant improvements in translation quality.
Abstract: Large-scale parallel data generation for new language pairs requires intensive human effort and the availability of experts. It becomes immensely difficult and costly to provide Statistical Machine Translation (SMT) systems for most languages due to the paucity of expert translators to provide parallel data. Even when experts are available, the costs involved make it appear infeasible. In this paper we propose Active Crowd Translation (ACT), a new paradigm where active learning and crowd-sourcing come together to enable automatic translation for low-resource language pairs. Active learning aims at reducing the cost of label acquisition by prioritizing the most informative data for annotation, while crowd-sourcing reduces cost by using the power of the crowds to make up for the lack of expensive language experts. We experiment and compare our active learning strategies with strong baselines and see significant improvements in translation quality. Similarly, our experiments with crowd-sourcing on Mechanical Turk have shown that it is possible to create parallel corpora using non-experts, and that with sufficient quality assurance, a translation system trained on such a corpus approaches expert quality.

Proceedings Article
11 Jul 2010
TL;DR: Bagel is presented, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators, and can generate natural and informative utterances from unseen inputs in the information presentation domain.
Abstract: Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents Bagel, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators. A human evaluation shows that Bagel can generate natural and informative utterances from unseen inputs in the information presentation domain. Additionally, generation performance on sparse datasets is improved significantly by using certainty-based active learning, yielding ratings close to the human gold standard with a fraction of the data.

Proceedings ArticleDOI
04 Feb 2010
TL;DR: By proposing optimization strategies that allow short-circuiting score computations in additive learning systems, this paper is able to speed up the score computation process by more than four times with almost no loss in result quality.
Abstract: Some commercial web search engines rely on sophisticated machine learning systems for ranking web documents. Due to very large collection sizes and tight constraints on query response times, the online efficiency of these learning systems forms a bottleneck. An important problem in such systems is to speed up the ranking process without sacrificing much from the quality of results. In this paper, we propose optimization strategies that allow short-circuiting score computations in additive learning systems. The strategies are evaluated over a state-of-the-art machine learning system and a large, real-life query log obtained from Yahoo!. With the proposed strategies, we are able to speed up the score computations by more than four times with almost no loss in result quality.
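
One way such short-circuiting can be realized (a sketch, not necessarily the paper's exact strategy): precompute, for each prefix of the additive ensemble, the largest score the remaining stages could still contribute, then stop scoring a document as soon as even that best case cannot reach the pruning threshold.

```python
def score_with_early_exit(features, stages, max_tail, threshold):
    # stages:   scoring functions whose outputs are summed (e.g. boosted trees)
    # max_tail: max_tail[i] = largest total score stages i+1..end can still add
    score = 0.0
    for i, stage in enumerate(stages):
        score += stage(features)
        if score + max_tail[i] < threshold:
            return None    # even the best case cannot reach the threshold: prune
    return score
```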

Journal ArticleDOI
TL;DR: The navigation support problem for context-aware ubiquitous learning is formulated and two navigation support algorithms are proposed by taking learning efficacy and navigation efficiency into consideration and it is concluded that the innovative approach is helpful to the students to more effectively and efficiently utilize the learning resources and achieve better learning efficacy.
Abstract: In context-aware ubiquitous learning, students are guided to learn in the real world with personalized support from the learning system. As the learning resources are real-world objects, certain physical constraints, such as the limit on the number of people who can visit the same learning object at the same time, the time for moving from one object to another, and the environmental parameters, need to be taken into account. Moreover, the values of these context-dependent parameters are likely to change swiftly during the learning process, which makes it a challenging and important issue to find a navigation support mechanism for suggesting learning paths for individual students in real time. In this paper, the navigation support problem for context-aware ubiquitous learning is formulated and two navigation support algorithms are proposed by taking learning efficacy and navigation efficiency into consideration. From the simulation results of learning in a butterfly museum setting, it is concluded that the innovative approach helps students utilize the learning resources more effectively and efficiently and achieve better learning efficacy.

Proceedings Article
Yuhong Guo1
06 Dec 2010
TL;DR: A novel batch-mode active learning approach that selects a batch of queries in each iteration by maximizing a natural mutual information criterion between the labeled and unlabeled instances by employing a Gaussian process framework.
Abstract: Recently, batch-mode active learning has attracted a lot of attention. In this paper, we propose a novel batch-mode active learning approach that selects a batch of queries in each iteration by maximizing a natural mutual information criterion between the labeled and unlabeled instances. By employing a Gaussian process framework, this mutual information based instance selection problem can be formulated as a matrix partition problem. Although matrix partition is an NP-hard combinatorial optimization problem, we show that a good local solution can be obtained by exploiting an effective local optimization technique on a relaxed continuous optimization problem. The proposed active learning approach is independent of employed classification models. Our empirical studies show this approach can achieve comparable or superior performance to discriminative batch-mode active learning methods.
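
The matrix-partition formulation is not reproduced here, but the flavor of the mutual-information criterion can be seen in a common greedy stand-in: under a Gaussian process, repeatedly add the pool point that most increases the log-determinant of the covariance block over the selected batch. Names are illustrative.

```python
import numpy as np

def greedy_logdet_batch(K, batch_size):
    # K: kernel (covariance) matrix over the unlabeled pool, shape (n, n).
    # Greedy proxy for a GP mutual-information batch criterion.
    selected = []
    for _ in range(batch_size):
        best_j, best_val = -1, -np.inf
        for j in range(K.shape[0]):
            if j in selected:
                continue
            idx = selected + [j]
            _, logdet = np.linalg.slogdet(K[np.ix_(idx, idx)])
            if logdet > best_val:                # largest joint "information volume"
                best_j, best_val = j, logdet
        selected.append(best_j)
    return selected
```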

Proceedings ArticleDOI
25 Jul 2010
TL;DR: This work proposes a novel active learning strategy that exploits a priori domain knowledge provided by the expert (specifically, labeled features) and extends this model via a Linear Programming algorithm for situations where the expert can provide ranked labeled features.
Abstract: Active learning (AL) is an increasingly popular strategy for mitigating the amount of labeled data required to train classifiers, thereby reducing annotator effort. We describe a real-world, deployed application of AL to the problem of biomedical citation screening for systematic reviews at the Tufts Medical Center's Evidence-based Practice Center. We propose a novel active learning strategy that exploits a priori domain knowledge provided by the expert (specifically, labeled features) and extend this model via a Linear Programming algorithm for situations where the expert can provide ranked labeled features. Our methods outperform existing AL strategies on three real-world systematic review datasets. We argue that evaluation must be specific to the scenario under consideration. To this end, we propose a new evaluation framework for finite-pool scenarios, wherein the primary aim is to label a fixed set of examples rather than to simply induce a good predictive model. We use a method from medical decision theory for eliciting the relative costs of false positives and false negatives from the domain expert, constructing a utility measure of classification performance that integrates the expert's preferences. Our findings suggest that the expert can, and should, provide more information than instance labels alone. In addition to achieving strong empirical results on the citation screening problem, this work outlines many important steps for moving away from simulated active learning and toward deploying AL for real-world applications.

Posted Content
TL;DR: In this paper, an agnostic active learning algorithm that works without keeping a version space is presented and analyzed, unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned.
Abstract: We present and analyze an agnostic active learning algorithm that works without keeping a version space. This is unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this version space approach, our algorithm sheds the computational burden and brittleness associated with maintaining version spaces, yet still allows for substantial improvements over supervised learning for classification.

Book ChapterDOI
31 Aug 2010
TL;DR: It is shown that, by exploiting links between three widely used modeling frameworks for reactive systems, any tool for active learning of Mealy machines can be used for learning I/O automata that are deterministic and output determined.
Abstract: Links are established between three widely used modeling frameworks for reactive systems: the ioco theory of Tretmans, the interface automata of De Alfaro and Henzinger, and Mealy machines. It is shown that, by exploiting these links, any tool for active learning of Mealy machines can be used for learning I/O automata that are deterministic and output determined. The main idea is to place a transducer in between the I/O automata teacher and the Mealy machine learner, which translates concepts from the world of I/O automata to the world of Mealy machines, and vice versa. The transducer comes equipped with an interface automaton that allows us to focus the learning process on those parts of the behavior that can effectively be tested and/or are of particular interest. The approach has been implemented on top of the LearnLib tool and has been applied successfully to three case studies.

Proceedings ArticleDOI
25 Jul 2010
TL;DR: A novel algorithm for multi-task learning with boosted decision trees that learns several different learning tasks with a joint model, explicitly addressing the specifics of each learning task with task-specific parameters and the commonalities between them through shared parameters.
Abstract: In this paper we propose a novel algorithm for multi-task learning with boosted decision trees. We learn several different learning tasks with a joint model, explicitly addressing the specifics of each learning task with task-specific parameters and the commonalities between them through shared parameters. This enables implicit data sharing and regularization. We evaluate our learning method on web-search ranking data sets from several countries. Here, multi-task learning is particularly helpful as data sets from different countries vary largely in size because of the cost of editorial judgments. Our experiments validate that learning various tasks jointly can lead to significant improvements in performance with surprising reliability.

Proceedings ArticleDOI
03 Oct 2010
TL;DR: Gestalt allows developers to implement a classification pipeline, analyze data as it moves through that pipeline, and easily transition between implementation and analysis, and significantly improves the ability of developers to find and fix bugs in machine learning systems.
Abstract: We present Gestalt, a development environment designed to support the process of applying machine learning. While traditional programming environments focus on source code, we explicitly support both code and data. Gestalt allows developers to implement a classification pipeline, analyze data as it moves through that pipeline, and easily transition between implementation and analysis. An experiment shows this significantly improves the ability of developers to find and fix bugs in machine learning systems. Our discussion of Gestalt and our experimental observations provide new insight into general-purpose support for the machine learning process.

Proceedings ArticleDOI
04 Aug 2010
TL;DR: A system for tracking economic sentiment in online media that has been deployed since August 2009 is described, which uses annotations provided by a cohort of non-expert annotators to train a learning system to classify a large body of news items.
Abstract: Tracking sentiment in the popular media has long been of interest to media analysts and pundits. With the availability of news content via online syndicated feeds, it is now possible to automate some aspects of this process. There is also great potential to crowdsource much of the annotation work required to train a machine learning system to perform sentiment scoring. (Crowdsourcing is a term, sometimes associated with Web 2.0 technologies, that describes the outsourcing of tasks to a large, often anonymous community.) We describe such a system for tracking economic sentiment in online media that has been deployed since August 2009. It uses annotations provided by a cohort of non-expert annotators to train a learning system to classify a large body of news items. We report on the design challenges addressed in managing the effort of the annotators and in making annotation an interesting experience.