
Showing papers on "Active learning (machine learning) published in 1998"


Proceedings ArticleDOI
01 Nov 1998
TL;DR: Compares the effectiveness of five different automatic learning algorithms for text categorization in terms of learning speed, real-time classification speed, and classification accuracy.
Abstract: Text categorization – the assignment of natural language texts to one or more predefined categories based on their content – is an important component in many information organization and management tasks. We compare the effectiveness of five different automatic learning algorithms for text categorization in terms of learning speed, real-time classification speed, and classification accuracy. We also examine training set size and alternative document representations. Very accurate text classifiers can be learned automatically from training examples. Linear Support Vector Machines (SVMs) are particularly promising because they are very accurate, quick to train, and quick to evaluate.

1,606 citations


Proceedings Article
24 Jul 1998
TL;DR: This work proposes a representation for collaborative filtering tasks that allows the application of virtually any machine learning algorithm, and identifies the shortcomings of current collaborative filtering techniques and proposes the use of learning algorithms paired with feature extraction techniques that specifically address the limitations of previous approaches.
Abstract: Predicting items a user would like on the basis of other users’ ratings for these items has become a well-established strategy adopted by many recommendation services on the Internet. Although this can be seen as a classification problem, algorithms proposed thus far do not draw on results from the machine learning literature. We propose a representation for collaborative filtering tasks that allows the application of virtually any machine learning algorithm. We identify the shortcomings of current collaborative filtering techniques and propose the use of learning algorithms paired with feature extraction techniques that specifically address the limitations of previous approaches. Our best-performing algorithm is based on the singular value decomposition of an initial matrix of user ratings, exploiting latent structure that essentially eliminates the need for users to rate common items in order to become predictors for one another's preferences. We evaluate the proposed algorithm on a large database of user ratings for motion pictures and find that our approach significantly outperforms current collaborative filtering algorithms.
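The SVD-based prediction idea above can be sketched in a few lines. This is a minimal illustration, not the paper's exact construction: the toy ratings matrix, mean imputation of missing entries, and rank k=2 are all invented assumptions.

```python
import numpy as np

# Toy ratings matrix: rows = users, cols = items, 0 = missing.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Impute missing entries with each user's mean observed rating.
filled = R.copy()
for u in range(R.shape[0]):
    observed = R[u] > 0
    filled[u, ~observed] = R[u, observed].mean()

# Truncated SVD: keep k latent factors capturing the shared structure.
U, s, Vt = np.linalg.svd(filled, full_matrices=False)
k = 2
pred = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

The latent factors let user 0 borrow evidence from the similar user 1, so user 0's prediction for item 2 comes out low even though the two users never rated that item in common.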

1,169 citations


Book
01 Jan 1998
TL;DR: This book attempts to give an overview of the different recent efforts to deal with covariate shift, a challenging situation where the joint distribution of inputs and outputs differs between the training and test stages.
Abstract: Overview: Dataset shift is a challenging situation where the joint distribution of inputs and outputs differs between the training and test stages. Covariate shift is a simpler particular case of dataset shift where only the input distribution changes (covariate denotes input), while the conditional distribution of the outputs given the inputs p(y|x) remains unchanged. Dataset shift is present in most practical applications for reasons ranging from the bias introduced by experimental design to the mere irreproducibility of the testing conditions at training time. For example, in an image classification task, training data might have been recorded under controlled laboratory conditions, whereas the test data may show different lighting conditions. In other applications, the process that generates data is in itself adaptive. Some of our authors consider the problem of spam email filtering: successful "spammers" will try to build spam in a form that differs from the spam the automatic filter has been built on. Dataset shift seems to have raised relatively little interest in the machine learning community until very recently. Indeed, many machine learning algorithms are based on the assumption that the training data is drawn from exactly the same distribution as the test data on which the model will later be evaluated. Semi-supervised learning and active learning, two problems that seem very similar to covariate shift, have received much more attention. How do they differ from covariate shift?
Semi-supervised learning is designed to take advantage of unlabeled data present at training time, but is not conceived to be robust against changes in the input distribution. In fact, one can easily construct examples of covariate shift for which common SSL strategies such as the "cluster assumption" will lead to disaster. In active learning, the algorithm is asked to select from the available unlabeled inputs those for which obtaining the label will be most beneficial for learning. This is very relevant in contexts where labeling data is very costly, but active learning strategies are not specifically designed for dealing with covariate shift. This book attempts to give an overview of the different recent efforts that are being …
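The covariate-shift definition can be made concrete with a minimal, self-contained illustration (the quadratic target and the input ranges are invented for this example): a model fit under one input distribution degrades under another even though p(y|x) never changes.

```python
import random

random.seed(0)

# The conditional p(y|x) is the same everywhere: y = x**2 (noise-free for clarity).
f = lambda x: x ** 2

# Covariate shift: only the input distribution differs between train and test.
x_train = [random.uniform(0.0, 1.0) for _ in range(200)]
x_test = [random.uniform(2.0, 3.0) for _ in range(200)]

# Fit a straight line y = a*x + b by ordinary least squares on the training inputs.
n = len(x_train)
sx = sum(x_train); sy = sum(f(x) for x in x_train)
sxx = sum(x * x for x in x_train); sxy = sum(x * f(x) for x in x_train)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

mse = lambda xs: sum((a * x + b - f(x)) ** 2 for x in xs) / len(xs)
train_err, test_err = mse(x_train), mse(x_test)
# The line fits well where it was trained but fails badly on the shifted inputs.
```

The training error is tiny while the test error is orders of magnitude larger, purely because the test inputs land where the fitted model was never constrained.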

1,037 citations


Proceedings Article
24 Jul 1998
TL;DR: This paper shows how a text classifier's need for labeled training documents can be reduced by taking advantage of a large pool of unlabeled documents, by modifying the Query-by-Committee method of active learning to use the unlabeled pool for explicitly estimating document density when selecting examples for labeling.
Abstract: This paper shows how a text classifier's need for labeled training documents can be reduced by taking advantage of a large pool of unlabeled documents. We modify the Query-by-Committee (QBC) method of active learning to use the unlabeled pool for explicitly estimating document density when selecting examples for labeling. Then active learning is combined with Expectation-Maximization in order to "fill in" the class labels of those documents that remain unlabeled. Experimental results show that the improvements to active learning require less than two-thirds as many labeled training examples as previous QBC approaches, and that the combination of EM and active learning requires only slightly more than half as many labeled training examples to achieve the same accuracy as either the improved active learning or EM alone.
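The density-weighted selection idea can be sketched in one dimension. Everything here is invented for illustration: the unlabeled pool, the "committee" of threshold classifiers, and the kernel bandwidth; the paper's committee is sampled from the classifier distribution, and the EM step is not shown.

```python
import math

# Unlabeled pool: a dense cluster near 0.5 plus scattered points.
pool = [0.1, 0.48, 0.5, 0.52, 0.58, 0.9, 3.0]

# A toy "committee": three threshold classifiers (x is positive iff x > t).
committee = [0.4, 0.5, 0.6]

def vote_entropy(x):
    # Disagreement among committee members about x's label.
    votes = [1 if x > t else 0 for t in committee]
    p = sum(votes) / len(votes)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def density(x, bandwidth=0.05):
    # Crude kernel density estimate over the unlabeled pool.
    return sum(math.exp(-((x - z) / bandwidth) ** 2) for z in pool) / len(pool)

# Score = disagreement weighted by density: prefer points that are both
# contested by the committee AND representative of the document distribution.
scores = {x: vote_entropy(x) * density(x) for x in pool}
query = max(scores, key=scores.get)
```

Points the committee agrees on score zero regardless of density, and among contested points the densest one wins, so the query lands in the heart of the contested cluster rather than on an outlier.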

864 citations


Book
26 Jun 1998
TL;DR: Covers probabilistic inference in graphical models, pattern classification, unsupervised learning, data compression, channel coding, and future research directions.
Abstract: Probabilistic inference in graphical models; pattern classification; unsupervised learning; data compression; channel coding; future research directions.

597 citations


BookDOI
01 Jan 1998

480 citations


BookDOI
01 Jan 1998

397 citations


Book
01 Jul 1998
TL;DR: This practical guide provides a straightforward introduction to basic machine learning and data mining methods, covering the analysis of numerical, text, and sound data.
Abstract: From the Publisher: Master the new computational tools to get the most out of your information system. This practical guide, the first to clearly outline the situation for the benefit of engineers and scientists, provides a straightforward introduction to basic machine learning and data mining methods, covering the analysis of numerical, text, and sound data.

367 citations


Proceedings Article
01 Jul 1998
TL;DR: This work shows how information extraction can be cast as a standard machine learning problem, argues for the suitability of relational learning in solving it, and describes SRV, a general-purpose relational learner for information extraction.
Abstract: Because the World Wide Web consists primarily of text, information extraction is central to any effort that would use the Web as a resource for knowledge discovery. We show how information extraction can be cast as a standard machine learning problem, and argue for the suitability of relational learning in solving it. The implementation of a general-purpose relational learner for information extraction, SRV, is described. In contrast with earlier learning systems for information extraction, SRV makes no assumptions about document structure and the kinds of information available for use in learning extraction patterns. Instead, structural and other information is supplied as input in the form of an extensible token-oriented feature set. We demonstrate the effectiveness of this approach by adapting SRV for use in learning extraction rules for a domain consisting of university course and research project pages sampled from the Web. Making SRV Web-ready only involves adding several simple HTML-specific features to its basic feature set.

294 citations


Proceedings Article
24 Jul 1998
TL;DR: This paper proposes an adaptation of the Adatron algorithm for classification with kernels in high dimensional spaces that can find a solution very rapidly, with an exponentially fast rate of convergence towards the optimal solution.
Abstract: Support Vector Machines work by mapping training data for classification tasks into a high dimensional feature space. In the feature space they then find a maximal margin hyperplane which separates the data. This hyperplane is usually found using a quadratic programming routine which is computationally intensive and nontrivial to implement. In this paper we propose an adaptation of the Adatron algorithm for classification with kernels in high dimensional spaces. The algorithm is simple and can find a solution very rapidly, with an exponentially fast rate of convergence (in the number of iterations) towards the optimal solution. Experimental results with real and artificial datasets are provided.
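The kernel Adatron update is simple enough to sketch directly. This is a minimal illustration under invented assumptions: XOR-style toy data, an RBF kernel, a fixed learning rate, and no bias term.

```python
import math

# XOR-style toy set: not linearly separable in input space,
# but separable in the RBF kernel's feature space.
X = [(0, 0), (1, 1), (0, 1), (1, 0)]
y = [1, 1, -1, -1]

def rbf(a, b, gamma=2.0):
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

K = [[rbf(a, b) for b in X] for a in X]
alpha = [0.0] * len(X)
eta = 0.2  # learning rate; kept small for stability

# Kernel Adatron: additive update on each multiplier, clipped at zero.
for _ in range(500):
    for i in range(len(X)):
        margin = y[i] * sum(alpha[j] * y[j] * K[i][j] for j in range(len(X)))
        alpha[i] = max(0.0, alpha[i] + eta * (1.0 - margin))

def predict(x):
    s = sum(alpha[j] * y[j] * rbf(x, X[j]) for j in range(len(X)))
    return 1 if s > 0 else -1

correct = sum(predict(x) == t for x, t in zip(X, y))
```

Each pass pushes every point's margin toward 1 while keeping the multipliers nonnegative, so the loop converges to a maximal-margin solution without any quadratic programming routine.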

290 citations


Proceedings ArticleDOI
16 Aug 1998
TL;DR: This paper presents the general strategy for designing learning machines as well as a number of particular designs based on two main principles: simple adaptive local models; and adaptive model distribution.
Abstract: This paper presents our general strategy for designing learning machines as well as a number of particular designs. The search for methods allowing a sufficient level of adaptivity is based on two main principles: 1) simple adaptive local models; and 2) adaptive model distribution. Particularly important concepts in our work are mutual information and canonical correlation. Examples are given on learning feature descriptors, modeling disparity, synthesis of a global 3-mode model, and a setup for reinforcement learning of online video coder parameter control.

Proceedings Article
01 Dec 1998
TL;DR: A principled Bayesian model is proposed based on the assumption that the examples are a random sample from the concept to be learned, which gives precise fits to human behavior on this simple task and provides qualitative insights into more complex, realistic cases of concept learning.
Abstract: I consider the problem of learning concepts from small numbers of positive examples, a feat which humans perform routinely but which computers are rarely capable of. Bridging machine learning and cognitive science perspectives, I present both theoretical analysis and an empirical study with human subjects for the simple task of learning concepts corresponding to axis-aligned rectangles in a multidimensional feature space. Existing learning models, when applied to this task, cannot explain how subjects generalize from only a few examples of the concept. I propose a principled Bayesian model based on the assumption that the examples are a random sample from the concept to be learned. The model gives precise fits to human behavior on this simple task and provides qualitative insights into more complex, realistic cases of concept learning.
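The "random sample from the concept" assumption yields the size principle: smaller hypotheses consistent with the examples receive exponentially more weight as examples accumulate. A minimal discrete sketch (integer intervals stand in for the paper's axis-aligned rectangles; the uniform prior and domain are invented):

```python
# Hypothesis space: all integer intervals [a, b] within 1..10, uniform prior.
DOMAIN = range(1, 11)
hypotheses = [(a, b) for a in DOMAIN for b in DOMAIN if a <= b]

examples = [4, 5]  # a few positive examples of the unknown concept

def likelihood(h, xs):
    a, b = h
    if all(a <= x <= b for x in xs):
        size = b - a + 1
        # Size principle: each example is an independent draw from the concept,
        # so a concept of |h| elements has likelihood (1/|h|)^n.
        return (1.0 / size) ** len(xs)
    return 0.0

post = {h: likelihood(h, examples) for h in hypotheses}
Z = sum(post.values())

def p_in_concept(y):
    # Probability that y belongs to the concept: sum posterior mass of
    # all hypotheses containing y.
    return sum(w for h, w in post.items() if h[0] <= y <= h[1]) / Z
```

Generalization is certain inside the examples' span and falls off with distance outside it, which is the qualitative pattern the paper matches against human judgments.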

Book ChapterDOI
22 Jul 1998
TL;DR: An introduction to reinforcement learning and relational reinforcement learning at a level to be understood by students and researchers with different backgrounds is presented.
Abstract: This paper presents an introduction to reinforcement learning and relational reinforcement learning at a level to be understood by students and researchers with different backgrounds. It gives an overview of the fundamental principles and techniques of reinforcement learning, without a rigorous deduction of the mathematics involved, through the use of an example application. Then, relational reinforcement learning is presented as a combination of reinforcement learning with relational learning. Its advantages -- such as the possibility of using structural representations, making abstraction from specific goals pursued and exploiting the results of previous learning phases -- are discussed.

Book ChapterDOI
08 Oct 1998
TL;DR: It is shown that k-DNF and k-decision lists are learnable in both models, i.e., with far less information than is assumed by previously used algorithms.
Abstract: Learning from positive examples occurs very frequently in natural learning. The PAC learning model of Valiant takes many features of natural learning into account, but in most cases it fails to describe such kinds of learning. We show that in order to make learning from positive data possible, extra information about the underlying distribution must be provided to the learner. We define a PAC learning model from positive and unlabeled examples. We also define a PAC learning model from positive and unlabeled statistical queries. Relations with the PAC model ([Val84]), the statistical query model ([Kea93]) and the constant-partition classification noise model ([Dec97]) are studied. We show that k-DNF and k-decision lists are learnable in both models, i.e., with far less information than is assumed by previously used algorithms.

Journal ArticleDOI
TL;DR: Communication is used to share sensory data to overcome hidden state and to share reinforcement to overcome the credit assignment problem between the agents, bridging the gap between local individual and global group pay-off.
Abstract: This paper attempts to bridge the fields of machine learning, robotics, and distributed AI. It discusses the use of communication in reducing the undesirable effects of locality in fully distributed multi-agent systems with multiple agents (robots) learning in parallel while interacting with each other. Two key problems, hidden state and credit assignment, are addressed by applying local undirected broadcast communication in a dual role: as sensing and as reinforcement. The methodology is demonstrated on two multi-robot learning experiments. The first describes learning a tightly-coupled coordination task with two robots; the second, a loosely-coupled task with four robots learning social rules. Communication is used to (1) share sensory data to overcome hidden state and (2) share reinforcement to overcome the credit assignment problem between the agents and bridge the gap between local individual and global group pay-off.

Proceedings Article
01 Jan 1998
TL;DR: SVM adapts efficiently in dynamic environments that require frequent additions to the document collection, and allows easy incorporation of new documents into an existing trained system.
Abstract: In this paper, we study the use of support vector machines in text categorization. Unlike other machine learning techniques, it allows easy incorporation of new documents into an existing trained system. Moreover, dimension reduction, which is usually imperative, now becomes optional. Thus, SVM adapts efficiently in dynamic environments that require frequent additions to the document collection. Empirical results on the Reuters-22173 collection are also discussed.

Proceedings Article
01 Jul 1998
TL;DR: This paper used machine learning on a training corpus of documents and their abstracts to discover salience functions which describe what combination of features is optimal for a given summarization task, which addresses both "generic" and user-focused summaries.
Abstract: A key problem in text summarization is finding a salience function which determines what information in the source should be included in the summary. This paper describes the use of machine learning on a training corpus of documents and their abstracts to discover salience functions which describe what combination of features is optimal for a given summarization task. The method addresses both "generic" and user-focused summaries.

01 Jan 1998
TL;DR: The present invention is directed to non-hygroscopic, water-soluble sugar compositions which are prepared by grinding together in a dry, solid state, a white sugar component and a "pulverizing aid" in the form of a water- soluble maltodextrin having a measurable dextrose equivalent value not substantially above 20.
Abstract: The present invention is directed to non-hygroscopic, water-soluble sugar compositions which are prepared by grinding together in a dry, solid state, a white sugar component and a "pulverizing aid" in the form of a water-soluble maltodextrin having a measurable dextrose equivalent value not substantially above 20, said "pulverizing aid" being employed in amounts ranging from about 5 to about 20% by weight of said total composition, the resulting product having an average particle size such that 95% by weight of the composition passes through a 325 mesh, said composition being further characterized as having a ratio of weight average particle size to number average particle size of less than 2. The compositions are free-flowing powders useful in preparing icings, buttercreams and fudges.

Journal ArticleDOI
TL;DR: It is shown that the learning algorithm not only solves the convergence and robustness problems but also improves the learning rate for discrete-time nonlinear time-varying systems.
Abstract: A discrete iterative learning control is presented for a class of discrete-time nonlinear time-varying systems with initial state error, input disturbance, and output measurement noise. A feedforward learning algorithm is designed under a stabilizing controller and is updated by more than one past control data in the previous trials. A systematic approach is developed to analyze the convergence and robustness of the proposed learning scheme. It is shown that the learning algorithm not only solves the convergence and robustness problems but also improves the learning rate for discrete-time nonlinear time-varying systems.
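The flavor of iterative learning control can be shown with a minimal first-order P-type sketch. The plant, learning gain, and reference below are invented for illustration; the paper's scheme additionally handles initial state error, disturbances, and noise, and updates from more than one past trial.

```python
import math

# Plant: x[t+1] = a*x[t] + b*u[t], output y[t] = x[t+1]; the same finite
# task is repeated trial after trial, as in iterative learning control.
a, b = 0.5, 1.0
T = 10
ref = [math.sin(0.3 * t) for t in range(1, T + 1)]  # reference trajectory

def run_trial(u):
    x, y = 0.0, []
    for t in range(T):
        x = a * x + b * u[t]
        y.append(x)
    return y

u = [0.0] * T
L = 0.8  # learning gain; this P-type rule converges when |1 - L*b| < 1
errors = []
for trial in range(30):
    y = run_trial(u)
    e = [r - yy for r, yy in zip(ref, y)]
    errors.append(max(abs(v) for v in e))
    # P-type update: correct each input with the tracking error it caused.
    u = [u[t] + L * e[t] for t in range(T)]
```

Each trial contracts the tracking error by roughly |1 - L*b| along the direct channel, so after a few dozen repetitions of the same task the output follows the reference almost exactly.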

Journal ArticleDOI
TL;DR: The results demonstrate that learning in the loss domain can be faster than learning in the gain domain, and that adding a constant to the payoff matrix can affect the learning process.

Book
01 Oct 1998
TL;DR: In the passive learning paradigm, a learner learns purely through observing its environment, and common learning tasks are the clustering, classification, or prediction of future data.
Abstract: In the passive learning paradigm, a learner learns purely through observing its environment. The environment is assumed to generate a stream of training data according to some unknown probability distribution. Passive learning techniques differ in the type of results they seek to produce, as well as in the way they generalize from observations. Common learning tasks are the clustering, classification, or prediction of future data.

01 Jan 1998
TL;DR: Experimental results show that the number of examples required to achieve a given level of performance can be significantly reduced by this method, demonstrating the superiority of relational learning for some information extraction tasks.
Abstract: The recent growth of online information available in the form of natural language documents creates a greater need for computing systems with the ability to process those documents to simplify access to the information. One type of processing appropriate for many tasks is information extraction, a type of text skimming that retrieves specific types of information from text. Although information extraction systems have existed for two decades, these systems have generally been built by hand and contain domain specific information, making them difficult to port to other domains. A few researchers have begun to apply machine learning to information extraction tasks, but most of this work has involved applying learning to pieces of a much larger system. This dissertation presents a novel rule representation specific to natural language and a relational learning system, RAPIER, which learns information extraction rules. RAPIER takes pairs of documents and filled templates indicating the information to be extracted and learns pattern-matching rules to extract fillers for the slots in the template. The system is tested on several domains, showing its ability to learn rules for different tasks. RAPIER's performance is compared to a propositional learning system for information extraction, demonstrating the superiority of relational learning for some information extraction tasks. Because one difficulty in using machine learning to develop natural language processing systems is the necessity of providing annotated examples to supervised learning systems, this dissertation also describes an attempt to reduce the number of examples RAPIER requires by employing a form of active learning. Experimental results show that the number of examples required to achieve a given level of performance can be significantly reduced by this method.

Journal ArticleDOI
TL;DR: In this paper, a 2D continuous-discrete Roesser's linear model is used to describe both the dynamics of the control system and the behavior of the learning process.
Abstract: This work presents a two-dimensional (2-D) system theory based iterative learning control (ILC) method for linear continuous-time multivariable systems. We demonstrate that a 2-D continuous-discrete model can be successfully applied to describe both the dynamics of the control system and the behavior of the learning process. We successfully exploited the 2-D continuous-discrete Roesser's linear model by extending the ILC technique from discrete control systems to continuous control systems. Three learning rules for ILC are derived. Necessary and sufficient conditions are given for convergence of the proposed learning rules. Compared to the learning rule suggested by Arimoto et al. (1984), our developed learning rules are less restrictive and have wider applications. The third learning rule proposed ensures the reference output trajectory can be accurately tracked after only one learning trial. Three numerical examples are used to illustrate the proposed control procedures.

Journal ArticleDOI
TL;DR: The results suggest that in some multiagent learning scenarios direct search in policy space can offer advantages over EF-based approaches, including PIPE and CO-PIPE, which do not depend on EFs and find good policies faster and more reliably.
Abstract: We use simulated soccer to study multiagent learning. Each team's players (agents) share action set and policy, but may behave differently due to position-dependent inputs. All agents making up a team are rewarded or punished collectively in case of goals. We conduct simulations with varying team sizes, and compare several learning algorithms: TD-Q learning with linear neural networks (TD-Q), Probabilistic Incremental Program Evolution (PIPE), and a PIPE version that learns by coevolution (CO-PIPE). TD-Q is based on learning evaluation functions (EFs) mapping input/action pairs to expected reward. PIPE and CO-PIPE search policy space directly. They use adaptive probability distributions to synthesize programs that calculate action probabilities from current inputs. Our results show that linear TD-Q encounters several difficulties in learning appropriate shared EFs. PIPE and CO-PIPE, however, do not depend on EFs and find good policies faster and more reliably. This suggests that in some multiagent learning scenarios direct search in policy space can offer advantages over EF-based approaches.

01 Jan 1998
TL;DR: This work describes two applications that use rated text documents to induce a model of the user's interests and proposes the use of a probabilistic learning algorithm, the Simple Bayesian Classifier (SBC), for user modeling tasks.
Abstract: We describe two applications that use rated text documents to induce a model of the user's interests. Based on our experiments with these applications we propose the use of a probabilistic learning algorithm, the Simple Bayesian Classifier (SBC), for user modeling tasks. We discuss the advantages and disadvantages of the SBC and present a novel extension to this algorithm that is specifically geared towards improving predictive accuracy for datasets typically encountered in user modeling and information filtering tasks. Results from an empirical study demonstrate the effectiveness of our approach.
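A minimal Simple Bayesian Classifier over rated documents, with Laplace smoothing (the tiny corpus is invented for illustration; the paper's extension to the SBC is not reproduced here):

```python
import math
from collections import Counter

# Tiny corpus of rated documents: label 1 = liked, 0 = disliked.
docs = [
    ("great fun robot soccer", 1),
    ("machine learning is great", 1),
    ("boring slow paper", 0),
    ("slow and boring talk", 0),
]

vocab = {w for text, _ in docs for w in text.split()}
counts = {0: Counter(), 1: Counter()}
totals = {0: 0, 1: 0}
priors = {0: 0, 1: 0}
for text, label in docs:
    priors[label] += 1
    for w in text.split():
        counts[label][w] += 1
        totals[label] += 1

def log_posterior(text, label):
    # Laplace-smoothed multinomial naive Bayes score (unnormalized).
    lp = math.log(priors[label] / len(docs))
    for w in text.split():
        if w in vocab:
            lp += math.log((counts[label][w] + 1) / (totals[label] + len(vocab)))
    return lp

def classify(text):
    return max((0, 1), key=lambda c: log_posterior(text, c))
```

Despite the strong independence assumption, word counts alone are enough to separate "liked" from "disliked" vocabulary in this toy setting, which is why the SBC is attractive for the sparse, high-dimensional data typical of user modeling.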

Journal ArticleDOI
TL;DR: It is suggested that control and learning in multi-robot systems must be addressed as a separate, novel and unified problem - not an additional "module" in a single-robot approach - and a bottom-up methodology is proposed to produce the desired system behavior.
Abstract: Finding methods for generating coherent, robust, useful and adaptive behavior in groups of autonomous robots is an increasingly active area of research. The incremental approach to robotics-first studying control and learning in a single robot-is not sufficient or even relevant (for some problems) to the multi-robot coordination and learning problem. Instead, the problem requires a general approach-which is fundamentally different from most of today's robot control and learning methods-to make the necessary strides in this challenging domain. Based on our research of situated, embodied systems, we suggest that control and learning in multi-robot systems must be addressed as a separate, novel and unified problem-not an additional "module" in a single-robot approach. We propose a bottom-up methodology to produce the desired system behavior.

Proceedings ArticleDOI
27 May 1998
TL;DR: This paper explores and presents the basic concepts of machine learning, and how some concepts match nicely with multi-valued logic synthesis, while others pose great difficulties.
Abstract: In the past few years, several authors have presented methods of using functional decomposition as applied to machine learning. These authors explore the ideas of functional decomposition, but left the concepts of machine learning to the papers that they reference. In general, they never fully explain why a logic synthesis method should be applied to machine learning. This paper explores and presents the basic concepts of machine learning, and how some concepts match nicely with multi-valued logic synthesis, while others pose great difficulties. The main reason for using multi-valued synthesis is that many problems are naturally multi-valued (i.e., values taken from a discrete set). Thus, mapping the problem directly to a multi-valued set of inputs and outputs is much more natural than encoding the problem into a binary form. The paper also shows that any multi-valued logic synthesis method could be applied to the machine learning problem. But, this paper focuses on multivalued functional decomposition because of its generality of minimizing a given data set.


Journal ArticleDOI
TL;DR: This work proposes a novel query algorithm for local learning models, a class of learners that has not been considered in the context of active learning until now, based on the idea of selecting a query on the borderline of the actual classification.
Abstract: In this contribution, we deal with active learning, which gives the learner the power to select training samples. We propose a novel query algorithm for local learning models, a class of learners that has not been considered in the context of active learning until now. Our query algorithm is based on the idea of selecting a query on the borderline of the actual classification. This is done by drawing on the geometrical properties of local models that typically induce a Voronoi tessellation on the input space, so that the Voronoi vertices of this tessellation offer themselves as prospective query points. The performance of the new query algorithm is tested on the two-spirals problem with promising results.
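The geometric idea (Voronoi vertices as borderline query candidates) reduces, for three sites, to a circumcenter computation. The three labeled prototypes below are invented for illustration; the paper's query algorithm scores many such vertices rather than computing a single one.

```python
# Three labeled prototypes of a local model (e.g. nearest-neighbor centers).
sites = [((0.0, 0.0), 0), ((2.0, 0.0), 1), ((0.0, 2.0), 1)]

def circumcenter(a, b, c):
    # A Voronoi vertex of a planar point set is the circumcenter of a
    # Delaunay triangle: the point equidistant from three sites.
    ax, ay = a; bx, by = b; cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return ux, uy

v = circumcenter(*(p for p, _ in sites))
# v is equidistant from all three sites; since its nearest sites carry
# different labels, it lies on the decision borderline of a local model
# and is therefore a natural candidate query point.
```

For the three sites above the vertex lands at (1, 1), exactly between the class-0 prototype and the two class-1 prototypes.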

Journal ArticleDOI
TL;DR: A novel hybrid symbolic-connectionist approach to machine learning is introduced and applied to fault diagnosis of a hydrocarbon chlorination plant, indicating that the introduced system is a promising alternative to neural networks for fault diagnosis and a complement to expert systems.