
Showing papers on "Active learning (machine learning) published in 1995"


Journal ArticleDOI
TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated, and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Abstract: The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
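As a concrete illustration of the idea, here is a minimal sketch using scikit-learn's SVC, a modern descendant of the support-vector network: a polynomial kernel plays the role of the non-linear input transformation, and the penalty parameter C implements the soft-margin extension to non-separable data. The dataset and hyperparameters are illustrative assumptions, not the paper's benchmark setup.

```python
# Soft-margin SVM with a polynomial kernel, as a modern sketch of the
# support-vector network idea: inputs are implicitly mapped to a
# high-dimensional feature space, where a linear decision surface is found.
# scikit-learn's SVC is used for illustration; it is not the paper's
# original implementation, and the data and parameters are assumptions.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.25, random_state=0)  # non-separable data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# kernel='poly' corresponds to the polynomial input transformation;
# C penalizes margin violations (the extension to non-separable data).
clf = SVC(kernel="poly", degree=3, C=1.0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
print("number of support vectors:", clf.n_support_.sum())
```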

37,861 citations


Journal ArticleDOI
Yoav Freund1
TL;DR: An algorithm for improving the accuracy of algorithms for learning binary concepts by combining a large number of hypotheses, each of which is generated by training the given learning algorithm on a different set of examples, is presented.
Abstract: We present an algorithm for improving the accuracy of algorithms for learning binary concepts. The improvement is achieved by combining a large number of hypotheses, each of which is generated by training the given learning algorithm on a different set of examples. Our algorithm is based on ideas presented by Schapire and represents an improvement over his results. The analysis of our algorithm provides general upper bounds on the resources required for learning in Valiant's polynomial PAC learning framework, which are the best general upper bounds known today. We show that the number of hypotheses that are combined by our algorithm is the smallest number possible. Other outcomes of our analysis are results regarding the representational power of threshold circuits, the relation between learnability and compression, and a method for parallelizing PAC learning algorithms. We provide extensions of our algorithms to cases in which the concepts are not binary and to the case where the accuracy of the learning algorithm depends on the distribution of the instances.
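The combination step can be illustrated with a toy sketch: many hypotheses, each trained on a different set of examples, are merged by majority vote. This shows the combination idea only; Freund's boost-by-majority chooses the training distributions adaptively rather than by the uniform resampling assumed here.

```python
# Toy sketch of combining many weak hypotheses by majority vote, each trained
# on a different set of examples. Illustrates only the combination idea;
# Freund's boost-by-majority selects training distributions adaptively,
# not by uniform resampling as done here.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, y_train, X_test, y_test = X[:700], y[:700], X[700:], y[700:]

hypotheses = []
for _ in range(51):  # odd number of voters so the vote cannot tie
    idx = rng.choice(len(X_train), size=200, replace=True)  # a different example set
    stump = DecisionTreeClassifier(max_depth=1).fit(X_train[idx], y_train[idx])
    hypotheses.append(stump)

votes = np.mean([h.predict(X_test) for h in hypotheses], axis=0)
combined = (votes > 0.5).astype(int)  # majority vote over binary labels
print("combined accuracy:", (combined == y_test).mean())
```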

1,632 citations


Book
15 Sep 1995
TL;DR: Elements of Machine Learning by Pat Langley surveys the science of machine learning, its methodology, and its prospects for the coming years.
Abstract: Elements of Machine Learning by Pat Langley. Contents:
Preface
1. An overview of machine learning: 1.1 The science of machine learning; 1.2 Nature of the environment; 1.3 Nature of representation and performance; 1.4 Nature of the learning component; 1.5 Five paradigms for machine learning; 1.6 Summary of the chapter
2. The induction of logical conjunctions: 2.1 General issues in logical induction; 2.2 Nonincremental induction of logical conjunctions; 2.3 Heuristic induction of logical conjunctions; 2.4 Incremental induction of logical conjunctions; 2.5 Incremental hill climbing for logical conjunctions; 2.6 Genetic algorithms for logical concept induction; 2.7 Summary of the chapter
3. The induction of threshold concepts: 3.1 General issues for threshold concepts; 3.2 Induction of criteria tables; 3.3 Induction of linear threshold units; 3.4 Induction of spherical threshold units; 3.5 Summary of the chapter
4. The induction of competitive concepts: 4.1 Instance-based learning; 4.2 Learning probabilistic concept descriptions; 4.3 Summary of the chapter
5. The construction of decision lists: 5.1 General issues in disjunctive concept induction; 5.2 Nonincremental learning using separate and conquer; 5.3 Incremental induction using separate and conquer; 5.4 Induction of decision lists through exceptions; 5.5 Induction of competitive disjunctions; 5.6 Instance-storing algorithms; 5.7 Complementary beam search for disjunctive concepts; 5.8 Summary of the chapter
6. Revision and extension of inference networks: 6.1 General issues surrounding inference networks; 6.2 Extending an incomplete inference network; 6.3 Inducing specialized concepts with inference networks; 6.4 Revising an incorrect inference network; 6.5 Network construction and term generation; 6.6 Summary of the chapter
7. The formation of concept hierarchies: 7.1 General issues concerning concept hierarchies; 7.2 Nonincremental divisive formation of hierarchies; 7.3 Incremental formation of concept hierarchies; 7.4 Agglomerative formation of concept hierarchies; 7.5 Variations on hierarchies into other structures; 7.7 Summary of the chapter
8. Other issues in concept induction: 8.1 Overfitting and pruning; 8.2 Selecting useful features; 8.3 Induction for numeric prediction; 8.4 Unsupervised concept induction; 8.5 Inducing relational concepts; 8.6 Handling missing features; 8.7 Summary of the chapter
9. The formation of transition networks: 9.1 General issues for state-transition networks; 9.2 Constructing finite-state transition networks; 9.3 Forming recursive transition networks; 9.4 Learning rules and networks for prediction; 9.5 Summary of the chapter
10. The acquisition of search-control knowledge: 10.1 General issues in search control; 10.2 Reinforcement learning; 10.3 Learning state-space heuristics from solution traces; 10.4 Learning control knowledge for problem reduction; 10.5 Learning control knowledge for means-ends analysis; 10.6 The utility of search-control knowledge; 10.7 Summary of the chapter
11. The formation of macro-operators: 11.1 General issues related to macro-operators; 11.2 The creation of simple macro-operators; 11.3 The formation of flexible macro-operators; 11.4 Problem solving by analogy; 11.5 The utility of macro-operators; 11.6 Summary of the chapter
12. Prospects for machine learning: 12.1 Additional areas of machine learning; 12.2 Methodological trends in machine learning; 12.3 The future of machine learning
References
Index

538 citations


Proceedings Article
27 Nov 1995
TL;DR: It is shown that across the board, lifelong learning approaches generalize consistently more accurately from less training data, by their ability to transfer knowledge across learning tasks.
Abstract: This paper investigates learning in a lifelong context. Lifelong learning addresses situations in which a learner faces a whole stream of learning tasks. Such scenarios provide the opportunity to transfer knowledge across multiple learning tasks, in order to generalize more accurately from less training data. In this paper, several different approaches to lifelong learning are described, and applied in an object recognition domain. It is shown that across the board, lifelong learning approaches generalize consistently more accurately from less training data, by their ability to transfer knowledge across learning tasks.

474 citations


Journal ArticleDOI
TL;DR: A reinforcement learning algorithm is proposed that can construct a neural fuzzy control network automatically and dynamically through a reward-penalty signal; it combines a proposed on-line supervised structure-parameter learning technique, the temporal difference prediction method, and a stochastic exploratory algorithm.
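The entry names temporal difference prediction as one ingredient of the algorithm. A minimal tabular TD(0) sketch of that ingredient alone follows; the neural fuzzy network, structure learning, and reward-penalty exploration are not reproduced, and the random-walk chain is an assumed toy problem.

```python
# Minimal tabular TD(0) prediction, the value-estimation ingredient named in
# the entry above. The neuro-fuzzy network and exploration scheme are not
# reproduced; the 5-state random-walk chain below is an assumed toy problem.
import numpy as np

n_states, alpha, gamma = 5, 0.1, 1.0
V = np.zeros(n_states + 2)  # states 0 and n_states+1 are terminal
rng = np.random.default_rng(0)

for _ in range(2000):
    s = (n_states + 1) // 2            # start in the middle state
    while s not in (0, n_states + 1):
        s_next = s + rng.choice([-1, 1])
        r = 1.0 if s_next == n_states + 1 else 0.0
        # TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next

print("estimated values:", V[1:-1])  # true values are 1/6 .. 5/6
```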

327 citations


Journal ArticleDOI
TL;DR: The method induces solutions from samples in the form of ordered disjunctive normal form (DNF) decision rules; the rule-based decision model can be extended to search efficiently for similar cases prior to approximating function values.
Abstract: We describe a machine learning method for predicting the value of a real-valued function, given the values of multiple input variables. The method induces solutions from samples in the form of ordered disjunctive normal form (DNF) decision rules. A central objective of the method and representation is the induction of compact, easily interpretable solutions. This rule-based decision model can be extended to search efficiently for similar cases prior to approximating function values. Experimental results on real-world data demonstrate that the new techniques are competitive with existing machine learning and statistical methods and can sometimes yield superior regression performance.
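To make the representation concrete, here is a minimal sketch of predicting with an ordered rule list: each rule pairs a conjunctive condition with a constant value, and the first matching rule determines the prediction. The rules and feature names are invented for illustration; the paper's induction procedure is not shown.

```python
# Minimal sketch of prediction with an ordered list of decision rules for
# regression: each rule pairs a conjunctive condition with a constant value,
# and the first matching rule wins. The rules below are hand-written for
# illustration; the paper induces such rules from samples, which is not shown.
rules = [
    (lambda x: x["age"] < 25 and x["income"] < 30_000, 1200.0),
    (lambda x: x["age"] < 25,                          1800.0),
    (lambda x: x["income"] >= 80_000,                  4500.0),
]
DEFAULT = 2500.0  # used when no rule fires

def predict(x):
    for condition, value in rules:
        if condition(x):
            return value
    return DEFAULT

print(predict({"age": 22, "income": 25_000}))  # 1200.0, first rule fires
print(predict({"age": 40, "income": 90_000}))  # 4500.0, third rule fires
```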

181 citations


Book ChapterDOI
09 Jul 1995
TL;DR: This paper evaluates different techniques for learning from partitioned data and the meta-learning approach is empirically compared with techniques in the literature that aim to combine multiple evidence to arrive at one prediction.
Abstract: Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of very large network computing, it is likely that orders of magnitude more data in databases will be available for various learning problems of real world importance. Some learning algorithms assume that the entire data set fits into main memory, which is not feasible for massive amounts of data. One approach to handling a large data set is to partition the data set into subsets, run the learning algorithm on each of the subsets, and combine the results. In this paper we evaluate different techniques for learning from partitioned data. Our meta-learning approach is empirically compared with techniques in the literature that aim to combine multiple evidence to arrive at one prediction.
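A minimal sketch of the partitioned-data setup, assuming scikit-learn learners: train one base classifier per disjoint subset and combine by plurality vote, one of the simpler combining techniques the paper compares. (A sketch of the meta-learning combiner itself appears with the related entry later in this listing.)

```python
# Minimal sketch of learning from partitioned data: split the training set
# into disjoint subsets, run the same learner on each, and combine the
# resulting classifiers by plurality vote. Voting is one of the simpler
# combining schemes compared in the paper; scikit-learn learners and the
# synthetic dataset are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_train, y_train, X_test, y_test = X[:2400], y[:2400], X[2400:], y[2400:]

partitions = np.array_split(np.arange(len(X_train)), 8)  # 8 disjoint subsets
base = [DecisionTreeClassifier(random_state=0).fit(X_train[p], y_train[p])
        for p in partitions]

preds = np.array([clf.predict(X_test) for clf in base])  # shape (8, n_test)
vote = (preds.mean(axis=0) > 0.5).astype(int)            # plurality vote
print("voted accuracy:", (vote == y_test).mean())
```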

179 citations


Journal ArticleDOI
TL;DR: This paper concentrates on Doppelgänger's learning techniques and their implementation in an application-independent, sensor-independent environment.
Abstract: Doppelgänger is a generalized user modeling system that gathers data about users, performs inferences upon the data, and makes the resulting information available to applications. Doppelgänger's learning is called heterogeneous for two reasons: first, multiple learning techniques are used to interpret the data, and second, the learning techniques must often grapple with disparate data types. These computations take place at geographically distributed sites, and make use of portable user models carried by individuals. This paper concentrates on Doppelgänger's learning techniques and their implementation in an application-independent, sensor-independent environment.

159 citations


Journal ArticleDOI
TL;DR: It is shown that the convergence condition of the learning control in the feedback configuration does not change from the condition in an open-loop configuration, but the learning speed can be improved greatly in the feedback configuration.
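The entry concerns iterative learning control, where the same task is repeated and the input is updated from the previous trial's tracking error. A minimal sketch on an assumed scalar first-order plant follows; the paper's feedback-configuration analysis is not reproduced.

```python
# Minimal sketch of iterative learning control (ILC) on an assumed scalar
# first-order plant y(t+1) = a*y(t) + b*u(t): the same task is repeated,
# and after each trial the input is corrected with the previous trial's
# tracking error, u_{k+1}(t) = u_k(t) + gamma * e_k(t+1).
import numpy as np

T, a, b, gamma = 50, 0.9, 1.0, 0.5
y_ref = np.sin(np.linspace(0, 2 * np.pi, T))  # desired trajectory (assumed)
u = np.zeros(T)

def run_trial(u):
    y = np.zeros(T)
    for t in range(T - 1):
        y[t + 1] = a * y[t] + b * u[t]
    return y

for k in range(25):
    e = y_ref - run_trial(u)        # tracking error of trial k
    u = u + gamma * np.roll(e, -1)  # u(t) corrected with e(t+1), since u(t) acts on y(t+1)
    if k % 5 == 0:
        print(f"trial {k:2d}: max |error| = {np.abs(e).max():.4f}")
# The wraparound introduced by np.roll touches only u[T-1], which never acts
# within the horizon, so it is harmless.
```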

153 citations


Journal ArticleDOI
TL;DR: Attribute-based learning is limited to non-relational descriptions of objects in the sense that the learned descriptions do not specify relations among the objects' parts, and the lack of relations makes the concept description language inappropriate for some domains.
Abstract: Techniques of machine learning have been successfully applied to various problems [1, 12]. Most of these applications rely on attribute-based learning, exemplified by the induction of decision trees as in the program C4.5 [20]. Broadly speaking, attribute-based learning also includes such approaches to learning as neural networks and nearest neighbor techniques. The advantages of attribute-based learning are: relative simplicity, efficiency, and existence of effective techniques for handling noisy data. However, attribute-based learning is limited to non-relational descriptions of objects in the sense that the learned descriptions do not specify relations among the objects' parts. Attribute-based learning thus has two strong limitations: the background knowledge can be expressed in rather limited form, and the lack of relations makes the concept description language inappropriate for some domains.

149 citations


Proceedings ArticleDOI
26 Jun 1995
TL;DR: This work predicts discourse segment boundaries from linguistic features of utterances, using a corpus of spoken narratives as data, and develops segmentation algorithms from training data by hand tuning and by machine learning.
Abstract: We predict discourse segment boundaries from linguistic features of utterances, using a corpus of spoken narratives as data. We present two methods for developing segmentation algorithms from training data: hand tuning and machine learning. When multiple types of features are used, results approach human performance on an independent test set (both methods), and using cross-validation (machine learning).
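A toy sketch of the machine-learning route: encode each utterance as a vector of linguistic features and train a classifier to label boundary versus non-boundary. The three boolean features and the tiny dataset below are invented stand-ins for the paper's corpus-derived features.

```python
# Toy sketch of the machine-learning route to segmentation: each utterance
# is encoded as a feature vector and a classifier predicts whether a
# discourse segment boundary precedes it. The three boolean features and
# the tiny dataset are invented stand-ins for the paper's corpus features.
from sklearn.tree import DecisionTreeClassifier

# features: [long pause before utterance, starts with cue word, contains pronoun]
X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1],
     [1, 1, 1], [0, 0, 0], [1, 0, 1], [0, 1, 0]]
y = [1, 1, 0, 0, 1, 0, 0, 1]  # 1 = segment boundary (toy labels)

clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[1, 1, 0], [0, 0, 1]]))  # expected: boundary, non-boundary
```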

Book ChapterDOI
09 Jul 1995
TL;DR: The performance of the theoretically founded algorithm T2, which performs agnostic PAC-learning of decision trees of at most 2 levels, is evaluated on 15 common “real-world” datasets, and it is shown that for most of these datasets T2 provides simple decision trees with little or no loss in predictive power.
Abstract: We exhibit a theoretically founded algorithm T2 for agnostic PAC-learning of decision trees of at most 2 levels, whose computation time is almost linear in the size of the training set. We evaluate the performance of this learning algorithm T2 on 15 common “real-world” datasets, and show that for most of these datasets T2 provides simple decision trees with little or no loss in predictive power (compared with C4.5). In fact, for datasets with continuous attributes its error rate tends to be lower than that of C4.5. To the best of our knowledge this is the first time that a PAC-learning algorithm is shown to be applicable to “real-world” classification problems. Since one can prove that T2 is an agnostic PAC-learning algorithm, T2 is guaranteed to produce close to optimal 2-level decision trees from sufficiently large training sets for any (!) distribution of data. In this regard T2 differs strongly from all other learning algorithms that are considered in applied machine learning, for which no guarantee can be given about their performance on new datasets. We also demonstrate that this algorithm T2 can be used as a diagnostic tool for the investigation of the expressive limits of 2-level decision trees. Finally, T2, in combination with new bounds on the VC-dimension of decision trees of bounded depth that we derive, provides us now for the first time with the tools necessary for comparing learning curves of decision trees for “real-world” datasets with the theoretical estimates of PAC-learning theory.
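For contrast with T2's exhaustive search, a greedy depth-2 tree is easy to obtain with scikit-learn; the sketch below uses it only as a rough stand-in for the 2-level hypothesis class (it carries none of T2's agnostic PAC guarantees), on an assumed public dataset.

```python
# Rough stand-in for the 2-level decision tree hypothesis class:
# scikit-learn's greedy learner capped at depth 2. Unlike T2, which searches
# for a near-optimal depth-2 tree with agnostic PAC guarantees, the greedy
# splits below carry no such guarantee. Dataset choice is an assumption.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # a common "real-world" dataset
shallow = DecisionTreeClassifier(max_depth=2, random_state=0)
deep = DecisionTreeClassifier(random_state=0)  # unrestricted, C4.5-like baseline

print("depth-2 accuracy:     ", cross_val_score(shallow, X, y, cv=5).mean())
print("unrestricted accuracy:", cross_val_score(deep, X, y, cv=5).mean())
```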

Journal ArticleDOI
TL;DR: A learning rule for neural networks based on simultaneous perturbation, and an analog feedforward neural network circuit using this rule; the rule requires only forward operations of the network and is suitable for hardware implementation.
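A minimal sketch of a simultaneous-perturbation learning rule (SPSA-style): all weights are perturbed at once by a random sign vector, and the gradient is estimated from two forward evaluations of the loss, with no backpropagation. The quadratic loss below is an assumed stand-in for a network's training error.

```python
# Minimal sketch of a simultaneous-perturbation (SPSA-style) learning rule:
# every weight is perturbed at once by a random +/-1 vector, and the gradient
# is estimated from just two forward evaluations of the loss, without
# backpropagation. This forward-only property is what makes such rules
# attractive for analog hardware. The quadratic loss is an assumed stand-in
# for a neural network's training error.
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
loss = lambda w: np.sum((w - w_true) ** 2)  # forward evaluation only

w = np.zeros(3)
a, c = 0.05, 0.1  # step size and perturbation magnitude
for k in range(500):
    delta = rng.choice([-1.0, 1.0], size=w.shape)  # simultaneous perturbation
    g_hat = (loss(w + c * delta) - loss(w - c * delta)) / (2 * c) * (1.0 / delta)
    w -= a * g_hat  # unbiased (to O(c^2)) gradient estimate drives the update

print("learned weights:", w.round(3))  # close to w_true
```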

Journal ArticleDOI
TL;DR: In this article, the authors investigated how learning from different media, either from real pulley systems or from simple line diagrams, affected mechanical learning and problem solving, and found that subjects who learned hands-on, by manipulating real pulley systems, solved application problems more accurately than those who learned from diagrams.
Abstract: In this study, we investigated how learning from different media, either from real pulley systems or from simple line diagrams, affected mechanical learning and problem solving. Novice subjects learned about pulley systems by comparing the efficiency of different systems and receiving feedback on their accuracy. The main outcome measures were subjects' ability to compare pulley system efficiency, their level of mechanical reasoning, and their ability to apply knowledge of system efficiency and construction details. Experiment 1 showed that (a) subjects learning with the two types of media made equal improvement on the learning task, and (b) all subjects showed an increase in quantitative understanding as they learned, but (c) subjects who learned hands-on, by manipulating real pulley systems, solved application problems more accurately than those who learned from diagrams. Experiment 2 showed that both the realism of the stimuli and the opportunity to manipulate systems contributed to the improved performance.

Journal ArticleDOI
TL;DR: This approach joins two forms of learning, neural networks and rough sets, and aims to improve the overall classification effectiveness of the learned object descriptions and to refine the dependency factors of the rules.

Proceedings ArticleDOI
26 Jun 1995
TL;DR: The paper points out problems with global learning methods in local model networks and illustrates that local learning has a regularizing effect that can make it favorable compared to global learning in some cases.
Abstract: Local model networks are hybrid models which allow the easy integration of a priori knowledge, as well as the ability to learn from data, in order to represent complex, multidimensional dynamic systems. The paper points out problems with global learning methods in local model networks. The bias/variance trade-offs for local and global learning are examined, and it is illustrated that local learning has a regularizing effect that can make it favorable compared to global learning in some cases.
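A minimal sketch of local learning in a local model network, under assumed Gaussian validity functions: the output blends local affine models, and each model is fitted by weighted least squares using only its own validity weights rather than jointly with the others (global learning).

```python
# Minimal sketch of a local model network with *local* learning: the output
# is a validity-weighted blend of local affine models, and each local model
# is fitted by weighted least squares on its own validity weights instead of
# jointly with the others (global learning). Gaussian validity functions and
# the toy 1-D target are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.05 * rng.standard_normal(200)

centers, width = np.array([0.17, 0.5, 0.83]), 0.15
rho = np.exp(-0.5 * ((x[:, None] - centers) / width) ** 2)
rho /= rho.sum(axis=1, keepdims=True)  # normalized validity functions

Phi = np.column_stack([x, np.ones_like(x)])  # local models are affine in x
theta = []
for i in range(len(centers)):
    W = rho[:, i]  # local learning: each model sees only its own weights
    A = Phi.T @ (W[:, None] * Phi)
    theta.append(np.linalg.solve(A, Phi.T @ (W * y)))

y_hat = sum(rho[:, i] * (Phi @ theta[i]) for i in range(len(centers)))
print("RMS error:", np.sqrt(np.mean((y - y_hat) ** 2)))
```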

Journal ArticleDOI
TL;DR: The performance of the presented Stochastic Estimator Learning Automaton (SELA) is superior to that of all previous well-known S-model ergodic schemes, and it is proved that SELA is ϵ-optimal in every S-model random environment.
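SELA's stochastic-estimator machinery is not reproduced here; the sketch below is a generic S-model reward-inaction automaton, in which the environment returns a graded response beta in [0, 1] and the chosen action's probability is reinforced in proportion to 1 - beta. The environment and step size are assumptions.

```python
# Generic S-model learning automaton sketch (a reward-inaction scheme), not
# SELA itself: the environment returns a graded response beta in [0, 1]
# (0 = best), and the chosen action's probability grows in proportion to
# (1 - beta). SELA's stochastic estimators of each action's mean response
# are not reproduced; the environment and step size are assumptions.
import numpy as np

rng = np.random.default_rng(0)
mean_penalty = np.array([0.7, 0.4, 0.2])  # unknown to the automaton

n, lam = 3, 0.01
p = np.full(n, 1.0 / n)  # action probabilities

for t in range(20000):
    i = rng.choice(n, p=p)
    beta = np.clip(mean_penalty[i] + 0.1 * rng.standard_normal(), 0.0, 1.0)
    # S-model reward-inaction update, driven by the graded response:
    # p_j shrinks for all j, then the chosen action i gets the freed mass.
    p = p - lam * (1.0 - beta) * p
    p[i] += lam * (1.0 - beta)
    p /= p.sum()  # guard against numerical drift

print("final probabilities:", p.round(3))  # typically concentrates on action 2
```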

Book ChapterDOI
Matthias Rauterberg1
28 Mar 1995
TL;DR: A concept of information processing is presented that derives an inverted U-shaped function between incongruity and information: a homeostatic model of ‘in-homeostasis’.
Abstract: Information and information processing are among the most important aspects of dynamic systems. The term ‘information’, which is used in various contexts, might better be replaced with one that incorporates novelty, activity and learning. Many important communications of learning systems are non-ergodic. The ergodicity assumption in Shannon’s communication theory restricts it, and all related concepts, to systems that cannot learn. For learning systems that interact with their environments, the more primitive concept of ‘variety’ must be used instead of probability. Humans have a fundamental need for variety: they cannot permanently perceive the same context, and they cannot always do the same things. This fundamental need for variety leads to a different interpretation of human behaviour that is often classified as “errors”. Variety is the basis for measuring complexity. Complexity in the relationship between a learning system and its context can be expressed as incongruity: the difference between the internal complexity of a learning system and the complexity of its context. Traditional concepts of information processing are models of homeostasis on a basic level, without learning. Activity and the irreversible learning process are driving forces that cause permanent in-homeostasis in the relationship between a learning system and its context. A suitable model of information processing for learning systems must be conceptualised on a higher level: a homeostatic model of ‘in-homeostasis’. A concept of information processing is presented that derives an inverted U-shaped function between incongruity and information. This concept leads to some design recommendations for man-machine systems.


Proceedings Article
20 Aug 1995
TL;DR: This paper compares the arbiter tree strategy to a new but related approach called the combiner tree strategy, which learns how to combine a number of base classifiers so as to scale efficiently to larger learning problems and, where possible, boost the accuracy of the constituent classifiers.
Abstract: Knowledge discovery in databases has become an increasingly important research topic with the advent of wide area network computing. One of the crucial problems we study in this paper is how to scale machine learning algorithms, that typically are designed to deal with main memory based datasets, to efficiently learn from large distributed databases. We have explored an approach called meta-learning that is related to the traditional approaches of data reduction commonly employed in distributed query processing systems. Here we seek efficient means to learn how to combine a number of base classifiers, which are learned from subsets of the data, so that we scale efficiently to larger learning problems, and boost the accuracy of the constituent classifiers if possible. In this paper we compare the arbiter tree strategy to a new but related approach called the combiner tree strategy.
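A minimal sketch of the combiner idea, with scikit-learn learners standing in for the originals: base classifiers are learned from disjoint subsets, and a combiner is trained on their predictions over a held-out set. This is a single flat combiner rather than the paper's combiner tree.

```python
# Minimal sketch of the combiner strategy: base classifiers are learned from
# disjoint data subsets, and a combiner is then trained to map their
# predictions to the true class. This is one flat combiner rather than the
# paper's combiner *tree*, and scikit-learn learners are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_base, y_base = X[:2400], y[:2400]          # for the base classifiers
X_meta, y_meta = X[2400:3200], y[2400:3200]  # held out, for the combiner
X_test, y_test = X[3200:], y[3200:]

parts = np.array_split(np.arange(len(X_base)), 4)
base = [DecisionTreeClassifier(random_state=0).fit(X_base[p], y_base[p])
        for p in parts]

def base_preds(X):
    # each column is one base classifier's prediction
    return np.column_stack([clf.predict(X) for clf in base])

combiner = LogisticRegression().fit(base_preds(X_meta), y_meta)
print("combiner accuracy:", combiner.score(base_preds(X_test), y_test))
```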

Journal ArticleDOI
TL;DR: In multi-agent systems two forms of learning can be distinguished: centralized learning, that is, learning done by a single agent independent of the other agents; and distributed learning, which becomes possible only because several agents are present.

Book ChapterDOI
09 Jul 1995
TL;DR: This paper shows how to develop a dynamic programming version of EBL, called Explanation-Based Reinforcement Learning (EBRL), and shows that EBRL combines the strengths of EBL (fast learning and the ability to scale to large state spaces) with the strengths of RL (learning of optimal policies).
Abstract: In speedup-learning problems, where full descriptions of operators are always known, both explanation-based learning (EBL) and reinforcement learning (RL) can be applied. This paper shows that both methods involve fundamentally the same process of propagating information backward from the goal toward the starting state. RL performs this propagation on a state-by-state basis, while EBL computes the weakest preconditions of operators, and hence, performs this propagation on a region-by-region basis. Based on the observation that RL is a form of asynchronous dynamic programming, this paper shows how to develop a dynamic programming version of EBL, which we call Explanation-Based Reinforcement Learning (EBRL). The paper compares batch and online versions of EBRL to batch and online versions of RL and to standard EBL. The results show that EBRL combines the strengths of EBL (fast learning and the ability to scale to large state spaces) with the strengths of RL (learning of optimal policies). Results are shown in chess endgames and in synthetic maze tasks.
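The state-by-state propagation attributed to RL can be made concrete with tabular value iteration on a toy deterministic grid, backing values up from the goal one state at a time; EBRL's region-by-region backups via weakest preconditions are not reproduced here.

```python
# Minimal sketch of the state-by-state propagation the paper attributes to
# RL: tabular value iteration on a toy deterministic grid, backing up values
# from the goal toward the start one state at a time. EBRL's region-by-region
# backups via weakest preconditions are not reproduced.
import numpy as np

H, W = 5, 5
goal = (4, 4)
V = np.zeros((H, W))
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]

for sweep in range(50):  # synchronous sweeps; RL performs them asynchronously
    V_new = V.copy()
    for r in range(H):
        for c in range(W):
            if (r, c) == goal:
                continue
            best = max(
                V[r + dr, c + dc]
                for dr, dc in moves
                if 0 <= r + dr < H and 0 <= c + dc < W
            )
            V_new[r, c] = -1.0 + best  # step cost -1, deterministic moves
    if np.allclose(V_new, V):
        break
    V = V_new

print(V)  # V[r, c] converges to -(Manhattan distance to the goal)
```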


Book ChapterDOI
01 Jan 1995
TL;DR: In designing learning algorithms it seems quite reasonable to construct them in such a way that all data the algorithm has already obtained are correctly and completely reflected in the hypothesis the algorithm outputs on these data, but this approach may totally fail.
Abstract: In designing learning algorithms it seems quite reasonable to construct them in such a way that all data the algorithm has already obtained are correctly and completely reflected in the hypothesis the algorithm outputs on these data. However, this approach may totally fail. It may lead to the unsolvability of the learning problem, or it may preclude any efficient solution of it.

Journal ArticleDOI
TL;DR: In this paper, the authors describe the conceptual model and architecture of the system (Collaborative Distance Learning Support System: CODILESS), which aims for effective learning and efficiency, both from the learner's and course provider's point of view.
Abstract: In order to meet the growing demand for flexible and continuing education, distance learning is increasingly being used to supplement conventional classroom-based education. The learning approaches that have trained students to work alone and independently are also being augmented with collaborative approaches that better fit the needs of today's organizations. To devise a model and develop an implementation for an effective and efficient collaborative distance learning system, the authors have started an international cooperative project. In this paper, we describe our research objectives and illustrate the key design criteria and system features by using the experiences from recent work on collaborative distance learning. We describe the conceptual model and architecture of the system (Collaborative Distance Learning Support System: CODILESS). The system aims for effective learning and efficiency, both from the learner's and course provider's point of view. CODILESS supports both collaborative and resource based learning within the same environment by integrating asynchronous and synchronous multimedia communication with electronic learning resources on the local workstation and on the Internet.

Proceedings Article
Kenji Fukumizu1
27 Nov 1995
TL;DR: This work derives the singularity condition of the information matrix and proposes an active learning technique applicable to MLP, whose effectiveness is verified through experiments.
Abstract: We propose an active learning method with hidden-unit reduction, which is devised especially for multilayer perceptrons (MLP). First, we review our active learning method, and point out that many Fisher-information-based methods applied to MLP have a critical problem: the information matrix may be singular. To solve this problem, we derive the singularity condition of the information matrix, and propose an active learning technique that is applicable to MLP. Its effectiveness is verified through experiments.
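The singularity problem can be demonstrated numerically: for a small MLP in which one hidden unit has zero output weight, the Fisher information matrix F = Σ_x ∇_w f(x; w) ∇_w f(x; w)^T (regression with unit Gaussian noise) is singular, because that unit's input weights cannot affect the output. Gradients are taken by finite differences for brevity; the paper's analytical condition and active learning scheme are not shown.

```python
# Numerical sketch of the singularity problem: for an MLP in which one
# hidden unit has zero output weight, the Fisher information matrix
#   F = sum_x grad_w f(x; w) grad_w f(x; w)^T   (regression, unit noise)
# is singular, since the redundant unit's input weights cannot change the
# output. Finite-difference gradients are used for brevity; the paper's
# analytical singularity condition is not reproduced.
import numpy as np

def mlp(w, x):
    # 1 input, 2 hidden tanh units, 1 linear output
    w1, b1, w2 = w[0:2], w[2:4], w[4:6]
    return w2 @ np.tanh(w1 * x + b1)

w = np.array([0.5, -1.0, 0.1, 0.3, 1.2, 0.0])  # w2[1] = 0: unit 2 is redundant
xs = np.linspace(-2, 2, 25)

def grad(w, x, eps=1e-6):
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (mlp(w + e, x) - mlp(w - e, x)) / (2 * eps)
    return g

F = sum(np.outer(grad(w, x), grad(w, x)) for x in xs)
print("eigenvalues of F:", np.linalg.eigvalsh(F).round(8))  # two are ~0
```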

Book ChapterDOI
25 Apr 1995
TL;DR: It is proved that the non-monotonic learning algorithm that realizes these ideas converges asymptotically to the concept to be learned.
Abstract: In this paper we present a framework for learning non-monotonic logic programs. The method is parametric on a classical learning algorithm whose generated rules are to be understood as default rules. This means that these rules must be tolerant to the negative information by allowing for the possibility of exceptions. The same classical algorithm is then used to learn recursively these exceptions. We prove that the non-monotonic learning algorithm that realizes these ideas converges asymptotically to the concept to be learned. We also discuss various general issues concerning the problem of learning nonmonotonic theories in the proposed framework.

Proceedings ArticleDOI
27 Nov 1995
TL;DR: In this method data gathering is reduced to a minimum, yet modelling accuracy is uncompromised, and the authors' active querying criterion is determined by whether or not several models agree when they are fitted to random subsamples of a small amount of data.
Abstract: Uses the 'query-by-committee' approach for building an active scheme for data collection. In this method data gathering is reduced to a minimum, yet modelling accuracy is uncompromised. The authors' active querying criterion is determined by whether or not several models agree when they are fitted to random subsamples of a small amount of collected data. Experiments with neural network models to establish the feasibility of the authors' algorithm have produced encouraging results.
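A minimal sketch of the querying criterion: fit a committee of models to random subsamples of the small labelled set, then query the candidate input on which the committee's predictions disagree most. Polynomial regression members and variance as the disagreement measure are assumptions; the paper used neural network models.

```python
# Minimal sketch of the query-by-committee criterion described above: fit a
# committee of models to random subsamples of the small labelled set, then
# query the candidate input on which the committee disagrees most.
# Polynomial regression members and variance as the disagreement measure
# are assumptions; the paper used neural network models.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x)                       # unknown target function
X_lab = rng.uniform(-1, 1, 8)                     # small labelled set
y_lab = f(X_lab) + 0.05 * rng.standard_normal(8)

committee = []
for _ in range(10):
    idx = rng.choice(len(X_lab), size=6, replace=False)  # random subsample
    committee.append(np.polyfit(X_lab[idx], y_lab[idx], deg=3))

candidates = np.linspace(-1, 1, 200)
preds = np.array([np.polyval(c, candidates) for c in committee])
disagreement = preds.var(axis=0)                  # committee variance
x_query = candidates[disagreement.argmax()]
print("next input to label:", round(float(x_query), 3))
```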

ReportDOI
01 Nov 1995
TL;DR: This paper investigates learning in a lifelong context where a learner faces a stream of learning tasks and proposes and evaluates several approaches to lifelong learning that generalize consistently more accurately from scarce training data than comparable "single-task" approaches.
Abstract: Machine learning has not yet succeeded in the design of robust learning algorithms that generalize well from very small datasets. In contrast, humans often generalize correctly from only a single training example, even if the number of potentially relevant features is large. To do so, they successfully exploit knowledge acquired in previous learning tasks, to bias subsequent learning. This paper investigates learning in a lifelong context. Lifelong learning addresses situations where a learner faces a stream of learning tasks. Such scenarios provide the opportunity for synergetic effects that arise if knowledge is transferred across multiple learning tasks. To study the utility of transfer, several approaches to lifelong learning are proposed and evaluated in an object recognition domain. It is shown that all these algorithms generalize consistently more accurately from scarce training data than comparable "single-task" approaches.

Proceedings Article
Lei Xu1
27 Nov 1995
TL;DR: A Bayesian-Kullback learning scheme, called the Ying-Yang Machine, is proposed based on two complementary but equivalent Bayesian representations of the joint density and their Kullback divergence.
Abstract: A Bayesian-Kullback learning scheme, called the Ying-Yang Machine, is proposed based on two complementary but equivalent Bayesian representations of the joint density and their Kullback divergence. The scheme not only unifies existing major supervised and unsupervised learning methods, including the classical maximum likelihood or least squares learning, maximum information preservation, the EM and em algorithms and information geometry, the recently popular Helmholtz machine, and other learning methods with new variants and new results; it also provides a number of new learning models.