
Showing papers on "Active learning (machine learning) published in 1996"


Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
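The mail-filtering example lends itself to a small illustration. The sketch below (all messages and names hypothetical, not from the article) learns naive-Bayes-style log-odds of rejection from a user's past accept/reject decisions:

```python
import math
from collections import Counter

# Illustrative sketch of the mail-filtering example: learn which words
# predict rejection from a user's past decisions (hypothetical data).
def train(history):
    counts = {True: Counter(), False: Counter()}   # rejected? -> word counts
    totals = {True: 0, False: 0}
    for text, rejected in history:
        for word in text.lower().split():
            counts[rejected][word] += 1
            totals[rejected] += 1
    return counts, totals

def reject_score(text, counts, totals):
    # Naive-Bayes-style log-odds that the user would reject this message.
    score = 0.0
    for word in text.lower().split():
        p_rej = (counts[True][word] + 1) / (totals[True] + 2)
        p_keep = (counts[False][word] + 1) / (totals[False] + 2)
        score += math.log(p_rej / p_keep)
    return score

history = [("buy cheap pills now", True),
           ("meeting agenda for monday", False),
           ("cheap offer buy now", True),
           ("project status and agenda", False)]
counts, totals = train(history)
print(reject_score("cheap pills offer", counts, totals) > 0)   # True
```

As the abstract notes, the filter maintains itself: each new accept/reject decision simply updates the counts.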

13,246 citations


Journal ArticleDOI
TL;DR: This article reviews how optimal data selection techniques have been used with feedforward neural networks, and shows how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression.
Abstract: For many types of machine learning algorithms, one can compute the statistically "optimal" way to select training data. In this paper, we review how optimal data selection techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are computationally expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate. Empirically, we observe that the optimality criterion sharply decreases the number of training examples the learner needs in order to achieve good performance.
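The selection criterion can be illustrated in the simplest setting of a plain linear learner, where the predictive variance at a candidate input x is proportional to xᵀ(XᵀX)⁻¹x over the already-labeled design matrix X. This is a hedged sketch of the general idea with made-up data, not the paper's derivation for mixtures of Gaussians or locally weighted regression:

```python
# Sketch of variance-based data selection for a linear learner.
def predictive_variance(X, x):
    # Rows are [1, x_i]; prediction variance at x is proportional to
    # x^T (X^T X)^{-1} x (2x2 inverse written out by hand).
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]
    v = [inv[0][0] * x[0] + inv[0][1] * x[1],
         inv[1][0] * x[0] + inv[1][1] * x[1]]
    return x[0] * v[0] + x[1] * v[1]

labeled = [[1, 0.0], [1, 0.1], [1, 0.2]]        # inputs queried so far
candidates = [[1, 0.05], [1, 0.5], [1, 1.0]]
# The optimality criterion queries where the model is least certain:
best = max(candidates, key=lambda x: predictive_variance(labeled, x))
print(best)   # [1, 1.0], the point far from the labeled cluster
```

Querying high-variance inputs is what lets the learner reach good performance with sharply fewer training examples than random sampling.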

2,122 citations


Proceedings Article
03 Dec 1996
TL;DR: In an implementation of pole balancing on a complex anthropomorphic robot arm, it is demonstrated that, when facing the complexities of real signal processing, model-based reinforcement learning offers the most robustness for LQR problems.
Abstract: By now it is widely accepted that learning a task from scratch, i.e., without any prior knowledge, is a daunting undertaking. Humans, however, rarely attempt to learn from scratch. They extract initial biases as well as strategies for how to approach a learning problem from instructions and/or demonstrations of other humans. For learning control, this paper investigates how learning from demonstration can be applied in the context of reinforcement learning. We consider priming the Q-function, the value function, the policy, and the model of the task dynamics as possible areas where demonstrations can speed up learning. In general nonlinear learning problems, only model-based reinforcement learning shows significant speed-up after a demonstration, while in the special case of linear quadratic regulator (LQR) problems, all methods profit from the demonstration. In an implementation of pole balancing on a complex anthropomorphic robot arm, we demonstrate that, when facing the complexities of real signal processing, model-based reinforcement learning offers the most robustness for LQR problems. Using the suggested methods, the robot learns pole balancing in just a single trial after a 30-second demonstration by the human instructor.
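The model-priming idea can be sketched on a toy problem: demonstration transitions seed a learned model of the task dynamics, which is then solved off-line. The hypothetical 4-state chain below stands in for the robot-arm task:

```python
# Minimal sketch of model-based RL primed by a demonstration: the demo
# transitions seed a learned model, which value iteration then solves
# (hypothetical deterministic 4-state chain, state 3 is the goal).
demo = [(0, 1, 1, 0.0), (1, 1, 2, 0.0), (2, 1, 3, 1.0)]  # (s, a, s', r)

model = {}                       # (s, a) -> (s', r), learned from the demo
for s, a, s2, r in demo:
    model[(s, a)] = (s2, r)

V = {s: 0.0 for s in range(4)}
gamma = 0.9
for _ in range(50):              # value iteration on the primed model
    for (s, a), (s2, r) in model.items():
        V[s] = r + gamma * V[s2]

print(round(V[0], 3))   # value propagated back from the goal: 0.81
```

A single demonstration already yields a usable model, which is why the paper finds model-based methods benefit most from demonstrations.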

592 citations


Journal ArticleDOI
TL;DR: A novel incremental algorithm that combines Q-learning with the TD(λ) return estimation process, which is typically used in actor-critic learning, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization.
Abstract: This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations.
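The core update can be sketched as follows: a Q-learning TD error is distributed over recently visited state-action pairs through eligibility traces decayed by γλ. This is an illustrative accumulating-trace variant on a toy two-step task, not the paper's exact algorithm:

```python
# Sketch of a Q(lambda)-style update: the Q-learning TD error is spread
# over recently visited state-action pairs via eligibility traces
# (hypothetical toy task, not the paper's simulations).
alpha, gamma, lam = 0.5, 0.9, 0.8
Q, E = {}, {}                       # action values and eligibility traces

def q(s, a):
    return Q.get((s, a), 0.0)

def update(s, a, r, s2, actions):
    # TD error with the greedy backup, as in Q-learning:
    delta = r + gamma * max(q(s2, a2) for a2 in actions) - q(s, a)
    E[(s, a)] = E.get((s, a), 0.0) + 1.0       # accumulate the trace
    for sa in list(E):
        Q[sa] = q(*sa) + alpha * delta * E[sa]
        E[sa] *= gamma * lam                    # lambda distributes credit

update(0, 'right', 0.0, 1, ['left', 'right'])
update(1, 'right', 1.0, 2, ['left', 'right'])
# The reward at step two also credits the earlier pair:
print(round(q(0, 'right'), 3), q(1, 'right'))   # 0.36 0.5
```

With λ = 0 the second line of credit would vanish and the update reduces to plain one-step Q-learning.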

355 citations


Journal ArticleDOI
TL;DR: Results demonstrate the effectiveness of structural learning with forgetting, applied to various examples: the discovery of Boolean functions, classification of irises, discovery of recurrent networks, prediction of time series and rule extraction from mushroom data.

319 citations


Posted Content
TL;DR: This work shows how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression.
Abstract: For many types of machine learning algorithms, one can compute the statistically `optimal' way to select training data. In this paper, we review how optimal data selection techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are computationally expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate. Empirically, we observe that the optimality criterion sharply decreases the number of training examples the learner needs in order to achieve good performance.

274 citations


Proceedings Article
01 Jul 1996
TL;DR: A large-scale application of the memory-based approach to part of speech tagging is shown to be feasible, obtaining a tagging accuracy that is on a par with that of known statistical approaches, and with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases.
Abstract: We introduce a memory-based approach to part of speech tagging. Memory-based learning is a form of supervised learning based on similarity-based reasoning. The part of speech tag of a word in a particular context is extrapolated from the most similar cases held in memory. Supervised learning approaches are useful when a tagged corpus is available as an example of the desired output of the tagger. Based on such a corpus, the tagger-generator automatically builds a tagger which is able to tag new text the same way, diminishing development time for the construction of a tagger considerably. Memory-based tagging shares this advantage with other statistical or machine learning approaches. Additional advantages specific to a memory-based approach include (i) the relatively small tagged corpus size sufficient for training, (ii) incremental learning, (iii) explanation capabilities, (iv) flexible integration of information in case representations, (v) its non-parametric nature, (vi) reasonably good results on unknown words without morphological analysis, and (vii) fast learning and tagging. In this paper we show that a large-scale application of the memory-based approach is feasible: we obtain a tagging accuracy that is on a par with that of known statistical approaches, and with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases. The use of IGTree has the additional advantage that the optimal context size for disambiguation is dynamically computed.

1 Introduction

Part of Speech (POS) tagging is a process in which syntactic categories are assigned to words. It can be seen as a mapping from sentences to strings of tags. Automatic tagging is useful for a number of applications: as a preprocessing stage to parsing, in information retrieval, in text-to-speech systems, in corpus linguistics, etc.
The two factors determining the syntactic category of a word are its lexical probability (e.g. without context, man is more probably a noun than a verb), and its contextual probability (e.g. after a pronoun, man is more probably a verb than a noun, as in they man the boats). Several approaches have been proposed to construct automatic taggers. Most work on statistical methods has used n-gram models or Hidden Markov Model-based taggers (e.g. Church, 1988; DeRose, 1988; Cutting et al. 1992; Merialdo, 1994, etc.). In
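The extrapolation-from-similar-cases idea can be sketched with a toy case base; the hypothetical four-case memory below stands in for the large IGTree-indexed case base used in the paper:

```python
# Toy sketch of memory-based tagging: the tag of a word in context is
# extrapolated from the most similar stored case (weighted overlap).
memory = [                       # ((previous tag, focus word), tag)
    (('DET', 'man'), 'NOUN'),
    (('PRON', 'man'), 'VERB'),   # as in "they man the boats"
    (('DET', 'boats'), 'NOUN'),
    (('PRON', 'row'), 'VERB'),
]

def tag(prev_tag, word):
    def overlap(case):
        (p, w), _ = case
        # The lexical match outweighs the contextual one (cf. IGTree's
        # information-gain ordering of features).
        return (p == prev_tag) + 2 * (w == word)
    return max(memory, key=overlap)[1]

print(tag('DET', 'man'), tag('PRON', 'man'))   # NOUN VERB
```

The two calls mirror the lexical/contextual distinction above: the same word "man" receives different tags depending on the preceding tag.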

274 citations


Proceedings Article
01 Jan 1996
TL;DR: An experimental comparison of seven different learning algorithms on the problem of learning to disambiguate the meaning of a word from context finds the statistical and neural-network methods perform the best on this particular problem.
Abstract: This paper describes an experimental comparison of seven different learning algorithms on the problem of learning to disambiguate the meaning of a word from context. The algorithms tested include statistical, neural-network, decision-tree, rule-based, and case-based classification techniques. The specific problem tested involves disambiguating six senses of the word "line" using the words in the current and preceding sentence as context. The statistical and neural-network methods perform the best on this particular problem and we discuss a potential reason for this observed difference. We also discuss the role of bias in machine learning and its importance in explaining performance differences observed on specific problems.
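The statistical approach the comparison favors can be sketched as a naive Bayes classifier over context words. The training sentences below are hypothetical, with two senses instead of the paper's six:

```python
import math
from collections import Counter, defaultdict

# Sketch of a naive Bayes word-sense disambiguator for "line",
# trained on made-up example sentences.
train = [
    ("please hold the line phone call", "phone"),
    ("the phone line was busy", "phone"),
    ("a new product line launched", "product"),
    ("the product line sells well", "product"),
]

counts = defaultdict(Counter)    # sense -> context-word counts
priors = Counter()
for text, sense in train:
    priors[sense] += 1
    counts[sense].update(text.split())

def disambiguate(context):
    vocab = len({w for c in counts.values() for w in c})
    def log_post(sense):
        total = sum(counts[sense].values())
        score = math.log(priors[sense] / sum(priors.values()))
        for w in context.split():
            # Laplace smoothing keeps unseen words from zeroing a sense.
            score += math.log((counts[sense][w] + 1) / (total + vocab))
        return score
    return max(counts, key=log_post)

print(disambiguate("call on this phone"))    # phone
print(disambiguate("new product launched"))  # product
```

The bias of such a classifier, word independence given the sense, is exactly the kind of assumption the paper's discussion of bias addresses.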

254 citations


Dissertation
01 Jan 1996
TL;DR: This thesis presents a learning based approach for detecting classes of objects and patterns with variable image appearance but highly predictable image boundaries, and proposes an active learning formulation for function approximation, and shows that the active example selection strategy learns its target with fewer data samples than random sampling.
Abstract: Object and pattern detection is a classical computer vision problem with many potential applications, ranging from automatic target recognition to image-based industrial inspection tasks in assembly lines. While there have been some successful object and pattern detection systems in the past, most such systems handle only specific rigid objects or patterns that can be accurately described by fixed geometric models or pictorial templates. This thesis presents a learning based approach for detecting classes of objects and patterns with variable image appearance but highly predictable image boundaries. Some examples of such object and pattern classes include human faces, aerial views of structured terrain features like volcanoes, localized material defect signatures in industrial parts, certain tissue anomalies in medical images, and instances of a given digit or character, which may be written or printed in many different styles. The thesis consists of two parts. In part one, we introduce our object and pattern detection approach using a concrete human face detection example. The approach first builds a distribution-based model of the target pattern class in an appropriate feature space to describe the target's variable image appearance. It then learns from examples a similarity measure for matching new patterns against the distribution-based target model. We also discuss some pertinent learning issues, including ideas on virtual example generation and example selection. The approach makes few assumptions about the target pattern class and should therefore be fairly general, as long as the target class has predictable image boundaries. We show that this is indeed the case by demonstrating the technique on two other pattern detection/recognition problems. Because our object and pattern detection approach is very much learning-based, how well a system eventually performs depends heavily on the quality of training examples it receives. 
The second part of this thesis looks at how one can select high quality examples for function approximation learning tasks. Active learning is an area of research that investigates how a learner can intelligently select future training examples to get better approximation results with less data. We propose an active learning formulation for function approximation, and show for three specific approximation function classes, that the active example selection strategy learns its target with fewer data samples than random sampling. Finally, we simplify the original active learning formulation, and show how it leads to a tractable example selection paradigm, suitable for use in many object and pattern detection problems. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)

254 citations


Journal ArticleDOI
TL;DR: The authors apply techniques from optimal experiment design (OED) to guide the query/action selection of a neural network learner, and demonstrate that these techniques allow the learner to minimize its generalization error by exploring its domain efficiently and completely.

249 citations


Proceedings Article
01 Jan 1996
TL;DR: The task-clustering algorithm TC clusters learning tasks into classes of mutually related tasks, and outperforms its non-selective counterpart in situations where only a small number of tasks is relevant.
Abstract: Recently, there has been an increased interest in "lifelong" machine learning methods that transfer knowledge across multiple learning tasks. Such methods have repeatedly been found to outperform conventional, single-task learning algorithms when the learning tasks are appropriately related. To increase robustness of such approaches, methods are desirable that can reason about the relatedness of individual learning tasks, in order to avoid the danger arising from tasks that are unrelated and thus potentially misleading. This paper describes the task-clustering (TC) algorithm. TC clusters learning tasks into classes of mutually related tasks. When facing a new learning task, TC first determines the most related task cluster, then exploits information selectively from this task cluster only. An empirical study carried out in a mobile robot domain shows that TC outperforms its non-selective counterpart in situations where only a small number of tasks is relevant.

Proceedings Article
03 Jul 1996
TL;DR: A method for learning Bayesian networks that handles the discretization of continuous variables as an integral part of the learning process is introduced, using a new metric based on the Minimal Description Length principle for choosing the threshold values for the discretization while learning the Bayesian network structure.
Abstract: We introduce a method for learning Bayesian networks that handles the discretization of continuous variables as an integral part of the learning process. The main ingredient in this method is a new metric based on the Minimal Description Length principle for choosing the threshold values for the discretization while learning the Bayesian network structure. This score balances the complexity of the learned discretization and the learned network structure against how well they model the training data. This ensures that the discretization of each variable introduces just enough intervals to capture its interaction with adjacent variables in the network. We formally derive the new metric, study its main properties, and propose an iterative algorithm for learning a discretization policy. Finally, we illustrate its behavior in applications to supervised learning.

Proceedings Article
02 Aug 1996
TL;DR: This paper combines data mining and constructive induction with more standard machine learning techniques to design methods for detecting fraudulent usage of cellular telephones based on profiling customer behavior, and uses a rule-learning program to uncover indicators of fraudulent behavior from a large database of cellular calls.
Abstract: This paper describes the automatic design of methods for detecting fraudulent behavior. Much of the design is accomplished using a series of machine learning methods. In particular, we combine data mining and constructive induction with more standard machine learning techniques to design methods for detecting fraudulent usage of cellular telephones based on profiling customer behavior. Specifically, we use a rule-learning program to uncover indicators of fraudulent behavior from a large database of cellular calls. These indicators are used to create profilers, which then serve as features to a system that combines evidence from multiple profilers to generate high-confidence alarms. Experiments indicate that this automatic approach performs nearly as well as the best hand-tuned methods for detecting fraud.


Book
15 Mar 1996
TL;DR: A collection covering, among other topics, embedded machine learning systems for natural language processing, acquiring and updating hierarchical knowledge for machine translation based on a clustering technique, and applying an existing machine learning algorithm to text categorization.
Abstract: Contents:
- Learning approaches for natural language processing
- Separating learning and representation
- Natural language grammatical inference: A comparison of recurrent neural networks and machine learning methods
- Extracting rules for grammar recognition from Cascade-2 networks
- Generating English plural determiners from semantic representations: A neural network learning approach
- Knowledge acquisition in concept and document spaces by using self-organizing neural networks
- Using hybrid connectionist learning for speech/language analysis
- SKOPE: A connectionist/symbolic architecture of spoken Korean processing
- Integrating different learning approaches into a multilingual spoken language translation system
- Learning language using genetic algorithms
- A statistical syntactic disambiguation program and what it learns
- Training stochastic grammars on semantical categories
- Learning restricted probabilistic link grammars
- Learning PP attachment from corpus statistics
- A minimum description length approach to grammar inference
- Automatic classification of dialog acts with Semantic Classification Trees and Polygrams
- Sample selection in natural language learning
- Learning information extraction patterns from examples
- Implications of an automatic lexical acquisition system
- Using learned extraction patterns for text classification
- Issues in inductive learning of domain-specific text extraction rules
- Applying machine learning to anaphora resolution
- Embedded machine learning systems for natural language processing: A general framework
- Acquiring and updating hierarchical knowledge for machine translation based on a clustering technique
- Applying an existing machine learning algorithm to text categorization
- Comparative results on using inductive logic programming for corpus-based parser construction
- Learning the past tense of English verbs using inductive logic programming
- A dynamic approach to paradigm-driven analogy
- Can punctuation help learning?
- Using parsed corpora for circumventing parsing
- A symbolic and surgical acquisition of terms through variation
- A revision learner to acquire verb selection rules from human-made rules and examples
- Learning from texts - A terminological metareasoning perspective

Proceedings Article
04 Aug 1996
TL;DR: This paper analyzes the effects of polynomial-space-bounded learning on runtime complexity of backtrack search and finds that relevance-bounded learning allows better runtime bounds than size-bounded learning on structurally restricted constraint satisfaction problems.
Abstract: Learning during backtrack search is a space-intensive process that records information (such as additional constraints) in order to avoid redundant work. In this paper, we analyze the effects of polynomial-space-bounded learning on runtime complexity of backtrack search. One space-bounded learning scheme records only those constraints with limited size, and another records arbitrarily large constraints but deletes those that become irrelevant to the portion of the search space being explored. We find that relevance-bounded learning allows better runtime bounds than size-bounded learning on structurally restricted constraint satisfaction problems. Even when restricted to linear space, our relevance-bounded learning algorithm has runtime complexity near that of unrestricted (exponential space-consuming) learning schemes.
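Size-bounded learning can be sketched in a toy backtracking solver: conflicts involving at most a fixed number of variables are recorded as nogoods and used to prune later branches. The not-equal constraints below are illustrative, not the paper's complexity analysis:

```python
# Toy sketch of size-bounded learning during backtrack search:
# conflicts over at most MAX_SIZE variables are recorded as nogoods.
MAX_SIZE = 2
nogoods = []                     # each nogood: a partial assignment dict

def violates(assign, constraints):
    return any(x in assign and y in assign and assign[x] == assign[y]
               for x, y in constraints)

def search(variables, domain, constraints, assign=None):
    assign = assign or {}
    if any(ng.items() <= assign.items() for ng in nogoods):
        return None                          # pruned by a learned nogood
    if violates(assign, constraints):
        if len(assign) <= MAX_SIZE:          # the size bound on learning
            nogoods.append(dict(assign))
        return None
    if len(assign) == len(variables):
        return assign
    var = variables[len(assign)]
    for val in domain:
        result = search(variables, domain, constraints,
                        {**assign, var: val})
        if result is not None:
            return result
    return None

# Colour a triangle with three colours (adjacent vertices must differ):
sol = search(['a', 'b', 'c'], [0, 1, 2],
             [('a', 'b'), ('b', 'c'), ('a', 'c')])
print(sol, len(nogoods))
```

Relevance-bounded learning would instead keep arbitrarily large nogoods but delete those no longer relevant to the current branch.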

Book ChapterDOI
01 Jan 1996
TL;DR: For the proposed concave minimization formulation, a successive linearization algorithm without stepsize terminates after a maximum average of 7 linear programs on problems with as many as 4192 points in 14-dimensional space, and the approach is quite effective and more efficient than other approaches.
Abstract: Two fundamental problems of machine learning, misclassification minimization [10, 24, 18] and feature selection, [25, 29, 14] are formulated as the minimization of a concave function on a polyhedral set. Other formulations of these problems utilize linear programs with equilibrium constraints [18, 1, 4, 3] which are generally intractable. In contrast, for the proposed concave minimization formulation, a successive linearization algorithm without stepsize terminates after a maximum average of 7 linear programs on problems with as many as 4192 points in 14-dimensional space. The algorithm terminates at a stationary point or a global solution to the problem. Preliminary numerical results indicate that the proposed approach is quite effective and more efficient than other approaches.

Journal ArticleDOI
TL;DR: This brief article introduces a learning controller developed by synthesizing several basic ideas from fuzzy set and control theory, self-organizing control, and conventional adaptive control that can achieve high performance learning control for a nonlinear time-varying rocket velocity control problem and a multi-input multi-output two-degree-of-freedom robot manipulator.
Abstract: A learning system possesses the capability to improve its performance over time by interaction with its environment. A learning control system is designed so that its learning controller has the ability to improve the performance of the closed-loop system by generating command inputs to the plant and utilizing feedback information from the plant. In this brief article, we introduce a learning controller that is developed by synthesizing several basic ideas from fuzzy set and control theory, self-organizing control, and conventional adaptive control. We utilize a learning mechanism that observes the plant outputs and adjusts the membership functions of the rules in a direct fuzzy controller so that the overall system behaves like the reference model. The effectiveness of this fuzzy model reference learning controller is illustrated by showing that it can achieve high performance learning control for a nonlinear time-varying rocket velocity control problem and a multi-input multi-output two-degree-of-freedom robot manipulator.

Journal ArticleDOI
TL;DR: A new incremental learning method for pattern recognition is presented, called the "incremental backpropagation learning network", which employs bounded weight modification and structural adaptation learning rules and applies initial knowledge to constrain the learning process.
Abstract: How to learn new knowledge without forgetting old knowledge is a key issue in designing an incremental-learning neural network. In this paper, we present a new incremental learning method for pattern recognition, called the "incremental backpropagation learning network", which employs bounded weight modification and structural adaptation learning rules and applies initial knowledge to constrain the learning process. The viability of this approach is demonstrated for classification problems including the iris and the promoter domains.

Journal ArticleDOI
TL;DR: This paper shows how a particular first-order learning system is modified to customize it for finding definitions of functional relations, which leads to faster learning times and, in some cases, to definitions that have higher predictive accuracy.
Abstract: First-order learning involves finding a clause-form definition of a relation from examples of the relation and relevant background information. In this paper, a particular first-order learning system is modified to customize it for finding definitions of functional relations. This restriction leads to faster learning times and, in some cases, to definitions that have higher predictive accuracy. Other first-order learning systems might benefit from similar specialization.

Proceedings ArticleDOI
04 Nov 1996
TL;DR: A method of modular learning which coordinates multiple behaviors taking account of a trade-off between learning time and performance is presented, applied to one-to-one soccer-playing robots.
Abstract: Coordination of multiple behaviors independently obtained by a reinforcement learning method is one of the issues in order for the method to be scaled to larger and more complex robot learning tasks. Direct combination of all the state spaces for individual modules (subtasks) needs enormous learning time, and it causes hidden states. This paper presents a method of modular learning which coordinates multiple behaviors taking account of a trade-off between learning time and performance. First, in order to reduce the learning time the whole state space is classified into two categories based on the action values separately obtained by Q-learning: the area where one of the learned behaviors is directly applicable (no-more-learning area), and the area where learning is necessary due to competition of multiple behaviors (re-learning area). Second, hidden states are detected by model fitting to the learned action values based on the information criterion. Finally, the initial action values in the re-learning area are adjusted so that they can be consistent with the values in the no-more-learning area. The method is applied to one-to-one soccer-playing robots. Computer simulation and real robot experiments are given to show the validity of the proposed method.

Journal ArticleDOI
TL;DR: It is proved that incremental learning can always be simulated by inference devices that are both set-driven and conservative and feed-back learning is shown to be more powerful than iterative inference, and its learning power is incomparable to that of bounded example memory inference.

Book ChapterDOI
01 Jan 1996
TL;DR: A new approach to predicting a given example’s class by locating it in the “example space” and then choosing the best learner in that region of the example space to make predictions, which is compared to other methods for selecting from multiple learning algorithms.
Abstract: Determining the conditions for which a given learning algorithm is appropriate is an open problem in machine learning. Methods for selecting a learning algorithm for a given domain have met with limited success. This paper proposes a new approach to predicting a given example’s class by locating it in the “example space” and then choosing the best learner(s) in that region of the example space to make predictions. The regions of the example space are defined by the prediction patterns of the learners being used. The learner(s) chosen for prediction are selected according to their past performance in that region. This dynamic approach to learning algorithm selection is compared to other methods for selecting from multiple learning algorithms. The approach is then extended to weight rather than select the algorithms according to their past performance in a given region. Both approaches are further evaluated on a set of ten domains and compared to several other meta-learning strategies.
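The region-based selection idea can be sketched directly: an example's region is keyed by the joint prediction pattern of the learners, and the learner with the best past record in that region makes the prediction. The toy learners and labels below are hypothetical:

```python
# Sketch of region-based learner selection with two toy classifiers.
def learner_a(x): return x > 0           # matches the true concept
def learner_b(x): return x % 2 == 0      # unrelated to it

learners = [learner_a, learner_b]
stats = {}                               # (region, learner) -> (correct, seen)

def region(x):
    # Regions of the example space are defined by the learners'
    # joint prediction pattern, as in the paper.
    return tuple(l(x) for l in learners)

def observe(x, y):
    for i, l in enumerate(learners):
        c, t = stats.get((region(x), i), (0, 0))
        stats[(region(x), i)] = (c + (l(x) == y), t + 1)

def predict(x):
    r = region(x)
    def past_accuracy(i):
        c, t = stats.get((r, i), (0, 1))
        return c / t
    return learners[max(range(len(learners)), key=past_accuracy)](x)

for x in [-2, -1, 1, 2]:
    observe(x, x > 0)                    # true concept: is x positive?
print(predict(-4), predict(3))           # False True
```

The weighting extension described above would combine the learners' votes in proportion to these per-region accuracies instead of picking a single winner.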

Proceedings Article
03 Dec 1996
TL;DR: An adaptive on-line algorithm extending the learning of learning idea can be applied to learning continuous functions or distributions, even when no explicit loss function is given and the Hessian is not available.
Abstract: An adaptive on-line algorithm extending the learning of learning idea is proposed and theoretically motivated. Relying only on gradient flow information it can be applied to learning continuous functions or distributions, even when no explicit loss function is given and the Hessian is not available. Its efficiency is demonstrated for a non-stationary blind separation task of acoustic signals.
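The learning-of-the-learning-rate idea can be sketched with a simple sign-agreement rule: the step size grows while successive gradient estimates agree and shrinks when they flip sign. This is a simplified stand-in for the paper's gradient-flow derivation, on a made-up quadratic objective:

```python
# Hedged sketch of adapting the learning rate on-line from gradient
# agreement, using only gradient information (no Hessian needed).
def minimise(grad, w, eta=0.1, up=1.05, down=0.5, steps=100):
    g_prev = 0.0
    for _ in range(steps):
        g = grad(w)
        if g * g_prev > 0:
            eta *= up       # consistent direction: take bigger steps
        elif g * g_prev < 0:
            eta *= down     # overshoot detected: shrink the step
        w -= eta * g
        g_prev = g
    return w

w_star = minimise(lambda w: 2.0 * (w - 3.0), w=0.0)   # d/dw of (w-3)^2
print(round(w_star, 3))   # 3.0
```

Because the rule consults only the gradient flow, it still applies when the loss is implicit and the Hessian is unavailable, which is the setting of the blind separation task.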

Journal ArticleDOI
01 Feb 1996
TL;DR: This paper shows how the subsequent "dynamically focused learning" (DFL) can be used to enhance the performance of the "fuzzy model reference learning controller" (FMRLC) and furthermore it performs comparative analysis with a conventional adaptive control technique.
Abstract: A "learning system" possesses the capability to improve its performance over time by interacting with its environment. A learning control system is designed so that its "learning controller" has the ability to improve the performance of the closed-loop system by generating command inputs to the plant and utilizing feedback information from the plant. Learning controllers are often designed to mimic the manner in which a human in the control loop would learn how to control a system while it operates. Some characteristics of this human learning process may include: (i) a natural tendency for the human to focus their learning by paying particular attention to the current operating conditions of the system since these may be most relevant to determining how to enhance performance; (ii) after learning how to control the plant for some operating condition, if the operating conditions change, then the best way to control the system may have to be re-learned; and (iii) a human with a significant amount of experience at controlling the system in one operating region should not forget this experience if the operating condition changes. To mimic these types of human learning behavior, we introduce three strategies that can be used to dynamically focus a learning controller onto the current operating region of the system. We show how the subsequent "dynamically focused learning" (DFL) can be used to enhance the performance of the "fuzzy model reference learning controller" (FMRLC) and furthermore we perform comparative analysis with a conventional adaptive control technique. A magnetic ball suspension system is used throughout the paper to perform the comparative analyses, and to illustrate the concept of dynamically focused fuzzy learning control.

Journal Article
TL;DR: This paper formally distinguishes three types of features, primary, contextual, and irrelevant, and formally defines what it means for one feature to be context-sensitive to another feature.
Abstract: A large body of research in machine learning is concerned with supervised learning from examples. The examples are typically represented as vectors in a multi- dimensional feature space (also known as attribute-value descriptions). A teacher partitions a set of training examples into a finite number of classes. The task of the learning algorithm is to induce a concept from the training examples. In this paper, we formally distinguish three types of features: primary, contextual, and irrelevant features. We also formally define what it means for one feature to be context-sensitive to another feature. Context-sensitive features complicate the task of the learner and potentially impair the learner's performance. Our formal definitions make it possible for a learner to automatically identify context-sensitive features. After context-sensitive features have been identified, there are several strategies that the learner can employ for managing the features; however, a discussion of these strategies is outside of the scope of this paper. The formal definitions presented here correct a flaw in previously proposed definitions. We discuss the relationship between our work and a formal definition of relevance.

Journal ArticleDOI
TL;DR: Simulation results show that the learning speed achieved by the method is superior to that of other adaptive selection methods.

Patent
26 Aug 1996
TL;DR: In this article, the authors present a method, computer program product, and system for teaching or reinforcing concepts, principles, and other learned information without requiring user initiation of a learning sequence.
Abstract: In a given environment where a user may work, play, or otherwise interact, such as the environment provided by a system comprising computer hardware and software, the present invention provides a method, computer program product, and system for teaching or reinforcing concepts, principles, and other learned information without requiring user initiation of a learning sequence. Learning or reinforcement occurs by presenting "learning frames" in the environment automatically, without requiring user initiation of the learning sequence. While doing other tasks within the environment, the user receives these intrusive or non-intrusive opportunities for learning: depending on the implementation of the present invention, the user may be interrupted from the task at hand and required to respond to the presented learning frame, or may simply have the opportunity for learning without interruption of the task at hand. In this manner, learning occurs as a by-product of other useful work, play, or other interaction with the environment and does not require dedicated user time and overt effort.

01 Jan 1996
TL;DR: A survey of results, models, and open problems from Computational Learning Theory that seem particularly interesting from the point of view of on-line algorithms research can be found in this article.
Abstract: The areas of On-Line Algorithms and Machine Learning are both concerned with problems of making decisions about the present based only on knowledge of the past. Although these areas differ in terms of their emphasis and the problems typically studied, there are a collection of results in Computational Learning Theory that fit nicely into the 'on-line algorithms' framework. This survey article discusses some of the results, models, and open problems from Computational Learning Theory that seem particularly interesting from the point of view of on-line algorithms research. The emphasis in this article is on describing some of the simpler, more intuitive results, whose proofs can be given in their entirety. Pointers to the literature are given for more sophisticated versions of these algorithms.
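A representative example of the kind of simple, fully provable on-line result such surveys cover is the Weighted Majority algorithm of Littlestone and Warmuth: combine n "expert" predictors by weighted vote and demote the weight of every expert that errs, which bounds the learner's mistakes by O(log n + m), where m is the mistake count of the best expert. A minimal sketch (binary predictions, demotion factor beta):

```python
def weighted_majority(rounds, beta=0.5):
    """rounds: list of (expert_predictions, outcome) pairs, predictions
    and outcomes in {0, 1}. Returns (learner_mistakes, final_weights)."""
    n = len(rounds[0][0])
    weights = [1.0] * n
    mistakes = 0
    for preds, outcome in rounds:
        vote_1 = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_0 = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote_1 >= vote_0 else 0
        if guess != outcome:
            mistakes += 1
        # Demote every expert that was wrong on this round.
        weights = [w * beta if p != outcome else w
                   for w, p in zip(weights, preds)]
    return mistakes, weights
```

The mistake bound follows because total weight shrinks by a constant factor on each learner mistake, while the best expert's weight never falls below beta raised to its own mistake count.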

Journal ArticleDOI
TL;DR: In this paper, a rule-based inductive learning algorithm called multiscale classification (MSC) is proposed to classify the training data by successively splitting the feature space in half.
Abstract: Proposes a rule-based inductive learning algorithm called multiscale classification (MSC). It can be applied to any N-dimensional real or binary classification problem to classify the training data by successively splitting the feature space in half. The algorithm has several significant differences from existing rule-based approaches: learning is incremental, the tree is non-binary, and backtracking of decisions is possible to some extent. The paper first provides background on current machine learning techniques and outlines some of their strengths and weaknesses. It then describes the MSC algorithm and compares it to other inductive learning algorithms with particular reference to ID3, C4.5, and back-propagation neural networks. Its performance on a number of standard benchmark problems is then discussed and related to standard learning issues such as generalization, representational power, and over-specialization.
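The core idea the abstract names, classifying training data by successively splitting the feature space in half, can be sketched in one dimension. This is a rough illustration in the spirit of MSC, not the published algorithm (which is incremental, non-binary, and supports backtracking): any cell of the space containing mixed labels is split at its midpoint until every cell is pure or a depth limit is reached.

```python
# Illustrative successive-halving classifier over [lo, hi) in one dimension.
def halve_classify(points, lo=0.0, hi=1.0, max_depth=8):
    """points: list of (x, label). Returns a list of (lo, hi, label) cells,
    splitting any mixed-label cell at its midpoint."""
    inside = [(x, y) for x, y in points if lo <= x < hi]
    labels = {y for _, y in inside}
    if len(labels) <= 1 or max_depth == 0:
        label = labels.pop() if labels else None   # None = empty cell
        return [(lo, hi, label)]
    mid = (lo + hi) / 2
    return (halve_classify(inside, lo, mid, max_depth - 1)
            + halve_classify(inside, mid, hi, max_depth - 1))
```

Each recursion level corresponds to a finer "scale"; the depth limit plays the role of a stopping criterion against over-specialization.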